
create parquet files in java

Ask Time: 2016-09-27T23:38:21    Author: Imbar M.
Is there a way to create parquet files from java?

I have data in memory (java classes) and I want to write it into a parquet file, to later read it from apache-drill.

Is there a simple way to do this, like inserting data into a SQL table?

GOT IT

Thanks for the help.

Combining the answers and this link, I was able to create a Parquet file and read it back with Drill.
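For reference, a Parquet file written from Java can be queried from Apache Drill with plain SQL via the `dfs` storage plugin; the file path below is hypothetical:

```sql
SELECT * FROM dfs.`/tmp/users.parquet`;
```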

Author: Imbar M., reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/39728854/create-parquet-files-in-java
MaxNevermind:

ParquetWriter's constructors are deprecated (as of 1.8.1), but ParquetWriter itself is not; you can still create a ParquetWriter by extending the abstract Builder subclass nested inside it.

Here is an example from the Parquet authors themselves, ExampleParquetWriter:

  public static class Builder extends ParquetWriter.Builder<Group, Builder> {
    private MessageType type = null;
    private Map<String, String> extraMetaData = new HashMap<String, String>();

    private Builder(Path file) {
      super(file);
    }

    public Builder withType(MessageType type) {
      this.type = type;
      return this;
    }

    public Builder withExtraMetaData(Map<String, String> extraMetaData) {
      this.extraMetaData = extraMetaData;
      return this;
    }

    @Override
    protected Builder self() {
      return this;
    }

    @Override
    protected WriteSupport<Group> getWriteSupport(Configuration conf) {
      return new GroupWriteSupport(type, extraMetaData);
    }
  }

If you don't want to use Group and GroupWriteSupport (bundled with Parquet, but intended only as an example of a data-model implementation), you can go with the Avro, Protocol Buffers, or Thrift in-memory data models. Here is an example of writing Parquet using Avro:

  try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
          .<GenericData.Record>builder(fileToWrite)
          .withSchema(schema)
          .withConf(new Configuration())
          .withCompressionCodec(CompressionCodecName.SNAPPY)
          .build()) {
      for (GenericData.Record record : recordsToWrite) {
          writer.write(record);
      }
  }

You will need these dependencies:

  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.8.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.8.1</version>
  </dependency>

Full example here.
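The Avro snippet above leaves `schema`, `fileToWrite`, and `recordsToWrite` undefined. Below is a minimal end-to-end sketch that fills those in and also reads the file back with AvroParquetReader; it assumes parquet-avro (and its transitive Hadoop dependencies) are on the classpath, and the `/tmp/users.parquet` path and the `User` schema with `id`/`name` fields are made up for illustration:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetRoundTrip {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema: a record with a long id and a string name.
        Schema schema = SchemaBuilder.record("User").fields()
                .requiredLong("id")
                .requiredString("name")
                .endRecord();

        Path file = new Path("/tmp/users.parquet");

        // Write two in-memory records to the Parquet file.
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(file)
                .withSchema(schema)
                .withConf(new Configuration())
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
                .build()) {
            for (long i = 1; i <= 2; i++) {
                GenericRecord r = new GenericData.Record(schema);
                r.put("id", i);
                r.put("name", "user" + i);
                writer.write(r);
            }
        }

        // Read the records back and print them.
        try (ParquetReader<GenericRecord> reader = AvroParquetReader
                .<GenericRecord>builder(file)
                .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        }
    }
}
```

AvroParquetReader mirrors the writer's builder API, so the same Avro data model is used on both sides; the schema is stored in the file's footer, so the reader does not need it passed in.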
2016-09-27T21:29:50