Apache dependency bug? org.apache.parquet.hadoop.codec.SnappyCodec was not found Error in apache library

Asked: 2021-03-31T00:14:24 Author: 9945

Currently trying to read a parquet file in Java without the use of Spark. Here's what I have so far, based on Adam Melnyk's blog post on the subject.

Code

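        import java.util.ArrayList;
        import java.util.List;
        import org.apache.parquet.column.page.PageReadStore;
        import org.apache.parquet.example.data.Group;
        import org.apache.parquet.example.data.simple.SimpleGroup;
        import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
        import org.apache.parquet.hadoop.ParquetFileReader;
        import org.apache.parquet.io.ColumnIOFactory;
        import org.apache.parquet.io.MessageColumnIO;
        import org.apache.parquet.io.RecordReader;
        import org.apache.parquet.schema.MessageType;
        import org.apache.parquet.schema.Type;

        // `file` is assumed to be an org.apache.parquet.io.InputFile
        List<SimpleGroup> simpleGroups = new ArrayList<>();
        try (ParquetFileReader reader = ParquetFileReader.open(file)) {
            MessageType schema = reader.getFooter().getFileMetaData().getSchema();
            List<Type> fields = schema.getFields();
            PageReadStore pages;
            while ((pages = reader.readNextRowGroup()) != null) { // line 167: exception thrown here
                long rows = pages.getRowCount();
                LOG.info("Number of rows: " + rows);
                MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
                RecordReader<Group> recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema));

                for (long i = 0; i < rows; i++) {
                    simpleGroups.add((SimpleGroup) recordReader.read());
                }
            }
        }

(The line marked "line 167" is where the error is thrown in my code.)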

Error Message

org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.parquet.hadoop.codec.SnappyCodec was not found
        at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:243)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.<init>(CodecFactory.java:96)
        at org.apache.parquet.hadoop.CodecFactory.createDecompressor(CodecFactory.java:212)
        at org.apache.parquet.hadoop.CodecFactory.getDecompressor(CodecFactory.java:201)
        at org.apache.parquet.hadoop.CodecFactory.getDecompressor(CodecFactory.java:42)
        at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:1519)
        at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:1402)
        at org.apache.parquet.hadoop.ParquetFileReader.readChunkPages(ParquetFileReader.java:1023)
        at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:928)
        at [myClassPath]([myClass].java:167)

Dependencies

 <dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-hdfs</artifactId>
   <version>3.1.1.3.1.4.0-315</version>
 </dependency>
 <dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-common</artifactId>
   <version>3.1.1.3.1.4.0-315</version>
 </dependency>
 <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-launcher_2.12</artifactId>
   <version>3.0.0-preview2</version>
 </dependency>
 <dependency>
   <groupId>org.apache.parquet</groupId>
   <artifactId>parquet-avro</artifactId>
   <version>1.12.0</version>
 </dependency>
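Since this list mixes several Apache artifacts (an HDP build of Hadoop, a Spark preview, and parquet-avro, which pulls in parquet-hadoop transitively), one thing possibly worth ruling out is that the parquet-hadoop classes are missing from, or duplicated on, the classpath the program actually runs with. The sketch below is a hypothetical helper (the `DuplicateClassFinder` name and the classes it probes are assumptions, not part of Parquet or Hadoop) that asks the class loader for every location providing a given `.class` file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper (not part of Parquet or Hadoop).
final class DuplicateClassFinder {

    /** Returns every classpath location that provides the .class file for the given class. */
    static List<URL> findCopies(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = DuplicateClassFinder.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.parquet.hadoop.CodecFactory",
            "org.apache.parquet.hadoop.codec.SnappyCodec"
        };
        for (String name : names) {
            List<URL> copies = findCopies(name);
            // More than one entry means several jars ship the same class;
            // zero means it is not on this runtime classpath at all.
            System.out.println(name + ": " + copies.size() + " copies");
            for (URL url : copies) {
                System.out.println("  " + url);
            }
        }
    }
}
```

If the two classes turn out to come from different jars or versions, `mvn dependency:tree` shows which declared dependency brings each one in.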

It seems as though CodecFactory cannot find the SnappyCodec class, but when I look in my referenced libraries, the class is there: [screenshot of referenced libraries]

CodecFactory should be able to recognize the SnappyCodec class. Any recommendations? Thanks
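Judging by the message, the codec class is looked up by name at runtime when a row group is decompressed, so what counts is the classpath the program actually runs with rather than what the IDE lists under referenced libraries. A minimal sketch for checking that, assuming a hypothetical stand-alone `ClassLocator` helper (not part of Parquet or Hadoop): it loads a class by name without running its static initializer and reports which jar it came from, or whether loading failed. The probed class names are assumptions; `org.xerial.snappy.Snappy` is included on the assumption that snappy-java is what Parquet's Snappy support relies on.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Parquet or Hadoop).
final class ClassLocator {

    /**
     * Tries to load a class by name without initializing it and reports
     * which classpath entry it came from, or why loading failed.
     */
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLocator.class.getClassLoader();
        }
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes from the JDK's bootstrap loader have no code source.
                return "FOUND (no code source, probably a JDK class)";
            }
            return "FOUND in " + source.getLocation();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        } catch (LinkageError e) {
            // The class file exists, but a class it depends on (for example a
            // superclass or interface from another jar) could not be loaded.
            return "FOUND but failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.parquet.hadoop.codec.SnappyCodec",
            "org.apache.hadoop.io.compress.CompressionCodec",
            "org.xerial.snappy.Snappy"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

To be meaningful, this has to run with exactly the same classpath as the failing program (the same launch configuration or `java -cp ...` command).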

Author: 9945. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/66874641/apache-dependency-bug-org-apache-parquet-hadoop-codec-snappycodec-was-not-found
