Not able to read from Parquet Reader along with Hadoop configuration using Java

Ask Time:2022-06-21T18:50:38         Author:Aman Kumar Jha

I need to read a Parquet file from S3 using Java, with Maven for dependency management.

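import java.io.IOException;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.io.InputFile;

public static void main(String[] args) throws IOException, URISyntaxException {
        // Builds s3://batch-dev/aman/part-e52b.c000.snappy.parquet
        Path path = new Path("s3", "batch-dev", "/aman/part-e52b.c000.snappy.parquet");
        Configuration conf = new Configuration();
        conf.set("fs.s3.awsAccessKeyId", "xxx");
        conf.set("fs.s3.awsSecretAccessKey", "xxxxx");
        // The FileNotFoundException is thrown here, inside HadoopInputFile.fromPath
        InputFile file = HadoopInputFile.fromPath(path, conf);
        ParquetFileReader reader2 = ParquetFileReader.open(conf, path);

        //MessageType schema = reader2.getFooter().getFileMetaData().getSchema();
        //System.out.println(schema);
}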

The code above throws a FileNotFoundException (stack trace below).

Note that I am using the s3 scheme, not s3a. I am not sure whether Hadoop supports the s3 scheme.

Exception in thread "main" java.io.FileNotFoundException: s3://batch-dev/aman/part-e52b.c000.snappy.parquet: No such file or directory.
    at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:334)
    at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
    at com.bidgely.cloud.core.cass.gb.S3GBRawDataHandler.main(S3GBRawDataHandler.java:505)
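
The trace shows that the path was handled by org.apache.hadoop.fs.s3.S3FileSystem. As far as I know, that older s3 scheme is a block-based filesystem layered on top of S3 that does not see ordinary S3 objects, and it was removed in Hadoop 3, which could explain the error. One thing to try is the same location under the s3a scheme. The following is only a minimal sketch using the Java standard library; the class S3SchemeRewriter and its method toS3a are hypothetical names introduced for illustration and are not part of Hadoop or Parquet.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not a Hadoop or Parquet API): rewrites s3:// URIs to s3a://. */
final class S3SchemeRewriter {
    private S3SchemeRewriter() {}

    /** Returns the same location under the s3a scheme; other schemes are returned unchanged. */
    static URI toS3a(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null || !scheme.equalsIgnoreCase("s3")) {
            return uri; // not an s3:// URI, leave it alone
        }
        try {
            // Keep bucket (authority), key (path), query and fragment; swap only the scheme.
            return new URI("s3a", uri.getAuthority(), uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rewrite URI: " + uri, e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create("s3://batch-dev/aman/part-e52b.c000.snappy.parquet");
        System.out.println(toS3a(original));
    }
}
```

If this route is taken, the hadoop-aws module has to be on the classpath, and the credential properties for s3a are fs.s3a.access.key and fs.s3a.secret.key rather than the fs.s3.* keys used above. Whether this resolves the exception depends on the Hadoop version in use.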

However, with the same path, I can fetch the object using the S3 client. The problem there is that I cannot read the Parquet data from the input stream returned by the code below.

public static void main(String args[]) {
        AWSCredentials credentials = new BasicAWSCredentials("XXXXX", "XXXXX");
        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withRegion("us-west-2")
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .build();
        S3Object object = s3Client.getObject(new GetObjectRequest("batch-dev", "/aman/part-e52b.c000.snappy.parquet"));
        // Prints the stream object itself, not the Parquet contents
        System.out.println(object.getObjectContent());
}
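
A Parquet file keeps its metadata in a footer at the end of the file, so a reader needs random access, which a forward-only input stream does not offer. One possible workaround is to copy the stream to a local temporary file first and open that file instead. The sketch below shows only that copying step with the Java standard library, plus a quick check for the "PAR1" magic bytes at both ends of the file. ParquetStreamSpooler, spoolToTempFile and hasParquetMagic are hypothetical names, and the in-memory bytes in main are a stand-in for object.getObjectContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path; // java.nio Path, not org.apache.hadoop.fs.Path
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

/** Hypothetical helper: spools a forward-only stream to a local file that supports seeking. */
final class ParquetStreamSpooler {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    private ParquetStreamSpooler() {}

    /** Copies the whole stream into a new temporary file, closes the stream, returns the file. */
    static Path spoolToTempFile(InputStream in) {
        try (InputStream source = in) {
            Path tmp = Files.createTempFile("s3-object-", ".parquet");
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Quick sanity check only: true if the file starts and ends with the bytes "PAR1". */
    static boolean hasParquetMagic(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < 2L * MAGIC.length) {
                return false;
            }
            byte[] head = new byte[MAGIC.length];
            byte[] tail = new byte[MAGIC.length];
            raf.readFully(head);
            raf.seek(raf.length() - MAGIC.length);
            raf.readFully(tail);
            return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; in the question the stream would be object.getObjectContent().
        byte[] fake = "PAR1-payload-and-footer-PAR1".getBytes(StandardCharsets.US_ASCII);
        Path tmp = spoolToTempFile(new ByteArrayInputStream(fake));
        System.out.println(tmp + " has Parquet magic: " + hasParquetMagic(tmp));
        tmp.toFile().delete();
    }
}
```

Assuming a parquet-hadoop version that provides the InputFile API, the spooled file could then be opened with HadoopInputFile.fromPath(new org.apache.hadoop.fs.Path(tmp.toUri()), conf) and passed to ParquetFileReader.open. The trade-off is that the whole object is downloaded to local disk before anything is read.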

Kindly help me with a solution. (I have to use Java only.)

Author: Aman Kumar Jha. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/72699479/not-able-to-read-from-parquet-reader-along-with-hadoop-configuration-using-java