I need to read a Parquet file from S3 using Java (a Maven project).
public static void main(String[] args) throws IOException, URISyntaxException {
    Path path = new Path("s3", "batch-dev", "/aman/part-e52b.c000.snappy.parquet");
    Configuration conf = new Configuration();
    conf.set("fs.s3.awsAccessKeyId", "xxx");
    conf.set("fs.s3.awsSecretAccessKey", "xxxxx");
    InputFile file = HadoopInputFile.fromPath(path, conf);
    ParquetFileReader reader2 = ParquetFileReader.open(conf, path);
    //MessageType schema = reader2.getFooter().getFileMetaData().getSchema();
    //System.out.println(schema);
}
The above code gives a FileNotFoundException.
Note that I am using the s3 scheme and not s3a; I am not sure whether Hadoop supports the s3 scheme.
Exception in thread "main" java.io.FileNotFoundException: s3://batch-dev/aman/part-e52b.c000.snappy.parquet: No such file or directory.
at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:334)
at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
at com.bidgely.cloud.core.cass.gb.S3GBRawDataHandler.main(S3GBRawDataHandler.java:505)
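For reference, this is the variant I would try next, using the s3a connector instead of s3. This is a sketch based on my assumptions: I believe the `fs.s3a.*` configuration keys and the `hadoop-aws` dependency (plus the AWS SDK bundle) on the classpath are required, but I have not verified this setup.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.schema.MessageType;

public class S3aParquetRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumed s3a credential keys; requires hadoop-aws on the classpath.
        conf.set("fs.s3a.access.key", "xxx");
        conf.set("fs.s3a.secret.key", "xxxxx");

        // Same bucket/object as above, but with the s3a scheme.
        Path path = new Path("s3a://batch-dev/aman/part-e52b.c000.snappy.parquet");
        try (ParquetFileReader reader =
                ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))) {
            MessageType schema = reader.getFooter().getFileMetaData().getSchema();
            System.out.println(schema);
        }
    }
}
```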
However, with the same path, if I use an s3Client I am able to get the object. The problem is that I cannot read the Parquet data from the input stream obtained from the code below.
public static void main(String[] args) {
    AWSCredentials credentials = new BasicAWSCredentials("XXXXX", "XXXXX");
    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withRegion("us-west-2")
            .withCredentials(new AWSStaticCredentialsProvider(credentials))
            .build();
    S3Object object = s3Client.getObject(new GetObjectRequest("batch-dev", "/aman/part-e52b.c000.snappy.parquet"));
    System.out.println(object.getObjectContent());
}
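In case it helps, here is the workaround I am considering: copy the S3 object stream to a local temp file and read that with ParquetFileReader. This is a sketch, based on my assumption that Parquet needs a seekable input and therefore cannot consume the raw S3 stream directly; I have not verified it end to end.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class S3StreamToLocalParquet {
    public static void main(String[] args) throws IOException {
        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withRegion("us-west-2")
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("XXXXX", "XXXXX")))
                .build();

        // Stage the object in a local temp file so Parquet can seek in it.
        File local = File.createTempFile("part-e52b", ".snappy.parquet");
        local.deleteOnExit();
        try (S3Object object =
                s3Client.getObject("batch-dev", "/aman/part-e52b.c000.snappy.parquet")) {
            Files.copy(object.getObjectContent(), local.toPath(),
                    StandardCopyOption.REPLACE_EXISTING);
        }

        // Read the local copy with the standard Hadoop input file wrapper.
        Configuration conf = new Configuration();
        try (ParquetFileReader reader = ParquetFileReader.open(
                HadoopInputFile.fromPath(new Path(local.toURI()), conf))) {
            System.out.println(reader.getFooter().getFileMetaData().getSchema());
        }
    }
}
```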
Kindly help me with a solution. (I have to use Java only.)