I'm trying to copy data from S3 to HDFS using the distcp tool. The problem is that the S3 bucket is accessed through a VPC endpoint, and I don't know how to configure distcp properly for that. I have tried several configurations, but none of them worked. Currently I'm using the following command:
hadoop distcp \
  -Dfs.s3a.access.key=[KEY] \
  -Dfs.s3a.secret.key=[SECRET] \
  -Dfs.s3a.region=eu-west-1 \
  -Dfs.s3a.bucket.[BUCKET NAME].endpoint=https://bucket.vpce-[vpce id].s3.eu-west-1.vpce.amazonaws.com \
  s3a://[BUCKET NAME]/[FILE] \
  hdfs://[DESTINATION]/[FILE]
But I'm getting this error:
22/03/16 09:14:39 ERROR tools.DistCp: Exception encountered org.apache.hadoop.fs.s3a.AWSBadRequestException: doesBucketExistV2 on [BUCKET NAME]: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'vpce' is wrong; expecting 'eu-west-1'
Any ideas on how DistCp should be configured to work with VPC endpoints?
Thanks in advance