I am trying to copy data from HDFS to Amazon S3 using hadoop distcp. The dataset is 227 GB, and the job has been running for more than 12 hours.
Is there a hard limit of 3,500 write (PUT) requests per second per prefix on an S3 bucket, and could this be causing the slowdown? Is there a workaround, or could the performance be improved in some other way?
Below is my command:
hadoop distcp \
  -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  -Dfs.s3a.access.key=KEY \
  -Dfs.s3a.secret.key=SECRET \
  -Dfs.s3a.session.token=TOKEN \
  -Dfs.s3a.server-side-encryption-algorithm=SSE-KMS \
  -Dfs.s3a.server-side-encryption-key=enc-key \
  -Dmapreduce.job.queuename=default \
  -Ddistcp.dynamic.split.ratio=4 \
  -Ddistcp.dynamic.recordsPerChunk=25 \
  -Ddistcp.dynamic.max.chunks.tolerable=20000 \
  -strategy dynamic -i -numListstatusThreads 40 -m 300 \
  -update -delete \
  /data/prod/hdp/brm s3a://bucket/data/prod/hdp/brm
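For what it's worth, here is a variant I am considering for the next run, with the s3a client opened up for more concurrent uploads of small objects. fs.s3a.connection.maximum and fs.s3a.threads.max are standard s3a properties, but the values below are untested guesses on my part:

hadoop distcp \
  -Dfs.s3a.connection.maximum=200 \
  -Dfs.s3a.threads.max=64 \
  ... (same credential, encryption, and distcp options and paths as above)

Would bumping these client-side limits even matter if the bottleneck is the per-prefix request rate on the S3 side?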
There are a lot of small files; the average file size is ~300 KB. I had to launch the job twice. The first run failed, with many mappers throwing errors like this:
Caused by: org.apache.hadoop.fs.s3a.AWSS3IOException: getFileStatus on s3a://bucket/data/prod/hdp/brm/.distcp.tmp.attempt_1574118601834_3172_m_000000_0: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request;
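(For reference, I estimated the file count and average size with hdfs dfs -count, which prints the directory count, file count, and total bytes under a path:)

hdfs dfs -count /data/prod/hdp/brm
# columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
# average file size = CONTENT_SIZE / FILE_COUNT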
Then I realized that having more prefixes should help, so I launched a new job a couple of levels deeper in the tree (from /data/prod/hdp/brm down to /data/prod/hdp/brm/dataout/enabled), since /data/prod/hdp/brm/dataout/enabled has about 10 subdirectories, which I expected would spread the writes across more prefixes and raise the overall allowed request rate. The new job is running without errors, but the performance is still really bad.
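In case it is relevant, the split I have in mind for a next attempt is one distcp job per subdirectory, run in parallel, so each job writes under its own S3 prefix. This is just a sketch: DISTCP_OPTS stands in for all the credential/encryption -D flags from the full command above, and hdfs dfs -ls -C prints only the child paths:

# one distcp per subdirectory, each writing under its own S3 prefix
DISTCP_OPTS="-Dmapreduce.job.queuename=default"   # plus the credential/encryption flags from above

for dir in $(hdfs dfs -ls -C /data/prod/hdp/brm/dataout/enabled); do
  hadoop distcp $DISTCP_OPTS -update "$dir" "s3a://bucket${dir}" &
done
wait   # block until all parallel copies finish

Would running ~10 jobs in parallel like this actually help, or does distcp already spread its writes across those prefixes within a single job?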
Any help would be appreciated. Thank you.