
Copy files from S3 to HDFS using distcp or s3distcp

Ask Time:2014-03-27T13:06:52         Author:scalauser


I am trying to copy files from S3 to HDFS using the following command:

hadoop distcp s3n://bucketname/filename hdfs://namenodeip/directory

However, this is not working; I get the following error:

ERROR tools.DistCp: Exception encountered 
java.lang.IllegalArgumentException: Invalid hostname in URI

I have tried adding the S3 keys to the Hadoop configuration XML, but that did not work either. Could someone please give me the appropriate step-by-step procedure to copy files from S3 to HDFS?

Thanks in advance.

Author:scalauser, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/22678748/copy-files-from-s3-to-hdfs-using-distcp-or-s3distcp
scalauser :

The command should look like this:

hadoop distcp s3n://bucketname/directoryname/test.csv /user/myuser/mydirectory/

This copies test.csv from S3 into the HDFS directory /user/myuser/mydirectory/. Here the S3 filesystem is used in native mode (s3n). More details can be found at http://wiki.apache.org/hadoop/AmazonS3
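If the S3 credentials are not already set in core-site.xml, they can be supplied on the command line instead. A minimal sketch, assuming a bucket named mybucket (ACCESS_KEY and SECRET_KEY are placeholders to substitute with real values):

```shell
# Supply the s3n credentials as -D properties on the distcp invocation:
hadoop distcp \
  -Dfs.s3n.awsAccessKeyId=ACCESS_KEY \
  -Dfs.s3n.awsSecretAccessKey=SECRET_KEY \
  s3n://mybucket/test.csv /user/myuser/mydirectory/

# Alternatively, embed the keys directly in the URI. Note that a secret
# key containing a "/" character can itself produce the
# "Invalid hostname in URI" error when embedded this way; in that case
# regenerate the key or use the -D form above.
hadoop distcp s3n://ACCESS_KEY:SECRET_KEY@mybucket/test.csv /user/myuser/mydirectory/
```

Note that the destination can be given as a plain HDFS path (as in the answer above) rather than a full hdfs://namenodeip/ URI, since distcp resolves relative paths against the default filesystem.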
2014-04-25T10:23:05