In a live Hadoop cluster, how can I migrate data from on-prem to the cloud without copying it from on-prem to the cloud? The data size is more than 1 petabyte, so if we do it by copying, the transfer will take a few weeks because of network bandwidth.
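To put the "few weeks" in numbers, here is a rough back-of-the-envelope sketch of a straight copy over the WAN link. The link speeds and the 70% efficiency factor are hypothetical assumptions, not measurements from this cluster, and 1 PB is taken as 10**15 bytes.

```python
# Rough estimate of how long a straight copy of the data would take over the
# link between the two sites.
# ASSUMPTIONS (hypothetical): 1 PB = 10**15 bytes, the link speeds below, and a
# sustained "efficiency" fraction of line rate.

def transfer_days(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes over a link of link_gbps gigabits per second."""
    bits = data_bytes * 8
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 86_400  # 86,400 seconds per day

PETABYTE = 10**15

for gbps in (1, 10, 40):
    days = transfer_days(PETABYTE, gbps, efficiency=0.7)
    print(f"{gbps:>3} Gbit/s at 70% efficiency: {days:6.1f} days")
```

Whatever the real bandwidth is, the same bytes have to cross the link in any scheme that ends with the data in the cloud, so this figure is a lower bound for the node-swap plan as well.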
Suppose there are 40 DataNodes in the cluster at Location A, and we want to move the data to a cloud data center located at Location B. Data is replicated with a replication factor of 3.
My solution is to add 5 cloud nodes to the cluster every day, stop 2 on-prem nodes per day, and then run the balancer. Assuming the data is balanced within 1 day, it will take at least 20 days (40 on-prem nodes / 2 removed per day) for the entire cluster to migrate to the cloud.
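The node schedule in this plan can be sketched as a small day-by-day simulation. The inputs (40 on-prem nodes, +5 cloud, -2 on-prem per day) come from the plan; the one-day balance is the plan's own assumption, and the function and class names are hypothetical.

```python
# Day-by-day simulation of the proposed node swap.
# Inputs are from the plan; "one balancer run per day is enough" is the
# plan's own assumption and is not modelled here.
from dataclasses import dataclass


@dataclass
class Day:
    day: int
    on_prem: int  # on-prem DataNodes still in the cluster at end of day
    cloud: int    # cloud DataNodes in the cluster at end of day


def migration_schedule(on_prem: int, cloud_add: int, on_prem_remove: int) -> list[Day]:
    """Add cloud_add cloud nodes and remove on_prem_remove on-prem nodes each day."""
    days: list[Day] = []
    cloud = 0
    day = 0
    while on_prem > 0:
        day += 1
        cloud += cloud_add
        on_prem = max(0, on_prem - on_prem_remove)
        days.append(Day(day, on_prem, cloud))
    return days


schedule = migration_schedule(on_prem=40, cloud_add=5, on_prem_remove=2)
last = schedule[-1]
print(f"on-prem nodes gone after {last.day} days; cloud nodes at that point: {last.cloud}")
```

Run as written, the schedule confirms the 20-day figure, but it also ends with 100 cloud nodes for a cluster that started with 40. If the target is a cluster of similar size, the number of cloud nodes added per day would need to shrink (or stop) partway through.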
I am trying to figure out another way, and I'd also appreciate it if anyone can correct my plan.
Thanks