
I have a website that needs to move from a dedicated server to an AWS EC2 instance. I have 650 GB+ of data and 3+ million files.

I tried using SCP like this, but because of the huge amount of data it is taking a very long time.

scp -r [email protected]:/remote/directory /local/directory

My source OS is CentOS 7.5 with cPanel, on a 1 TB HDD with 650 GB of data; the destination server is Ubuntu 18.04 with a 700 GB HDD.

I know there are other options as well, such as LFTP, SFTP, rsync, etc. Please help me find the quickest method.

  • Please Edit the question (to the bottom left of the question text) to indicate the OS of the source machine, and any other specifications like confirming exact copy. Commented Feb 27, 2019 at 16:31
  • If you're willing to spend money, there are commercial file-transfer solutions which are much faster than scp, rsync, or sftp.
    – Kenster
    Commented Feb 27, 2019 at 20:55
  • @Kenster Thank you, but I have already started using SCP and it is almost 50% complete, so in this situation I don't want to spend money on transferring files.
    – Mi2
    Commented Mar 1, 2019 at 15:58

5 Answers


I would suggest zipping the files in, say, 1 GB chunks and uploading those. When unzipping, each file is checked against a CRC checksum. You can use zip's built-in splitting so it automatically generates .z00, .z01, .z02, .z03, ... segments.
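For example, a rough sketch of the split-archive approach (the archive name and source path below are placeholders, not taken from the question):

# Create a zip archive split into 1 GB parts (example paths)
zip -r -s 1g /backup/site-backup.zip /home/username/public_html

# On the destination, rejoin the parts into a single archive and extract it
zip -s 0 /backup/site-backup.zip --out /backup/site-backup-joined.zip
unzip /backup/site-backup-joined.zip -d /local/directory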

Alternatively, you can use the rar format which allows creation of parity data to repair damaged segments.
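A similar sketch with rar, assuming the rar utility is installed on the source (the 1 GB volume size, 5% recovery record, and paths are example values):

# Create 1 GB volumes with a 5% recovery record (example paths)
rar a -v1g -rr5p /backup/site-backup.rar /home/username/public_html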


There is an AWS solution for transferring your data:

https://aws.amazon.com/snowball/?nc1=h_ls

As far as I know, you'll receive a device (shipped via a postal service such as DHL). You copy your data onto the device, and then Amazon uploads that data for you.

  • I don't understand why I would need the device; I can copy all the files over the web. I know I can do this using SCP, lftp, rsync, or SFTP, but I want to know which one is fastest with no risk of data going missing. If possible, I need some help with the SSH command.
    – Mi2
    Commented Feb 27, 2019 at 15:48
  • @user219457 please Edit the original question with your specifications. You've found some of the right tools, and figuring out how to use those is important. Commented Feb 27, 2019 at 16:28

The only way to speed up the upload is to do it in multiple parts in parallel.

If you can divide the job among several computers using distinct connections, this will speed up the upload.

If a single computer does not reach full throughput, you can opt for a multi-thread method where each thread will open its own connection in parallel.
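As a rough sketch of that idea using rsync plus GNU parallel (this assumes parallel is installed on the destination, the data is spread across top-level subdirectories, and user@source-host plus the paths are placeholders):

# Run four rsync processes at once, one per top-level source subdirectory
# (tune -j to however many parallel connections your link can sustain)
mkdir -p /local/directory
ssh user@source-host 'ls /remote/directory' \
  | parallel -j 4 rsync -az user@source-host:/remote/directory/{}/ /local/directory/{}/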

See the post "Which is the fastest way to copy 400G of files from an ec2 elastic block store volume to s3?" for suggestions of products and scripts.

See also the article "EFS File Sync – Faster File Transfer To Amazon EFS File Systems".


scp doesn't retry or resume partially transferred files.

Try using rsync instead, e.g.

rsync -vuaz [email protected]:/remote/directory/ /local/directory/

Arguments:

  • -v/--verbose increase verbosity.
  • -u/--update skip files that are newer on the receiver.
  • -a/--archive archive mode; equals -rlptgoD
  • -z/--compress compress file data during the transfer.
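If you want to check what is still left to copy after a partial scp run, the same command with -n/--dry-run added lists the outstanding files without transferring anything:

# Preview which files rsync would transfer, without copying anything
rsync -vuazn [email protected]:/remote/directory/ /local/directory/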
  • Most of my files are images smaller than 2 MB; do you think rsync will be faster than SCP? I already started copying an 80 GB directory, so if I close PuTTY now and start using rsync, do you think I will have an issue with the files that are already downloaded?
    – Mi2
    Commented Feb 27, 2019 at 17:18
  • With scp, once you get a transfer error, you have to transfer everything all over again, because you don't know which files were copied fully and which were not. rsync builds a list of all the files that need to be updated before copying anything, so you can run rsync after using scp and it will continue from the point where scp stopped. I'm not sure whether it's faster; the speed could be the same. You can leave PuTTY as it is (to avoid unnecessary changes) and, once you hit any transfer issue, continue with rsync.
    – kenorb
    Commented Feb 27, 2019 at 17:30

Try installing the AWS CLI on your dedicated server.

Then use the aws s3 command to transfer the files to your AWS S3 bucket first.

E.g.

aws s3 sync local/directory s3://mybucket/local/directory

Then transfer back to your local EC2 instance:

aws s3 sync s3://mybucket/local/directory local/directory

The command is designed to copy large numbers of files, and it can resume after a failure.
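With millions of small files, the transfer is often limited by per-request overhead rather than bandwidth. One setting worth trying is the CLI's S3 concurrency (the value 20 below is just an example; the default is 10):

# Let aws s3 sync issue more parallel S3 requests (example value)
aws configure set default.s3.max_concurrent_requests 20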

You can also decide to serve the files for the EC2 instance directly from S3.

Check aws s3 sync help for help.
