
I'm importing a terabyte of data into an AWS Aurora MySQL table from an EC2 instance. Because our service will be down while migrating prod, I care a lot about the import speed.

Currently I can't break 1.0Gb/s import speed, measured using iftop. Suspiciously, the speed isn't 1.1Gb/s or even 0.990Gb/s; it's very, very close to exactly 1.0Gb/s, which makes me think it's some sort of artificial bandwidth limit. Any suggestions what the bottleneck might be?

  1. I'm loading the data in 150MB TSV chunks using LOAD DATA LOCAL INFILE "chunk1.tsv" statements executed with 4x - 16x parallelism from my EC2 instance.
  2. My EC2 instance is currently an m5zn.6xlarge ("50 Gbps"), but I started experiments on a c5.4xlarge. They both hit the same bandwidth limit. The RDS instance is a db.r5.4xlarge ("Up to 10 Gbps").
  3. Running the job on my local laptop against a local MySQL exceeds 2.2Gb/s, and because the chunks are quite large (I've also tried 500MB chunks), I don't think it's latency. My laptop shouldn't be this much faster...
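For reference, the parallel chunked load in step 1 can be sketched roughly as below. This is a minimal illustration, not my actual script: the table name `mytable`, the `run_sql` callback, and the delimiter clause are placeholders (you would wire `run_sql` to your MySQL driver's `cursor.execute`, with `local_infile` enabled on the connection).

```python
# Sketch of the parallel LOAD DATA LOCAL INFILE workflow described above.
# Table name, delimiter, and the run_sql callback are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor


def load_statement(chunk_path, table="mytable"):
    """Build the LOAD DATA LOCAL INFILE statement for one TSV chunk."""
    return (f'LOAD DATA LOCAL INFILE "{chunk_path}" '
            f'INTO TABLE {table} FIELDS TERMINATED BY "\\t"')


def load_chunks(chunk_paths, run_sql, parallelism=8):
    """Execute one LOAD DATA statement per chunk with N-way parallelism.

    run_sql is whatever executes SQL in your environment, e.g. a function
    that grabs a pooled connection and calls cursor.execute(statement).
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map drains the iterator, so all statements run to completion.
        list(pool.map(run_sql, (load_statement(p) for p in chunk_paths)))
```

Each worker holds its own connection, so the parallelism here is bounded by how many concurrent LOAD DATA statements the writer instance will actually make progress on, not just by client-side threads.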
  • Have you used a packet capture to confirm the length and uniformity of the payload packets on the wire? That could be something not noticeable locally but it could impact performance on the network.
    – Greg Askew
    Commented Dec 5, 2023 at 13:21
  • Interesting, I haven't tried that... I need to investigate whether there's something I could do about it, say if MySQL is transmitting the file in short packets. I haven't tuned the Linux network stack on EC2 at all, and AFAIK I can't tune the RDS Linux params.
    – Seth
    Commented Dec 5, 2023 at 23:40
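One way to follow up on the packet-capture suggestion: record the import traffic on the EC2 side (e.g. `tcpdump -i eth0 -w import.pcap 'port 3306'`, assuming the default MySQL port) and then histogram the on-the-wire packet sizes. The sketch below is a minimal stdlib-only reader for the classic libpcap file format (not pcapng); if most packets are far below the MTU, short payloads could be part of the story.

```python
# Sketch: histogram packet lengths from a tcpdump capture to check whether
# the LOAD DATA stream is being sent in short packets. Assumes the classic
# libpcap format (tcpdump -w), not the newer pcapng format.
import struct
from collections import Counter


def packet_length_histogram(pcap_path):
    """Return a Counter mapping original packet length -> packet count."""
    counts = Counter()
    with open(pcap_path, "rb") as f:
        header = f.read(24)  # libpcap global header is 24 bytes
        magic = struct.unpack("<I", header[:4])[0]
        # Magic number tells us the byte order the file was written in.
        endian = "<" if magic == 0xA1B2C3D4 else ">"
        while True:
            rec = f.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            # ts_sec, ts_usec, captured length, original on-the-wire length
            _, _, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
            counts[orig_len] += 1
            f.seek(incl_len, 1)  # skip the captured bytes themselves
    return counts
```

If the histogram is dominated by full-MTU frames (around 1514 bytes, or ~9000 with jumbo frames inside a VPC), the client is streaming efficiently and the cap is more likely throughput shaping somewhere between the instances.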
