I have ~2 million 2 MB files in S3.
What's the best way to stream these files simultaneously to an EC2 instance programmatically in Python?
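For context, this is the kind of bounded-concurrency approach I've been sketching with boto3 (the bucket name, worker count, and `/tmp` output path are placeholders, not my real setup):

```python
import concurrent.futures

import boto3
from botocore.config import Config

# Placeholders -- swap in your real bucket and tune the worker count.
BUCKET = "my-bucket"
MAX_WORKERS = 64  # a bounded pool instead of 2 million simultaneous connections

# Size the HTTP connection pool to match the thread count.
s3 = boto3.client("s3", config=Config(max_pool_connections=MAX_WORKERS))

def download(key: str) -> str:
    # Streams one object to local disk, reusing pooled HTTP connections.
    s3.download_file(BUCKET, key, "/tmp/" + key.replace("/", "_"))
    return key

# Collect the ~2M keys via the paginated listing API.
paginator = s3.get_paginator("list_objects_v2")
keys = (
    obj["Key"]
    for page in paginator.paginate(Bucket=BUCKET)
    for obj in page.get("Contents", [])
)

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    for key in pool.map(download, keys):
        pass  # each file is on disk once its key comes back
```

(One caveat with this sketch: `Executor.map` submits every task up front, so with 2 million keys it would hold 2 million futures in memory; chunked submission would probably be needed in practice.)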
Will the creation of 2 million simultaneous TCP connections on a single instance be a challenge?
My understanding is that each TCP connection takes roughly 32 KB of memory for kernel state structures and buffers, and ~10 ms to set up, so 2 million simultaneous connections would need about 2,000,000 × 32 KB ≈ 64 GB of memory.
Assuming I have an instance with enough memory, what other challenges might I face? Limits on OS file descriptors?
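For the file-descriptor part, I assume I'd check and raise the per-process limit like this on Linux (raising the hard limit itself requires root, e.g. via `/etc/security/limits.conf`):

```python
import resource

# Inspect the current per-process open-file limits: (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit up to the hard limit for this process.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

With a bounded worker pool, each in-flight download should only need a socket plus an output file, so roughly 2 × MAX_WORKERS descriptors at a time, but I'd like to confirm that reasoning.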