I am interested in doing some machine learning using an AWS EC2 instance. I have played around with launching instances with a an attached EBS and I was able to load files into it via scp
on my local command line. I will have several gigabytes of data to load onto this EBS (I know that isn't a lot by ML standards but that's not really my point). I would like to know what is the appropriate way to load this data. I'm concerned about racking up large fees because I did something in a silly way.
So far I have just uploaded a few files to the EC2 instance's associated EBS manually via the command line, like this:
scp -i keys/ec2-ml-micro2.pem data/BB000000001.png ubuntu@<my instance ip>:/data
This seems to me to be a rather primitive approach (not that that is always a bad thing). Is it the "right" way? I'm not opposed to letting a batch jbb run overnight like this but I am not sure if it may incur some data transfer fees. I've looked around for information on this, and I have read the page on EBS pricing. I didn't see anything on costs associated with loading data but I just wanted to confirm with someone or some people who have done something similar that this is the correct approach, and if not, what is a a better one