Channel: Active questions tagged amazon-ec2 - Stack Overflow

Run HDFS cluster on AWS without EMR


I want to run an HDFS cluster on AWS to store data that will be processed by my custom application running on EC2 instances. AWS EMR is the only way I could find to create an HDFS cluster on AWS. There are tutorials on the web for creating an HDFS cluster from EC2 instances, but if I use EC2 instances, I risk losing the data when I shut the instances down.

What I need is:
1. An HDFS cluster that can be shut down when not in use.
2. When the cluster is shut down, the data should persist.
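
For illustration only, here is a minimal sketch of what these two requirements might look like on plain EC2. It assumes the HDFS data directories live on EBS volumes rather than instance store (EBS volumes keep their data across a stop/start, instance store volumes do not) and that the HDFS daemons are shut down cleanly before the instances are stopped. The tag key `hdfs-cluster`, the cluster name `my-hdfs`, and the helper names are hypothetical and not part of the original setup. The `ec2` argument is expected to be a boto3 EC2 client.

```python
# Sketch: stop and start the EC2 nodes of a self-managed HDFS cluster.
# Assumption: every node carries the (hypothetical) tag hdfs-cluster=<name>
# and keeps its HDFS data on EBS volumes, which survive a stop/start.

TAG_KEY = "hdfs-cluster"  # hypothetical tag key, chosen for this example


def cluster_filters(cluster_name, state):
    """Build describe_instances filters for one cluster in one instance state."""
    return [
        {"Name": f"tag:{TAG_KEY}", "Values": [cluster_name]},
        {"Name": "instance-state-name", "Values": [state]},
    ]


def instance_ids(response):
    """Flatten a describe_instances response into a list of instance IDs."""
    return [
        instance["InstanceId"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]


def stop_cluster(ec2, cluster_name):
    """Stop all running nodes of the cluster and return their IDs."""
    # Note: a single describe_instances call may be paginated for large clusters.
    response = ec2.describe_instances(Filters=cluster_filters(cluster_name, "running"))
    ids = instance_ids(response)
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids


def start_cluster(ec2, cluster_name):
    """Start all stopped nodes of the cluster and return their IDs."""
    response = ec2.describe_instances(Filters=cluster_filters(cluster_name, "stopped"))
    ids = instance_ids(response)
    if ids:
        ec2.start_instances(InstanceIds=ids)
    return ids
```

With boto3 installed and credentials configured, this could be called as `stop_cluster(boto3.client("ec2"), "my-hdfs")`. While the instances are stopped, instance usage is not billed but the EBS storage still is. Whether HDFS comes back healthy after a restart depends on the cluster configuration, which this sketch does not cover.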

One suggested solution is to keep my data in an S3 bucket and load it every time I start the EMR cluster. However, this is repetitive and adds a huge overhead, especially if the data is large.

In GCP, I used a Dataproc cluster, which satisfied the above two criteria. Shutting down the cluster at least saved the cost of the VMs, and I only paid for storage while the HDFS cluster was not in use. I am wondering if there is a similar way to do this in AWS.

