Channel: Active questions tagged amazon-ec2 - Stack Overflow
How to call a Python script as part of an EC2 cluster workflow via the AWS CLI?

I am creating an EMR cluster (a single EC2 instance) like so:

aws2 emr create-cluster \
--name "Spark cluster with step" \
--release-label emr-5.24.1 \
--applications Name=Spark \
--log-uri s3://boris-log-bucket/logs/ \
--ec2-attributes KeyName=boris-aws,EmrManagedMasterSecurityGroup=sg-...,EmrManagedSlaveSecurityGroup=sg-... \
--instance-type m5.xlarge \
--instance-count 1 \
--bootstrap-actions Path=s3://boris-set-up/bootstrap_file2.sh \
--steps Type=Spark,Name="Spark job",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn] \
--use-default-roles \
--no-auto-terminate

The contents of bootstrap_file2.sh include:

aws s3 sync s3://boris-scripts/ /home/hadoop/

This copies a Python script onto the EC2 instance.

Now I want something along the lines of "once the bootstrap finishes, run the Python script".

I've tried two things:

1) I added the line sudo python3 my_script.py to the bootstrap bash script, and it fails.
2) I changed the steps to --steps Type=Spark,Name="Spark job",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,s3://boris-scripts/my_script.py], and that also fails.
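For comparison, a plain (non-Spark) command is often run as a step through command-runner.jar rather than as a Spark step. The following is only a sketch under stated assumptions, not a confirmed fix: CLUSTER_ID is a hypothetical placeholder, the step name RunPythonScript is made up, python3 is assumed to be available on the master node, and the script path assumes the bootstrap sync put my_script.py directly under /home/hadoop/. It prints the command instead of running it (substitute aws2 if that is the CLI name in use).

```shell
#!/usr/bin/env bash
# Sketch only: builds and prints an `aws emr add-steps` command that would run
# the synced script with python3 through command-runner.jar.
# CLUSTER_ID is a hypothetical placeholder; SCRIPT_PATH assumes the bootstrap
# sync placed my_script.py directly under /home/hadoop/.
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"
SCRIPT_PATH="/home/hadoop/my_script.py"

# Shorthand step definition: a CUSTOM_JAR step whose jar is command-runner.jar.
STEP="Type=CUSTOM_JAR,Name=RunPythonScript,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[python3,${SCRIPT_PATH}]"

# Print the command rather than executing it, so it can be reviewed first.
CMD="aws emr add-steps --cluster-id ${CLUSTER_ID} --steps ${STEP}"
echo "${CMD}"
```

The same STEP string could in principle be passed to --steps in the create-cluster call in place of the Spark step, since steps only start once the bootstrap actions have completed.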

My script, my_script.py, tries to grab data from Wikipedia and write it to /home/hadoop/ as a CSV file.
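Since "fails" alone does not say why, the step logs written under the --log-uri are probably the first place to look. A rough sketch of where to find them, assuming EMR's usual layout of <log-uri>/<cluster-id>/steps/<step-id>/; CLUSTER_ID and STEP_ID are hypothetical placeholders, and the commands are printed rather than executed:

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for locating a failed step's log files.
# CLUSTER_ID and STEP_ID are hypothetical placeholders.
LOG_URI="s3://boris-log-bucket/logs/"   # the --log-uri from the create-cluster call
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"
STEP_ID="${STEP_ID:-s-XXXXXXXXXXXXX}"

# Assumed layout: <log-uri>/<cluster-id>/steps/<step-id>/ (stderr, stdout, controller).
# ${LOG_URI%/} drops the trailing slash so the path is not doubled up.
STEP_LOG_PREFIX="${LOG_URI%/}/${CLUSTER_ID}/steps/${STEP_ID}/"

# List the steps to find the failed step's ID, then list its log files.
echo "aws emr list-steps --cluster-id ${CLUSTER_ID}"
echo "aws s3 ls ${STEP_LOG_PREFIX}"
```

Logs are copied to S3 with some delay, so the prefix may be empty for a few minutes after a step fails.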

