spark-submit arguments when sending a Spark job to an EMR cluster in PyCharm

 

I need to provide a copy of a zipped conda environment to the executors so that they have the right packages for running the Spark job.

In the terminal the submit line could look like:

spark-submit \
--num-executors $EXECUTORS \
--master yarn \
--deploy-mode client \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
--archives /tmp/environment.tar.gz \
/myscript.py

In PyCharm, where (and how) can I enter the --archives information when running myscript.py?

 

2 comments

Do you run myscript.py using a Python run configuration in PyCharm?
It allows passing parameters to the script.

Or have I completely misunderstood the case?

Permanently deleted user

Yes, that partly answers the question.

However, I've found a solution.

In the Spark case, I can set the PYSPARK_SUBMIT_ARGS environment variable to

--archives /tmp/environment.tar.gz pyspark-shell

and it works.

 

cheers,

Benedict
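
For readers who prefer to set the variable from inside the script rather than in the run configuration, the approach above could be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: the helper name set_submit_args is hypothetical, and the sketch assumes the variable is set before PySpark launches its JVM (that is, before the first SparkSession or SparkContext is created). The SparkSession lines are left commented out because they need a working Spark installation and cluster.

```python
import os


def set_submit_args(archive_path: str) -> str:
    """Hypothetical helper: put --archives into PYSPARK_SUBMIT_ARGS.

    PySpark reads this variable when it launches the JVM, so it has to be
    set before the first SparkSession/SparkContext is created.
    """
    # The value ends with "pyspark-shell", as in the thread above.
    submit_args = f"--archives {archive_path} pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
    return submit_args


# Archive path taken from the thread; adjust it to your own environment.
submit_args = set_submit_args("/tmp/environment.tar.gz")
print(submit_args)

# Only after the variable is set (requires pyspark and a reachable cluster):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("yarn").getOrCreate()
```

Alternatively, the same name/value pair can be entered in the "Environment variables" field of the PyCharm run configuration, which keeps the script itself unchanged.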
