Spark-submit arguments when sending a Spark job to an EMR cluster from PyCharm
I need to ship a copy of a zipped conda environment to the executors so that they have the right packages for running the Spark job.
In the terminal, the submit line could look like:
spark-submit \
--num-executors $EXECUTORS \
--master yarn \
--deploy-mode client \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
--archives /tmp/environment.tar.gz \
/myscript.py
In PyCharm, where (and how) can I enter the --archives information when running myscript.py?
Do you run myscript.py using a Python run configuration in PyCharm?
It allows passing parameters to the script.
Or have I completely misunderstood the case?
Yes, that partly answers the question.
However, I've found a solution.
In the Spark case I can set PYSPARK_SUBMIT_ARGS =
and it works.
cheers,
Benedict
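For anyone landing here, a minimal sketch of the PYSPARK_SUBMIT_ARGS approach. The exact value used above is not shown in the thread, so the string built here is an assumption that simply mirrors the flags from the question. build_submit_args is a hypothetical helper, not part of PySpark. The trailing pyspark-shell token is what PySpark expects at the end of this variable when the JVM is launched from a plain Python process.

```python
import os
import shlex


def build_submit_args(archives, master="yarn", deploy_mode="client", conf=None):
    """Hypothetical helper: assemble a PYSPARK_SUBMIT_ARGS string.

    The flags mirror the spark-submit line from the question; adjust them
    to whatever your cluster actually needs.
    """
    parts = ["--master", master, "--deploy-mode", deploy_mode]
    for key, value in (conf or {}).items():
        parts += ["--conf", f"{key}={value}"]
    parts += ["--archives", archives]
    # PySpark expects the argument string to end with "pyspark-shell".
    parts.append("pyspark-shell")
    return shlex.join(parts)


submit_args = build_submit_args(
    archives="/tmp/environment.tar.gz",  # path taken from the question
    conf={"spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./environment/bin/python"},
)

# Must be set before the first SparkSession/SparkContext is created,
# because the variable is read when the JVM gateway is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
print(submit_args)

# Afterwards, in myscript.py (requires pyspark to be installed):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```

Instead of setting it in code, the same string can probably go into the "Environment variables" field of the PyCharm run configuration, which keeps myscript.py unchanged.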