Debugging a PySpark application launched with spark-submit

This is a copy of request #3248013 that hasn't received any reply from support since April 15, so I decided to look for help here.

I'd like to locally debug a PySpark application that I execute using spark-submit.

I followed this documentation:
https://www.jetbrains.com/help/pycharm/big-data-tools-spark-submit.html
and successfully launched the application on a remote server.
However, if I set breakpoints and try to debug the application, it doesn't stop at them.
Is this expected? That is, does the documentation mean that PyCharm can run spark-submit but cannot debug it?

Assuming this is expected (i.e. that PyCharm can't debug a spark-submit job directly), I followed this write-up:
https://hyukjin-spark.readthedocs.io/en/stable/development/debugging.html
which suggests the following steps:
1. Set up a PyCharm run configuration in debug server mode (IDE host name localhost, port portX).
2. Run it.
3. Forward localhost:portX to remotehost:portX (I only have SSH port 22 open on the remote and local hosts).
4. Run spark-submit on the remote host, having first inserted pydevd_pycharm.settrace('localhost', port=portX) into the code.
5. Debug the application.
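
Steps 3 and 4 could look roughly like the sketch below. It is only an illustration under assumptions: the helper names, the `PYCHARM_DEBUG_HOST` / `PYCHARM_DEBUG_PORT` environment variables and `user@remotehost` are hypothetical, and the use of `ssh -R` (a reverse tunnel, so that localhost:portX on the remote host leads back to the debug server on the IDE machine) is an assumption about the direction step 3 needs. It also assumes the `pydevd-pycharm` package is installed in the Python environment that runs the driver.

```python
import os


def reverse_tunnel_command(remote, port):
    """Build the ssh command for step 3 (to be run on the IDE machine).

    Assumption: -R is used, so connections to localhost:<port> on the
    remote host are carried back to localhost:<port> on the IDE machine.
    """
    return ["ssh", "-N", "-R", f"{port}:localhost:{port}", remote]


def debugger_target(env=os.environ):
    """Read the debug server address from hypothetical environment variables.

    Returns (host, port), or None when PYCHARM_DEBUG_PORT is not set.
    """
    port = env.get("PYCHARM_DEBUG_PORT")
    if not port:
        return None
    return env.get("PYCHARM_DEBUG_HOST", "localhost"), int(port)


def attach_debugger(target):
    """Step 4: connect this process to the PyCharm debug server.

    Returns False when no target is configured or pydevd_pycharm is missing.
    """
    if target is None:
        return False
    try:
        import pydevd_pycharm  # provided by the pydevd-pycharm package
    except ImportError:
        return False
    host, port = target
    # Same call that PyCharm prints when the debug server starts.
    pydevd_pycharm.settrace(
        host, port=port, stdoutToServer=True, stderrToServer=True
    )
    return True


# Example: the command for port 8887 and a hypothetical remote login.
print(" ".join(reverse_tunnel_command("user@remotehost", 8887)))
```

With the tunnel command running on the local machine and `PYCHARM_DEBUG_PORT` exported before spark-submit, `attach_debugger(debugger_target())` would go at the top of the driver script. Code that runs inside executors lives in separate Python worker processes, so this call in the driver would probably not stop at breakpoints there.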

However, after completing steps 1 to 4, I run into the following problem:

PyCharm running on the local host shows:
```
Starting debug server at port portX
Use the following code to connect to the debugger:
import pydevd_pycharm
pydevd_pycharm.settrace('localhost', port=8887, stdoutToServer=True, stderrToServer=True)
Waiting for process connection...
```

The remote host shows:
```
Listening for transport dt_socket at address: 8887
```

The debugger controls in PyCharm stay inactive, so I cannot start debugging the code.

Please help me learn how to set up PyCharm to debug an application that is run with spark-submit on a remote host.

P.S.
I used telnet to verify that port forwarding works. Also, /etc/hosts maps localhost to 127.0.0.1.
P.P.S.
When I did step 3 before step 2, I got the error: Failed to find free socket port.
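
The telnet check from the P.S. can also be scripted. This is a minimal sketch, where `port_is_open` is a hypothetical helper name and 8887 is the port from the output above:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Run on the host where the application calls settrace().
print("debug port reachable:", port_is_open("localhost", 8887))
```

A successful connection only shows that something accepts TCP connections on that port. With an SSH tunnel that may be the SSH listener itself rather than the debug server, so it is probably worth running the check on both ends.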
