Unable to work in console when using large Pandas Dataframe
Answered
Whenever I create a large DataFrame (~1.5x10^6 rows with a large text column, about 5 GB in total as measured with deep=True), PyCharm's IPython console becomes totally unresponsive. Even the simplest operations, like doing analytics on a small subset, i.e. working with df1 defined as
df1 = df[0:100].copy()
take forever and hang the console. Ctrl-C does not work, and the message "REPL Communication" is shown in Background Tasks.
If the same operations are run in an IPython console outside PyCharm, no slowdown is observed.
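For anyone trying to reproduce this, here is a minimal sketch of the steps described above: build a frame with a text column, measure its size with deep=True, and time the copy of a 100-row slice. The helper name make_text_frame, the column names, and the tiny sizes are assumptions for illustration only; the real frame is far larger, so the row count would need to be scaled up to approach the reported 5 GB.

```python
import time

import pandas as pd


def make_text_frame(n_rows: int = 1_000, text_len: int = 200) -> pd.DataFrame:
    """Build a small stand-in frame with one text column (hypothetical data)."""
    return pd.DataFrame(
        {
            "id": range(n_rows),
            "text": ["x" * text_len for _ in range(n_rows)],
        }
    )


df = make_text_frame()

# deep=True also counts the memory held by the Python strings themselves,
# not only the array that references them.
deep_bytes = int(df.memory_usage(deep=True).sum())
shallow_bytes = int(df.memory_usage(deep=False).sum())

# Time the small-subset copy from the report.
start = time.perf_counter()
df1 = df[0:100].copy()
elapsed = time.perf_counter() - start

print(f"deep={deep_bytes} bytes, shallow={shallow_bytes} bytes")
print(f"copy of 100 rows took {elapsed:.6f} s")
```

Running the same snippet in the PyCharm console and in a standalone IPython session should give comparable printed timings; if it does, the time spent in pandas itself is probably not the source of the hang.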
Hello! Could you please disable variables showing (uncheck "Show Variables" button in Python Console tool window) and try to reproduce the problem again?
The problem persists if Show Variables is unchecked.
In both situations (Show Variables checked/unchecked), a simple df.shape might take more than 30 seconds the first time it is called.
Let me go through an example (the file has ~1.5 million records, with a large text field: clean_text). Everything that follows is run in an IPython console inside PyCharm 2018.1.4 (Python 3.5.2, with Show Variables unchecked from the start):
If I call "df.shape" after this, it takes anywhere from no noticeable time to around 5 seconds (I don't know what this variation depends on). The result is (1493318, 11).
Then, after some processing (long but simple):
df.shape now takes more than 30 seconds to show the obvious result (1493318, 12). df.iloc[0], for example, takes about the same time.
This happens with the IPython console inside PyCharm (with an independent IPython console, I don't notice any delay).
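To compare the two consoles on equal terms, the calls can be timed from inside the interpreter. This is only a sketch: time_call is a hypothetical helper, and the small frame below stands in for the real data. It measures the time spent in Python only, so if these numbers stay small while the prompt still takes 30 seconds to return, the delay is most likely in the console's communication layer rather than in pandas.

```python
import time

import pandas as pd


def time_call(label, fn, repeats=3):
    """Run fn several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    print(f"{label}: first={durations[0]:.6f}s, best={min(durations):.6f}s")
    return durations


# Small stand-in frame (hypothetical data); use the real df in practice.
df = pd.DataFrame({"clean_text": ["some text"] * 1000, "n": range(1000)})

shape_times = time_call("df.shape", lambda: df.shape)
row_times = time_call("df.iloc[0]", lambda: df.iloc[0])
```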
Thank you for the update.
I've created https://youtrack.jetbrains.com/issue/PY-30650 in the PyCharm issue tracker; please follow it for updates. See https://intellij-support.jetbrains.com/hc/en-us/articles/207241135-How-to-follow-YouTrack-issues-and-receive-notifications if you are not familiar with YouTrack. Please attach your zipped log folder (https://intellij-support.jetbrains.com/hc/en-us/articles/207241085-Locating-IDE-log-files) to the issue.
I am experiencing the same problem, sometimes even with very basic requests like `df.columns`.
Currently using PyCharm Professional 2018.1.2 on macOS 10.11.6.
Thanks
Hi Walter,
Please vote for the issue above and feel free to leave a comment.
I am working on image recognition, and I have to admit that REPL Communication is causing a lot of delay. Even simple commands take a while to execute. The delay gets worse the more variables and image files I load. Could you please make REPL Communication work faster? Thank you.
REPL Communication slows down all commands in PyCharm. Please fix this issue ASAP.
Please update to the latest version (2019.2) and try changing the variable loading policy according to https://youtrack.jetbrains.com/issue/PY-30222#focus=streamItem-27-2904652.0-0