Error viewing pyspark DataFrame

Answered

Created March 08, 2019 15:22

Hi,

I have a variable of type pyspark.sql.dataframe.DataFrame (Spark 2.4.0). When I click "...View as DataFrame" in PyCharm, I get a Python error (below) and the Python console locks up, so I have to restart PyCharm.

As a work-around, I was able to convert it to a pandas DataFrame (df.toPandas()), which is viewable without errors.
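A minimal sketch of that workaround, assuming the DataFrame may be too large to collect in full. The helper name `preview_as_pandas` and the `max_rows` default are made up for illustration; `limit()` and `toPandas()` are the standard PySpark DataFrame methods, and `toPandas()` needs pandas installed.

```python
def preview_as_pandas(spark_df, max_rows=1000):
    """Return a small pandas copy of a PySpark DataFrame for viewing.

    Hypothetical helper: the name and the max_rows default are assumptions,
    not part of PySpark or PyCharm.
    """
    # limit() caps the number of rows so the driver does not collect
    # the whole dataset; toPandas() then builds a pandas DataFrame
    # from the remaining rows.
    return spark_df.limit(max_rows).toPandas()
```

Calling `preview = preview_as_pandas(df)` in the console should give a variable that "View as DataFrame" can open, since it is a real pandas DataFrame.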

(PyCharm Pro 2018.3.4)

Traceback (most recent call last):
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydev_comm\server.py", line 34, in handle
    self.processor.process(iprot, oprot)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 266, in process
    self.handle_exception(e, result)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 254, in handle_exception
    raise e
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 263, in process
    result.success = call()
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 228, in call
    return f(*(args.__dict__[k] for k in api_args))
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydev_bundle\pydev_console_utils.py", line 236, in getArray
    return pydevd_thrift.table_like_struct_to_thrift_struct(array, name, roffset, coffset, rows, cols, format)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydevd_bundle\pydevd_thrift.py", line 588, in table_like_struct_to_thrift_struct
    return TYPE_TO_THRIFT_STRUCT_CONVERTERS[type_name](array, name, roffset, coffset, rows, cols, format)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydevd_bundle\pydevd_thrift.py", line 488, in dataframe_to_thrift_struct
    dim = len(df.axes)
  File "C:\anaconda\envs\py36ml\lib\site-packages\pyspark\sql\dataframe.py", line 1300, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'axes'

4 comments

Sergey Karpov

Created March 11, 2019 09:06

Hi,

Would it be possible to provide a code snippet that reproduces the problem?

8forty

Created March 11, 2019 14:24

Sure, this reproduces it. Just click "View as DataFrame" next to the df variable:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('dataframe-hw') \
    .getOrCreate()
sc = spark.sparkContext

df = spark.createDataFrame(["10", "11", "13"], "string").toDF("age")

Sergey Karpov

Created March 11, 2019 15:03

Oh, I see now. Thanks for the sample code.

It is covered by the following feature request: https://youtrack.jetbrains.com/issue/PY-26622. Please vote for it and follow it for updates. I will also file an issue about the need to warn users about unsupported DataFrames.

Permanently deleted user

Created July 30, 2019 14:31

I think this is more than just supporting the view in the scientific table viewer: this actually causes errors to pop up in a bunch of places when working in a pandas/pyspark environment...

e.g. even trying to print out some info about the spark DF in the `evaluate expression` box fails with the error `AttributeError: 'DataFrame' object has no attribute 'shape'`

It appears that it is trying to treat PySpark DataFrames as Pandas DataFrames, despite their different API...
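To illustrate the distinction, here is a rough sketch of telling the two classes apart by the module they come from rather than by the shared class name `DataFrame`. This is not how PyCharm does it; the function name `dataframe_kind` is hypothetical, and the check assumes the class is named exactly `DataFrame` and lives under the `pandas` or `pyspark` package (so subclasses with other names would be reported as "other").

```python
def dataframe_kind(obj):
    """Classify obj as 'pandas', 'pyspark', or 'other'.

    Hypothetical helper for illustration. It inspects only the class name
    and module, so neither pandas nor pyspark has to be imported.
    """
    cls = type(obj)
    if cls.__name__ != "DataFrame":
        return "other"
    # e.g. 'pandas.core.frame' -> 'pandas', 'pyspark.sql.dataframe' -> 'pyspark'
    top_level = cls.__module__.split(".")[0]
    return top_level if top_level in ("pandas", "pyspark") else "other"
```

A tool that only looks at the class name would see `DataFrame` in both cases and then fail on pandas-only attributes such as `axes` or `shape`, which matches the errors above.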

This may also be related to a question I posted on Stack Overflow: https://stackoverflow.com/questions/57074149/distinguish-pyspark-and-pandas-dataframes-in-python-type-hints-pycharm
