Error viewing pyspark DataFrame



I have a variable of type pyspark.sql.dataframe.DataFrame (Spark 2.4.0). When I click "...View as DataFrame" in PyCharm, I get a Python error (below), and the Python console locks up so I have to restart PyCharm.

As a workaround, I was able to convert it to a pandas DataFrame (`df.toPandas()`), which is viewable without errors.
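A guarded version of that workaround can avoid accidentally collecting an arbitrarily large Spark DataFrame to the driver. This is only a sketch under my own assumptions: the helper name `to_viewable` and the `max_rows` cap are hypothetical, not part of PyCharm or PySpark, and the duck-typing check simply mirrors the `axes` attribute the viewer probes for in the traceback below.

```python
def to_viewable(df, max_rows=1000):
    """Return an object PyCharm's table viewer can display.

    If `df` looks like a Spark DataFrame (it has `toPandas` but not the
    pandas-style `axes` attribute the viewer expects), collect at most
    `max_rows` rows into a pandas DataFrame; otherwise return it as-is.
    """
    if hasattr(df, "toPandas") and not hasattr(df, "axes"):
        # limit() caps the rows pulled to the driver before toPandas()
        # materializes them locally.
        return df.limit(max_rows).toPandas()
    return df
```

Pandas DataFrames (which have `axes`) pass through unchanged, so the helper is safe to call on either kind.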

(PyCharm Professional 2018.3.4)

```
Traceback (most recent call last):
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydev_comm\", line 34, in handle
    self.processor.process(iprot, oprot)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\", line 266, in process
    self.handle_exception(e, result)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\", line 254, in handle_exception
    raise e
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\", line 263, in process
    result.success = call()
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\", line 228, in call
    return f(*(args.__dict__[k] for k in api_args))
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydev_bundle\", line 236, in getArray
    return pydevd_thrift.table_like_struct_to_thrift_struct(array, name, roffset, coffset, rows, cols, format)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydevd_bundle\", line 588, in table_like_struct_to_thrift_struct
    return TYPE_TO_THRIFT_STRUCT_CONVERTERS[type_name](array, name, roffset, coffset, rows, cols, format)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydevd_bundle\", line 488, in dataframe_to_thrift_struct
    dim = len(df.axes)
  File "C:\anaconda\envs\py36ml\lib\site-packages\pyspark\sql\", line 1300, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'axes'
```



Would it be possible to provide a code snippet for reproducing?


Sure, this reproduces it; just click "View as DataFrame" next to the `df` variable:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('dataframe-hw') \
    .getOrCreate()
sc = spark.sparkContext

df = spark.createDataFrame(["10", "11", "13"], "string").toDF("age")
```


Oh, I see now. Thanks for the sample code.

It is covered by the following feature request; please vote for it and follow it for updates. I will also file an issue about the need to warn users about unsupported DataFrame types.


I think this goes beyond supporting the view in the scientific table viewer: it actually causes errors in a number of places when working in a mixed pandas/PySpark environment.

For example, even trying to print some info about the Spark DataFrame in the `Evaluate Expression` box fails with the error `AttributeError: 'DataFrame' object has no attribute 'shape'`.


It appears that PyCharm is treating PySpark DataFrames as pandas DataFrames, despite their different APIs.
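The API mismatch is concrete: the viewer reads pandas-only attributes such as `df.axes` and `df.shape`, neither of which a PySpark DataFrame defines. A minimal sketch of what a Spark-side equivalent of pandas `.shape` looks like (the helper name `spark_shape` is my own, not a PySpark API):

```python
def spark_shape(df):
    """Approximate pandas `.shape` for a Spark DataFrame.

    `count()` is a Spark action that triggers a job, so this can be
    expensive on large data; `columns` is cheap schema metadata.
    """
    return (df.count(), len(df.columns))
```

This is also why the error surfaces anywhere the debugger assumes pandas semantics: there is no lazy, metadata-only row count on a Spark DataFrame.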


This may also be related to this question I submitted on Stack Overflow:

