Error viewing pyspark DataFrame

Hi, 

I have a variable of type pyspark.sql.dataframe.DataFrame (Spark 2.4.0). When I click "...View as DataFrame" in PyCharm, I get a Python error (below), and the Python console locks up, so I have to restart PyCharm.

As a workaround, I was able to convert it to a pandas DataFrame (df.toPandas()), which displays without errors.
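Note that toPandas() collects every row to the driver, which can be slow or run out of memory on a large DataFrame. A minimal sketch, assuming you only need to inspect a sample in the viewer: `to_viewable` is a hypothetical helper (not part of PySpark or PyCharm) that caps the row count with `limit()` before converting, and returns anything else unchanged.

```python
def to_viewable(frame, max_rows=1000):
    """Hypothetical helper: return an object PyCharm's "View as DataFrame" can show.

    Spark DataFrames are cut down to at most ``max_rows`` rows and converted to
    pandas; anything else (e.g. an existing pandas DataFrame) is returned as-is.
    """
    # pyspark.sql.DataFrame defines a toPandas() method; pandas DataFrames do not.
    to_pandas = getattr(frame, "toPandas", None)
    if callable(to_pandas):
        # limit() is applied by Spark, so only max_rows rows reach the driver.
        return frame.limit(max_rows).toPandas()
    return frame
```

Usage: `sample = to_viewable(df)`, then use "View as DataFrame" on `sample` instead of `df`.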

(PyCharm Professional 2018.3.4)

Traceback (most recent call last):
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydev_comm\server.py", line 34, in handle
    self.processor.process(iprot, oprot)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 266, in process
    self.handle_exception(e, result)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 254, in handle_exception
    raise e
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 263, in process
    result.success = call()
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 228, in call
    return f(*(args.__dict__[k] for k in api_args))
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydev_bundle\pydev_console_utils.py", line 236, in getArray
    return pydevd_thrift.table_like_struct_to_thrift_struct(array, name, roffset, coffset, rows, cols, format)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydevd_bundle\pydevd_thrift.py", line 588, in table_like_struct_to_thrift_struct
    return TYPE_TO_THRIFT_STRUCT_CONVERTERS[type_name](array, name, roffset, coffset, rows, cols, format)
  File "C:\Users\v-kefree\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\183.5429.31\helpers\pydev\_pydevd_bundle\pydevd_thrift.py", line 488, in dataframe_to_thrift_struct
    dim = len(df.axes)
  File "C:\anaconda\envs\py36ml\lib\site-packages\pyspark\sql\dataframe.py", line 1300, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'axes'
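The last frames show what goes wrong: the viewer's dataframe_to_thrift_struct assumes a pandas-style object and evaluates len(df.axes). pandas DataFrames have an axes attribute, but pyspark's DataFrame.__getattr__ treats unknown attribute names as column lookups and raises AttributeError when no such column exists. The sketch below mimics that lookup with a hypothetical stand-in class (ColumnLookupFrame, not a PySpark class) so it runs without Spark; viewer_dim is likewise a hypothetical function reproducing only the failing line.

```python
class ColumnLookupFrame:
    """Hypothetical stand-in for how pyspark 2.4's DataFrame.__getattr__
    resolves unknown attribute names as column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
        # pyspark would return a Column object here; a tuple stands in for it.
        return ("Column", name)


def viewer_dim(df):
    # Mirrors the failing line from the traceback: dim = len(df.axes)
    return len(df.axes)
```

Calling viewer_dim(ColumnLookupFrame(["age"])) raises AttributeError: 'ColumnLookupFrame' object has no attribute 'axes', the same shape of error as in the traceback.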

Hi,

Would it be possible to provide a code snippet that reproduces the issue?

Sure, this reproduces it. Just click "View as DataFrame" next to the df variable:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('dataframe-hw') \
    .getOrCreate()
sc = spark.sparkContext

# One string column (named "value" by default), renamed to "age"
df = spark.createDataFrame(["10", "11", "13"], "string").toDF("age")

Oh, I see now. Thanks for the sample code.

It is covered by the following feature request: https://youtrack.jetbrains.com/issue/PY-26622. Please vote for it and follow it for updates. I will also report an issue about the need to warn users when a dataframe type is not supported.
