Unable to read XML into a PySpark DataFrame

I have set up the Spark environment correctly, i.e. JAVA_HOME, SPARK_HOME, and HADOOP_HOME are set and Python 3.7 is installed. I also installed PyCharm with the recommended options.

My packages are:

When I run the program below, it gives me an error.

Program:


from pyspark import SparkContext
from pyspark.sql import SparkSession


if __name__ == '__main__':
    sc = SparkContext(master="local", appName="Spark Demo")
    spark = SparkSession(sc)

    # Read the XML file with the spark-xml data source, one row per <Monitor> element.
    df = spark.read.format("com.databricks.spark.xml").option("rowTag", "Monitor")\
        .load(f"D:\\DevOps\\Test_Data\\usage_636748337236035587.xml")

    df.printSchema()

Error:

py4j.protocol.Py4JJavaError: An error occurred while calling o26.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.types.DecimalType$.Unlimited()Lorg/apache/spark/sql/types/DecimalType;
at com.databricks.spark.xml.util.InferSchema$.<init>(InferSchema.scala:36)
at com.databricks.spark.xml.util.InferSchema$.<clinit>(InferSchema.scala) .............

It would be helpful to know how to install '*.jar' packages in PyCharm.
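A NoSuchMethodError like this one usually points to a version mismatch: the spark-xml jar on the classpath was built against a different Spark version than the one installed (DecimalType.Unlimited existed only in Spark 1.x). Rather than adding a jar through PyCharm, one option is to let Spark pull the package from Maven via spark.jars.packages. A minimal sketch, assuming a Spark 2.x installation built with Scala 2.11; the coordinate com.databricks:spark-xml_2.11:0.6.0 is an example and should be adjusted to match your Spark build:

from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Ask Spark to resolve the spark-xml package from Maven when the session starts,
    # instead of adding a jar to the project manually. The coordinate below is an
    # example for Spark 2.x with Scala 2.11; change it to match your installation.
    spark = (SparkSession.builder
             .master("local")
             .appName("Spark Demo")
             .config("spark.jars.packages", "com.databricks:spark-xml_2.11:0.6.0")
             .getOrCreate())

    df = (spark.read.format("com.databricks.spark.xml")
          .option("rowTag", "Monitor")
          .load("D:\\DevOps\\Test_Data\\usage_636748337236035587.xml"))

    df.printSchema()

The same coordinate can also be passed to spark-submit with --packages, which works outside PyCharm as well.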

3 comments

Have you tried running it outside of PyCharm using the same interpreter to check if the same error appears?


No. I have just started with Python programming. Which IDE would you recommend outside of PyCharm? I will try it there.


Just try running it from the command line: copy the command that PyCharm runs (the first line in the Run window), paste it into CMD, and hit Enter. What's the result?
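For reference, that first line is typically the full path to the interpreter followed by the script path, something like the line below (both paths are placeholders; yours will be whatever the Run window shows):

C:\Python37\python.exe D:\DevOps\spark_demo.py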

 
