Python program gets UnicodeDecodeError in IntelliJ but OK from command line

Hi Folks. I have a simple program that loads a .json file which contains a funny character. The program (attached) runs fine in Terminal but gets this error in IntelliJ:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)

The crucial code is:

with open(jsonFileName) as f:
    jsonData = json.load(f)

If I replace the open() call with:

with open(jsonFileName, encoding='utf-8') as f:

Then it works in both IntelliJ and Terminal. I'm still new to Python and the IntelliJ plugin, and I don't understand why they're different. I thought sys.path might be different, but the output makes me think that's not the cause. Could someone please explain? Thanks!


Versions:

OS: Mac OS X 10.7.4 (also tested on 10.6.8)
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) - /Library/Frameworks/Python.framework/Versions/3.2/bin/python3.2
IntelliJ: 11.1.3 Ultimate



Attachment(s):
mc-0913-encoding-demo-whole-project.zip
unicode-error-demo.py.zip
encode-temp.json.zip

I was unable to reproduce your problem. I tried it in IntelliJ Ultimate 11.1.3 with the Python plugin 2.9.2 and Python 3.2.2 on a Linux box.

Anyway, you actually should specify the encoding of your file when you open it in text mode. If you don't specify the encoding, then according to the Python documentation it is platform dependent (whatever locale.getpreferredencoding() returns): on one system it's ASCII, on another system it's UTF-8, who knows. But on the same system it should be the same in your Python interpreter and in an IntelliJ run configuration, as long as both run with the same locale settings (for example, the LANG environment variable) and neither is interactive; for interactive consoles the default encoding may differ.
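
To see which default applies in a given environment, you can run a small script both from Terminal and from an IntelliJ run configuration and compare the output. A minimal sketch, assuming Python 3: it prints the locale encoding that open() typically falls back to, then reproduces the error from the question by decoding UTF-8 bytes as ASCII. The sample character U+2019 (a curly apostrophe, whose UTF-8 encoding starts with byte 0xe2) is only an assumption; the actual character in the attached file isn't shown.

```python
import locale

# In Python 3 text mode, open() without encoding= typically uses this value.
# It is derived from the locale settings (e.g. LANG / LC_ALL / LC_CTYPE), so two
# processes on the same machine can disagree if their environments differ.
default_encoding = locale.getpreferredencoding(False)
print('open() default encoding:', default_encoding)

# Assumed sample: U+2019 is encoded in UTF-8 as e2 80 99, so these bytes
# cannot be decoded as ASCII (bytes 0 and 1 are '{' and '"').
data = '{"\u2019": 1}'.encode('utf-8')

message = None
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)  # 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)
```

If the first line prints something like US-ASCII or ANSI_X3.4-1968 under one launcher and UTF-8 under the other, that difference alone explains why the same open() call behaves differently.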

There are two more reliable ways to read a JSON object from a file. Either specify the text file encoding explicitly:

with open(filename, encoding=JSON_FILE_ENCODING) as f:  # e.g. 'utf-8'
    json_object = json.load(f)
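
A self-contained sketch of the explicit-encoding approach, assuming the file is UTF-8. load_json_file is a hypothetical helper name, and the temporary file containing a curly apostrophe only stands in for the real data:

```python
import json
import os
import tempfile


def load_json_file(path, encoding='utf-8'):
    """Read a JSON file with an explicit text encoding (hypothetical helper).

    Passing encoding= makes the result independent of the locale that
    Terminal or the IDE happens to start Python with.
    """
    with open(path, encoding=encoding) as f:
        return json.load(f)


# Write a small UTF-8 JSON file containing a non-ASCII character (U+2019).
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.json',
                                 delete=False) as tmp:
    json.dump({'quote': '\u2019'}, tmp, ensure_ascii=False)
    path = tmp.name

try:
    loaded = load_json_file(path)
    print(loaded['quote'] == '\u2019')  # True
finally:
    os.remove(path)
```

Because the encoding is passed explicitly, the same call gives the same result no matter which locale the interpreter was launched with.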


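Or rely on the JSON standard, which says that the default encoding of JSON text is UTF-8. The JSON parser in the Python standard library can then parse a JSON object directly from a file opened in binary mode and detect the encoding itself. Note that this requires Python 3.6 or later; in earlier Python 3 versions (including 3.2) json.load() only accepts str, so a binary file raises a TypeError and the bytes must be decoded first: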

with open(filename, mode='rb') as f:  # Python 3.6+: json.load() accepts bytes
    json_object = json.load(f)
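
A sketch of the binary-mode approach that also runs on Python 3 versions older than 3.6, where json.loads() does not accept bytes. load_json_binary is a hypothetical helper name, and the fallback branch assumes the file is UTF-8 without a BOM:

```python
import json
import os
import sys
import tempfile


def load_json_binary(path):
    """Load JSON from a file read in binary mode (hypothetical helper)."""
    with open(path, 'rb') as f:
        raw = f.read()
    if sys.version_info >= (3, 6):
        # 3.6+: json.loads() accepts bytes and detects UTF-8/16/32 itself.
        return json.loads(raw)
    # Before 3.6, json.loads() only accepts str, so decode explicitly
    # (assumes UTF-8, the JSON default).
    return json.loads(raw.decode('utf-8'))


# Usage: write UTF-8 bytes to a temporary file and load them back.
fd, path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'wb') as f:
    f.write('{"quote": "\u2019"}'.encode('utf-8'))

try:
    result = load_json_binary(path)
finally:
    os.remove(path)

print(result['quote'] == '\u2019')  # True
```

Reading bytes sidesteps the locale-dependent text decoding entirely; the only encoding decision left is the one the JSON standard already makes.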

Thanks, Andrey! I'd mark this as the correct answer, except I already said it was Helpful.