Python program gets UnicodeDecodeError in IntelliJ but OK from command line
Hi Folks. I have a simple program that loads a .json file which contains a non-ASCII character. The program (attached) runs fine in Terminal but gets this error in IntelliJ:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)
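For reference, the same message can be reproduced without any file. This sketch assumes, hypothetically, that the file contains a right single quotation mark (U+2019), whose UTF-8 encoding begins with byte 0xe2; the thread does not say which character the file actually contains. Decoding those bytes as ASCII fails at the first byte above 127, while decoding them as UTF-8 succeeds:

```python
# Hypothetical JSON text containing U+2019 (RIGHT SINGLE QUOTATION MARK);
# the real character in encode-temp.json is not known.
raw = '{"\u2019": 1}'.encode('utf-8')  # b'{"\xe2\x80\x99": 1}', so 0xe2 sits at index 2


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return err


ascii_result = try_decode(raw, 'ascii')   # fails: 0xe2 is outside range(128)
utf8_result = try_decode(raw, 'utf-8')    # succeeds: 0xe2 0x80 0x99 is one character

print(ascii_result)  # 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)
print(ascii(utf8_result))  # escaped so the output itself stays ASCII-safe
```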
The crucial code is:
with open(jsonFileName) as f:
    jsonData = json.load(f)
If I replace the open() call with:
with open(jsonFileName, encoding='utf-8') as f:
Then it works in both IntelliJ and Terminal. I'm still new to Python and the IntelliJ plugin, and I don't understand why they're different. I thought sys.path might be different, but the output makes me think that's not the cause. Could someone please explain? Thanks!
Versions:
OS: Mac OS X 10.7.4 (also tested on 10.6.8)
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) - /Library/Frameworks/Python.framework/Versions/3.2/bin/python3.2
IntelliJ: 11.1.3 Ultimate
Attachment(s):
mc-0913-encoding-demo-whole-project.zip
unicode-error-demo.py.zip
encode-temp.json.zip
I was unable to reproduce your problem. I tried it in IntelliJ Ultimate 11.1.3 with the Python plugin 2.9.2 and Python 3.2.2 on a Linux box.
Anyway, you should specify the encoding of your file whenever you open it in text mode. If you don't, then according to the Python documentation the encoding is platform-dependent (whatever locale.getpreferredencoding() returns): on one system it's ASCII, on another it's UTF-8, and so on. But on the same system it should be the same in your Python interpreter and in an IntelliJ run configuration (as long as neither is interactive; for interactive consoles the default encoding may differ).
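To check whether Terminal and IntelliJ really agree, you could run a diagnostic like the one below once from Terminal and once from the IntelliJ run configuration, then compare the output. encoding_report is a hypothetical helper, not something from this thread. One plausible difference on Mac OS X (an assumption, not confirmed here) is that Terminal sets locale variables such as LANG while an application launched from the Dock or Finder may not inherit them, which could leave the default text encoding as ASCII and match the 'ascii' codec in the traceback.

```python
import locale
import os
import sys


def encoding_report():
    """Collect settings that influence open()'s default text encoding (hypothetical helper)."""
    return {
        # Roughly what open() falls back to when no encoding= argument is given.
        'preferred': locale.getpreferredencoding(False),
        # Locale environment variables inherited from whatever launched Python.
        # Assumption: these may differ between Terminal and a GUI-launched IDE.
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        'LC_CTYPE': os.environ.get('LC_CTYPE'),
        # Confirms both runs use the same interpreter.
        'executable': sys.executable,
    }


for key, value in encoding_report().items():
    print(key, '=', value)
```

If 'preferred' differs between the two runs while 'executable' is the same, the difference comes from the environment rather than from sys.path or the interpreter.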
There are two correct ways to read a JSON object from a file. Either specify the text file's encoding explicitly:
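A minimal sketch of the explicit-encoding approach. load_json_utf8 is a hypothetical helper name, and the temporary file only stands in for the real encode-temp.json:

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as text, naming the encoding instead of relying on the locale."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


# Demo: write hypothetical UTF-8 JSON (containing U+2019) to a temporary file.
with tempfile.NamedTemporaryFile('wb', suffix='.json', delete=False) as tmp:
    tmp.write('{"quote": "\u2019"}'.encode('utf-8'))
    path = tmp.name

try:
    data = load_json_utf8(path)
finally:
    os.remove(path)  # clean up the temporary file

print(data == {'quote': '\u2019'})  # True
```

Because the encoding is named in the call, the result no longer depends on how Terminal or IntelliJ configures the locale.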
Or rely on the JSON standard, which says that JSON data is encoded in UTF-8 by default. The JSON parser from the Python standard library is aware of this, so (in Python 3.6 and later) it can parse a JSON object from a file opened in binary mode and decode it internally:
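A sketch of the binary-mode approach. json.load() only accepts a file opened with 'rb' from Python 3.6 onward; for older versions such as 3.2, the second helper reads the bytes and decodes them as UTF-8 explicitly. Both helper names are hypothetical:

```python
import json
import os
import tempfile


def load_json_binary(path):
    """Let the json module decode the raw bytes itself (Python 3.6+)."""
    with open(path, 'rb') as f:
        return json.load(f)


def load_json_binary_legacy(path):
    """Same result on older Python 3 versions: read bytes, then decode as UTF-8."""
    with open(path, 'rb') as f:
        return json.loads(f.read().decode('utf-8'))


# Demo with a temporary file holding hypothetical UTF-8 JSON.
with tempfile.NamedTemporaryFile('wb', suffix='.json', delete=False) as tmp:
    tmp.write('{"quote": "\u2019"}'.encode('utf-8'))
    path = tmp.name

try:
    modern = load_json_binary(path)
    legacy = load_json_binary_legacy(path)
finally:
    os.remove(path)

print(modern == legacy)  # True
```

Opening in binary mode sidesteps the locale entirely, since no text decoding happens in open() itself.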
Thanks, Andrey! I'd mark this as the correct answer, except I already marked it as Helpful.