UTF-8 encoding problems on OS X

Permanently deleted user

Created July 03, 2009 12:36

Hi,

I'm trying to work in a full UTF-8 environment.
But I'm still facing a problem that I really don't understand.

Here's an example test method, which fails because of encoding:

@Test(groups = {"jpa", "common"})
public void testGetInstanceByProperty() {
    ExpenseType et = commonDaoJpa.getInstanceByProperty(ExpenseType.class, "expenseTypeCode", 1);
    assertEquals("Pédagogie Obligatoire", et.getExpenseTypeName());
}

The problem comes from the é character.

The result of the test gives:

java.lang.AssertionError:
Expected :P√©dagogie Obligatoire
Actual   :Pédagogie Obligatoire

The file (the whole project, actually) is UTF-8 encoded, and the test runner has the -Dfile.encoding=UTF-8 option.
It seems to me that everything is UTF-8, but the hardcoded "Pédagogie Obligatoire" is somehow still read as MacRoman.
If I replace the expected value while debugging and just type it in, it works.

What is actually reading the UTF-8 encoded class with MacRoman encoding?

Thanks for helping!

5 comments

Alexander Chernikov

Created July 03, 2009 16:07

Hi.

You can try the following. Go to Settings / File encodings. There set Project encoding to UTF-8 (if your files are actually UTF-8). By default that encoding is "System default" which is MacRoman on Mac. Then rebuild (important) and re-run. Some similar problems can be avoided this way.

Alexander.

Permanently deleted user

Created July 04, 2009 18:31

Alexander, the project encoding is of course UTF-8.
Files are interpreted as UTF-8 by IDEA; there doesn't seem to be any problem on this side.

But still, the problem is there.

Permanently deleted user

Created July 08, 2009 12:17

I found out what the problem was:
it's at compilation time; javac needs to be told to use UTF-8 with the -encoding option.
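
For anyone wondering why a missing -encoding option produces exactly "√©", here is a minimal sketch that reproduces the effect by hand: it encodes a string as UTF-8 and decodes the same bytes as MacRoman, which is presumably what javac does when it falls back to the platform default on a Mac. The class and method names (MojibakeDemo, misdecode) are made up for illustration, and the sketch assumes the JDK in use provides the "x-MacRoman" charset, which ships with the extended charsets and may be missing from a trimmed-down runtime. The literals use Unicode escapes so that the demo itself does not depend on how this file is read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes as MacRoman,
    // imitating a tool that reads a UTF-8 file with the wrong charset.
    public static String misdecode(String text) {
        // Assumption: "x-MacRoman" is available in this JDK (extended charsets).
        Charset macRoman = Charset.forName("x-MacRoman");
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, macRoman);
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute; written as an escape so the source stays pure ASCII.
        String original = "P\u00e9dagogie Obligatoire";
        String garbled = misdecode(original);

        // \u221a is the square root sign, \u00a9 the copyright sign.
        boolean sameAsInTestReport = garbled.equals("P\u221a\u00a9dagogie Obligatoire");
        System.out.println("matches the garbled expected value: " + sameAsInTestReport);
        System.out.println("length before/after: " + original.length() + "/" + garbled.length());
    }
}
```

The two UTF-8 bytes of é (0xC3 0xA9) each map to a separate MacRoman character, which is why one character turns into two. Writing the literal as "P\u00e9dagogie" in the test would also sidestep the issue, at the cost of readability.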

Permanently deleted user

Created September 17, 2009 10:42

I originally had some problems producing an application that would be fully UTF-8.
The main mistake I made was that I didn't tell the compiler (javac) to use UTF-8 to read the .java source files. If you don't, any hardcoded string in your classes might not be rendered or used correctly.

But even now that I compile with javac using UTF-8 encoding, set the JVM option file.encoding=UTF-8 for my Tomcat application, and have IDEA open the project in UTF-8 only, without exception, I still get messages like these with some test code:

log.info(System.getProperty("file.encoding") + ": é à ê");

produces: UTF-8: √© √† √™

which indicates that somewhere another encoding has been used.

I'd be really happy if someone had a clue about where this can possibly happen!

.nodje

Permanently deleted user

Created September 17, 2009 12:18

That may be related to just the process output, which is an OS-level thing rather than your program itself: I'm not sure at all that the standard console is able to display UTF-8...
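
One way to narrow this down is to look at the bytes the program actually emits, separately from how a console chooses to display them. The sketch below is only an illustration under assumptions: the class and method names (ConsoleEncodingDemo, bytesPrinted) are invented, and it uses a plain PrintStream rather than whatever logging framework sits behind log.info, which may apply its own encoding. If the bytes for é come out as 0xC3 0xA9, the program is writing correct UTF-8 and the garbling would have to happen in whatever reads or displays the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncodingDemo {

    // Returns the raw bytes a PrintStream writes for the text in the given charset.
    public static byte[] bytesPrinted(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Dump the bytes for e-acute in hex, so no console decoding is involved.
        StringBuilder hex = new StringBuilder();
        for (byte b : bytesPrinted("\u00e9", "UTF-8")) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println("UTF-8 bytes for e-acute: " + hex.toString().trim());

        // Wrap stdout with an explicit encoding instead of the platform default.
        // Whether this displays correctly still depends on the console itself.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(System.getProperty("file.encoding") + ": \u00e9 \u00e0 \u00ea");
    }
}
```

If the hex dump is right but the second line still shows "√©", the console or log viewer is most likely decoding the stream as MacRoman.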
