UTF-8 encoding problems on OS X


I'm trying to work in a fully UTF-8 environment, but I'm still facing a problem that I really don't understand.

Here's an example test method that fails because of encoding:

@Test(groups = {"jpa", "common"})
public void testGetInstanceByProperty() {
    // Look up the ExpenseType whose expenseTypeCode is 1
    ExpenseType et = commonDaoJpa.getInstanceByProperty(ExpenseType.class, "expenseTypeCode", 1);
    // The expected literal contains a non-ASCII character (é)
    assertEquals("Pédagogie Obligatoire", et.getExpenseTypeName());
}


The problem comes from the é character.

The result of the test gives:

Expected :Pédagogie Obligatoire
Actual   édagogie Obligatoire


The file (the whole project, actually) is UTF-8 encoded, and the test runner has the -Dfile.encoding=UTF-8 option.
It seems to me that everything is UTF-8, but the hardcoded "Pédagogie Obligatoire" is somehow still treated as MacRoman.
If I replace the expected value while debugging and just type it in, it works.

What is actually reading the UTF-8 encoded class with MacRoman encoding?

Thanks for helping!



You can try the following. Go to Settings / File encodings. There set Project encoding to UTF-8 (if your files are actually UTF-8). By default that encoding is "System default" which is MacRoman on Mac. Then rebuild (important) and re-run. Some similar problems can be avoided this way.



Alexander, the project encoding is of course UTF-8.
Files are interpreted as UTF-8 by IDEA; there doesn't seem to be any problem on that side.

But still, the problem is there.


I found out what the problem was:
it happens at compilation time. javac needs to be told to use UTF-8 with the -encoding option.
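To illustrate why the compiler's source encoding matters, here is a minimal sketch of what happens when the UTF-8 bytes of a string literal are decoded with a different charset, which is roughly what javac does when it falls back to the platform default. The class name SourceEncodingDemo and the helper decode are made up for this illustration. The sketch uses x-MacRoman when the running JVM provides it and otherwise falls back to ISO-8859-1 as a stand-in, so it shows the mechanism rather than the exact setup from this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {

    // Decodes raw file bytes with the given charset, as a compiler does
    // when it reads a source file.
    static String decode(byte[] sourceBytes, Charset charset) {
        return new String(sourceBytes, charset);
    }

    public static void main(String[] args) {
        // The Unicode escape keeps this demo independent of its own file encoding.
        String literal = "P\u00e9dagogie Obligatoire";

        // What a UTF-8 editor writes to disk for that literal.
        byte[] onDisk = literal.getBytes(StandardCharsets.UTF_8);

        // Assumption: x-MacRoman is only present on JVMs that ship the
        // extended charsets, so fall back to ISO-8859-1 as a stand-in.
        Charset wrong = Charset.isSupported("x-MacRoman")
                ? Charset.forName("x-MacRoman")
                : StandardCharsets.ISO_8859_1;

        String readAsUtf8 = decode(onDisk, StandardCharsets.UTF_8);
        String readAsWrong = decode(onDisk, wrong);

        System.out.println(literal.equals(readAsUtf8));               // true
        System.out.println(literal.equals(readAsWrong));              // false
        System.out.println(readAsWrong.length() - literal.length());  // 1
    }
}
```

The two-byte UTF-8 sequence for the accented letter becomes two separate characters under a single-byte charset, which is why the mis-decoded string is one character longer and an equality assertion against it fails.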


I originally had some trouble producing an application that was fully UTF-8.
The main mistake I made was not telling the compiler, javac, to use UTF-8 to read the .java source files. If you don't, any hardcoded string in your classes might not be rendered or used correctly.

But even now that I compile with javac using UTF-8 encoding, set the JVM option file.encoding=UTF-8 for my Tomcat application, and have IDEA open the project in UTF-8 only, without exception, I still get messages like these with some test code:

log.info(System.getProperty("file.encoding") + ": é à ê");

produces: UTF-8: é à ê

which indicates that another encoding has been used somewhere.

I'd be really happy if someone had a clue about where this can possibly happen!



That may just be related to the process output, which is an OS-level matter rather than your program itself: I'm not sure at all that the standard console is able to display UTF-8...
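One way to check that guess is to separate what the JVM writes from what the console displays. The following is only a sketch: the class name ConsoleEncodingCheck and the helpers wrap and encodeVia are hypothetical. It prints the charset settings the JVM reports, counts the bytes produced for the sample text, and then writes the text through a PrintStream that is explicitly set to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingCheck {

    // Wraps a stream in a PrintStream that encodes text with the named charset.
    static PrintStream wrap(OutputStream target, String charsetName) {
        try {
            return new PrintStream(target, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Returns the exact bytes written for the text under the named charset.
    static byte[] encodeVia(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = wrap(buffer, charsetName);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes, so the sample does not depend on this file's encoding.
        String sample = "\u00e9 \u00e0 \u00ea";

        // What the JVM believes its defaults are.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Three 2-byte letters plus two spaces: 8 bytes when encoded as UTF-8.
        System.out.println("UTF-8 byte count = " + encodeVia(sample, "UTF-8").length);

        // Write the sample as UTF-8 regardless of the default charset.
        wrap(System.out, "UTF-8").println(sample);
    }
}
```

If the byte count is 8 but the last line still shows garbage, the bytes leaving the JVM are valid UTF-8 and whatever displays them (terminal, IDE console, log viewer) is decoding them with another charset.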

