
When I run the code below in two different projects, I get different outputs.

    String myString = "Türkçe Karakter Testi : ğüşiöçĞÜİŞÇÖĞ";
    String value = new String(myString.getBytes("UTF-8"));
    System.out.println(value);

The first project is a non-Maven Java application created in NetBeans 8.2. It gives me the following result, which is what I expect.

"Türkçe Karakter Testi : ğüşiöçĞÜİŞÇÖĞ"

The second project is a Maven Java application created the same way, with the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>mavenproject1</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
</project>

This project gives me:

"Türkçe Karakter Testi : ğüşiöçÄ?ÜİÅ?ÇÖÄ?"

I checked both source files with Notepad++, and both of them are encoded as UTF-8.

  • Similar Question: Java encoding with Eclipse and Maven
    – user8097737
    Commented Jan 16, 2018 at 10:28
  • @devpuh although this question had nothing to do with Maven actually.
    – Kayaman
    Commented Jan 16, 2018 at 10:32
  • Yeah, but I get different results with the same code in both projects. I can't figure out why. Commented Jan 16, 2018 at 10:36

2 Answers


You're missing the encoding from your new String() constructor, so it's using the default encoding of your platform which isn't UTF-8 (looks like some variant of ISO-8859-1).
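To see the effect in isolation, here is a minimal sketch that decodes UTF-8 bytes with an explicitly chosen charset instead of relying on the platform default. The class name DecodeMismatchDemo and the helper roundTrip are made up for illustration, and ISO-8859-1 only stands in for "some non-UTF-8 default"; the actual default on the asker's machine is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatchDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the given charset.
    // new String(bytes) without a charset behaves like this with the platform default.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Six of the Turkish capital letters from the question, written as Unicode
        // escapes so the example does not depend on the source file encoding.
        String text = "\u011E\u00DC\u0130\u015E\u00C7\u00D6";

        String sameCharset = roundTrip(text, StandardCharsets.UTF_8);
        String otherCharset = roundTrip(text, StandardCharsets.ISO_8859_1);

        System.out.println(text.equals(sameCharset));   // true: bytes decoded as they were encoded
        System.out.println(text.equals(otherCharset));  // false: every byte became its own character
        System.out.println(otherCharset.length());      // 12: each letter is two bytes in UTF-8
    }
}
```

The comparison results do not depend on the console, so the check works even when the terminal itself cannot display the characters.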

If you use the following code (which doesn't make much sense, but shows the default encoding botching things), you'll see that it's printed properly everywhere.

String myString = "Türkçe Karakter Testi : ğüşiöçĞÜİŞÇÖĞ";
String value = new String(myString.getBytes("UTF-8"), "UTF-8");
System.out.println(value);

What's the lesson here? Always specify the encoding to use when dealing with byte/character conversion! This includes such methods as String.getBytes(), new String() and new InputStreamReader().
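As a rough sketch of that advice, the three conversions mentioned above can each take an explicit charset. StandardCharsets.UTF_8 is part of the JDK since Java 7; the class and method names below (ExplicitCharsetDemo, toUtf8, fromUtf8, firstLineUtf8) are hypothetical and only serve the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // String -> bytes with an explicit charset instead of the platform default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String with the same explicit charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reading a byte stream through an InputStreamReader that is told the charset.
    static String firstLineUtf8(byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Turkish letters written as Unicode escapes to stay independent of the source encoding.
        String text = "\u011F\u00FC\u015F\u0069\u00F6\u00E7";

        byte[] bytes = toUtf8(text);
        System.out.println(text.equals(fromUtf8(bytes)));      // true
        System.out.println(text.equals(firstLineUtf8(bytes))); // true
    }
}
```

Using the Charset overloads rather than the String-named ones ("UTF-8") also avoids the checked UnsupportedEncodingException.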

This is just one of the many ways that character encoding can bite you in the behind. It may seem like a simple problem, but it catches unsuspecting developers all the time.


I have often run into the same problem.


Configuring Maven Character Encoding

The problem

  • Running my code in the IDE (IDEA/Eclipse): everything was correct. The output had the correct encoding both in the console and in the output files.

  • Running my app after a Maven build: when I ran the jar built with mvn clean install, the output contained incorrect values caused by the wrong encoding. Both the console and the output files generated by my app showed unexpected symbols.

  • A warning like the one below appeared in the console during the build. It means that no character encoding has been set for the project, so the build falls back to the platform encoding. There are a few options for fixing this.

[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!

Configuring Maven Character Encoding

1. Properties

The most common way to set the Maven character encoding is through properties, which most plugins honor. Add them inside a properties element that is a child of the project element.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    [...]
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>
    [...]
</project>

2. Maven Resources Plugin

You can also specify the character encoding in the configuration of the Maven Resources Plugin.

The only drawback is that you have to declare this plugin explicitly in your pom.xml file.

Adding this plugin configuration has always worked for me.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    [...]
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
    [...]
</project>

3. Command line

If you cannot alter the source of a Maven project, or you need to specify the character encoding on a build server such as Jenkins, Hudson, or Bamboo, you can also pass the encoding on the command line.

mvn -Dproject.build.sourceEncoding=UTF-8 -Dproject.reporting.outputEncoding=UTF-8 clean deploy

4. Maven Options

If you work on many small personal projects, you can also set the encoding globally in MAVEN_OPTS. The drawback is that every other developer who shares your code base has to set the same MAVEN_OPTS, which is why I do not recommend it.

set MAVEN_OPTS=-Dfile.encoding=UTF-8
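Note that MAVEN_OPTS only affects the JVM that runs Maven itself, not a JVM started later with java -jar. To check which default encoding a given JVM actually ended up with, a small diagnostic like the following can help. This is only a sketch; the class name EncodingCheck and the method describe are made up for illustration.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Reports the file.encoding system property and the charset the JVM
    // uses when no charset is given explicitly.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once from the IDE and once from the built jar shows whether the two environments really use the same default. Passing -Dfile.encoding=UTF-8 on the java command line is a common way to align them on Java 8, although specifying the charset in the code is the more robust fix.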

@See How to Configure Maven Character Encoding
