I am a master's student researching static analysis. In one of my tests I came across a problem with how the Java compiler assigns source line numbers to bytecode instructions.

I have the following Java code:

 226:   String json = "/org/elasticsearch/index/analysis/commongrams/commongrams_query_mode.json";
 227:   Settings settings = Settings.settingsBuilder()
 228:           .loadFromStream(json, getClass().getResourceAsStream(json))
 229:           .put("path.home", createHome())
 230:           .build();

After compiling this code and running javap -p -v CLASSNAME, I get a table that gives the corresponding source line for each bytecode instruction.

[Image: bytecode table]

The problem is that the call .put("path.home", createHome()) compiles to essentially four bytecode instructions:

19: anewarray  
24: ldc - String path.home
30: invokespecial - createHome
34: invokevirtual - put

The first two are attributed to line 228 (wrong) and the last two to line 229 (correct).

[Image: bytecode table]

This is the original implementation of the .put("path.home", createHome()) method:

     public Builder put(Object... settings) {
        if (settings.length == 1) {
            // support cases where the actual type gets lost down the road...
            if (settings[0] instanceof Map) {
                //noinspection unchecked
                return put((Map) settings[0]);
            } else if (settings[0] instanceof Settings) {
                return put((Settings) settings[0]);
            }
        }
        if ((settings.length % 2) != 0) {
            throw new IllegalArgumentException("array settings of key + value order doesn't hold correct number of arguments (" + settings.length + ")");
        }
        for (int i = 0; i < settings.length; i++) {
            put(settings[i++].toString(), settings[i].toString());
        }
        return this;
    }

I have already compiled the code with both Oracle JDK 8 and OpenJDK 16, and the result is the same with both.

I also tested a modified version of put() with its parameters removed. When compiling that code, the line numbers were assigned correctly.

Why are some of the bytecode instructions for line 229, .put("path.home", createHome()), attributed to a different line than the one in the Java source code? Does anyone know whether this is done on purpose?

1 Answer

This is connected to the way the line number association is stored in the class file and to the history of the javac compiler.

The line number table only contains entries that associate a line number with the code location marking its beginning. So all instructions after that location are assumed to belong to the same line, up to the next location that is explicitly mentioned in the table.

Since detailed information will take up space and the specification does not demand a particular precision for the line number table, compiler vendors made different decisions about which details to include.

In the past, i.e. up to Java 7, javac only generated line number table entries for the beginning of statements, so when I compile the following code with Java 7’s javac

String settings = new StringBuilder() // this is line 7 in my .java file
    .append('a')
    .append(
      5
      +
      "".length())
    .toString();

I get something like

stack=3, locals=2, args_size=1
   0: new           #2                  // class java/lang/StringBuilder
   3: dup
   4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
   7: bipush        97
   9: invokevirtual #4                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  12: iconst_5
  13: ldc           #5                  // String
  15: invokevirtual #6                  // Method java/lang/String.length:()I
  18: iadd
  19: invokevirtual #7                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
  22: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  25: astore_1
  26: return
LineNumberTable:
  line 7: 0
  line 14: 26

which would cause all instructions belonging to the statement to be associated with line 7 only.

This was considered too coarse, so starting with Java 8, javac generates additional entries for method invocations within an expression spanning multiple lines. So when I compile the same code with Java 8 or newer, I get

stack=3, locals=2, args_size=1
   0: new           #2                  // class java/lang/StringBuilder
   3: dup
   4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
   7: bipush        97
   9: invokevirtual #4                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  12: iconst_5
  13: ldc           #5                  // String
  15: invokevirtual #6                  // Method java/lang/String.length:()I
  18: iadd
  19: invokevirtual #7                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
  22: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  25: astore_1
  26: return
LineNumberTable:
  line 7: 0
  line 8: 9
  line 12: 15
  line 9: 19
  line 13: 22
  line 14: 26

Note how each additional entry (compared to the Java 7 version) points to an invocation instruction, to ensure that the method invocations are associated with the correct line number. This greatly improves exception stack traces as well as step debugging.
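One way to observe the effect on stack traces is to ask a helper which line its caller's frame is attributed to. This is only a small sketch with hypothetical names (CallSiteLines, callerLine, sample); what it prints depends on the compiler used and on the line number information not having been stripped (for example with javac's -g:none option).

```java
public class CallSiteLines {
    // Returns the source line that the caller's stack frame is attributed to.
    static int callerLine() {
        // Element 0 is callerLine() itself, element 1 is the method that called it.
        return new Throwable().getStackTrace()[1].getLineNumber();
    }

    // One expression spread over several lines, with one invocation per line.
    static int[] sample() {
        return new int[] {
            callerLine(),
            callerLine(),
            callerLine()
        };
    }

    public static void main(String[] args) {
        for (int line : sample()) {
            System.out.println("reported line: " + line);
        }
    }
}
```

With a javac that records an entry per invocation, the three calls should report three different lines. With a compiler that only records the beginning of the statement, they would all report the same line.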

The non-invocation instructions having no explicit entry will still get associated with their closest preceding code location that has an entry.

Therefore, the bipush 97 instruction corresponding to the 'a' constant gets associated with line 7 as only the subsequent append invocation consuming the constant has an explicit entry associating it with line 8.

The consequences for the next expression, 5 + "".length(), are even more dramatic.

The instructions for pushing the constants, iconst_5 and ldc [""], get associated with line 8, the location of the previous append invocation, whereas the iadd instruction, which actually belongs to the + operator between the 5 and "".length() operands, gets associated with line 12, because the most recent instruction with an explicit line number is the length() invocation.
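The attribution rule described above can be sketched as a floor lookup over the start offsets in the table. The following is a minimal illustration, not how any particular JVM or debugger implements it; the class LineLookup and its methods are hypothetical names, and the entries are copied from the Java 8 LineNumberTable shown earlier.

```java
import java.util.Map;
import java.util.TreeMap;

public class LineLookup {
    // start offset -> source line, copied from the Java 8 LineNumberTable above.
    static TreeMap<Integer, Integer> java8Table() {
        TreeMap<Integer, Integer> table = new TreeMap<>();
        table.put(0, 7);
        table.put(9, 8);
        table.put(15, 12);
        table.put(19, 9);
        table.put(22, 13);
        table.put(26, 14);
        return table;
    }

    // An instruction belongs to the entry with the greatest start offset
    // that is not larger than the instruction's own offset.
    static int lineFor(TreeMap<Integer, Integer> table, int pc) {
        Map.Entry<Integer, Integer> entry = table.floorEntry(pc);
        return entry == null ? -1 : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> table = java8Table();
        int[] offsets = {7, 9, 12, 13, 15, 18, 19, 22, 25, 26};
        for (int pc : offsets) {
            System.out.println("offset " + pc + " -> line " + lineFor(table, pc));
        }
    }
}
```

For offsets 12 and 13 (iconst_5 and ldc) the lookup yields line 8, and for offset 18 (iadd) it yields line 12, matching the attribution discussed above.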

For comparison, this is how Eclipse compiles the same code:

stack=3, locals=2, args_size=1
   0: new           #20                 // class java/lang/StringBuilder
   3: dup
   4: invokespecial #22                 // Method java/lang/StringBuilder."<init>":()V
   7: bipush        97
   9: invokevirtual #23                 // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  12: iconst_5
  13: ldc           #27                 // String
  15: invokevirtual #29                 // Method java/lang/String.length:()I
  18: iadd
  19: invokevirtual #35                 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
  22: invokevirtual #38                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  25: astore_1
  26: return
LineNumberTable:
  line 6: 0
  line 7: 7
  line 9: 12
  line 11: 13
  line 9: 18
  line 8: 19
  line 12: 22
  line 6: 25
  line 13: 26

The Eclipse compiler doesn’t have javac’s history, but rather has been designed to produce line number entries for expressions in the first place. We can see that it associates the first instruction belonging to an invocation expression (not the invocation instruction) with the right line, i.e. the bipush 97 for append('a') and ldc [""] for "".length().

Further, it has additional entries for iconst_5, iadd, and astore_1, to associate them with the right lines. Of course, this higher precision also results in slightly bigger class files.

  • Thank you very much for your answer, this was very enlightening and will help a lot in my research. Commented May 11, 2021 at 3:25
