Taco Bell Javap

Recently, I wanted to better understand a Java program’s execution. Specifically, I wanted to find every usage of a particular class. For example, let’s look for calls to the static method BigInteger.valueOf. I have a funny constraint though: I don’t own the source code. My solution involves looking at Java bytecode.

All About That Bytecode

We write Java programs in syntax that’s supposed to be readable (0). Here is Fibonacci (1) with BigIntegers.

(0) Some developers find joy in taking that human readable definition to its lower limit.

(1) My implementation starts to make the word “next” look a little funny. Supposedly this is called “semantic satiation”.

import java.util.Iterator;
import java.math.BigInteger;

class MyFibIterator implements Iterator<BigInteger> {
  private BigInteger next = BigInteger.valueOf(0);
  private BigInteger nextNext = BigInteger.valueOf(1);
  @Override
  public boolean hasNext() { return true; }
  @Override
  public BigInteger next() {
    BigInteger toReturn = next;
    BigInteger newNextNext = next.add(nextNext);
    next = nextNext;
    nextNext = newNextNext;
    return toReturn;
  }
}

My computer can’t run this text outright. Instead, a program that can run on my computer runs Java programs. That program is the JVM. Except … even the JVM doesn’t want to look at my silly code. Instead it wants an intermediate form called Java bytecode. We transform Java text into bytecode with the javac “compiler”. Here’s the problem: that bytecode is opaque to plaintext tools.

% cat MyFibIterator.class
����C(

java/lang/Object<init>()V




java/math/BigIntegervalueOf(J)Ljava/math/BigInteger;
MyFibIteratornextLjava/math/BigInteger;
                                        nextNext

add.(Ljava/math/BigInteger;)Ljava/math/BigInteger;
()Ljava/math/BigInteger;java/util/IteratorCodeLineNumberTablehasNext()Z()Ljava/lang/Object;     Signature>Ljava/lang/Object;Ljava/util/Iterator<Ljava/math/BigInteger;>;
*ourceFileMyFibIterator.java 5*�*       ��
���
*,�+� �� *�

A#*�� $%&'%

These bytes follow the structure of the class file format. If I dump bytes to hexadecimal digits with xxd -p I get this.


cafebabe0000004300280a000200030700040c000500060100106a617661
2f6c616e672f4f626a6563740100063c696e69743e0100032829560a0008
000907000a0c000b000c0100146a6176612f6d6174682f426967496e7465
67657201000776616c75654f66010019284a294c6a6176612f6d6174682f
426967496e74656765723b09000e000f0700100c0011001201000d4d7946
69624974657261746f720100046e6578740100164c6a6176612f6d617468
2f426967496e74656765723b09000e00140c001500120100086e6578744e
6578740a000800170c0018001901000361646401002e284c6a6176612f6d
6174682f426967496e74656765723b294c6a6176612f6d6174682f426967
496e74656765723b0a000e001b0c0011001c01001828294c6a6176612f6d
6174682f426967496e74656765723b07001e0100126a6176612f7574696c
2f4974657261746f72010004436f646501000f4c696e654e756d62657254
61626c650100076861734e65787401000328295a01001428294c6a617661
2f6c616e672f4f626a6563743b0100095369676e617475726501003e4c6a
6176612f6c616e672f4f626a6563743b4c6a6176612f7574696c2f497465
7261746f723c4c6a6176612f6d6174682f426967496e74656765723b3e3b
01000a536f7572636546696c650100124d794669624974657261746f722e
6a6176610020000e00020001001d00020002001100120000000200150012
000000040000000500060001001f0000003500030001000000152ab70001
2a09b80007b5000d2a0ab80007b50013b10000000100200000000e000300
00000400040005000c00060001002100220001001f0000001a0001000100
00000204ac0000000100200000000600010000000800010011001c000100
1f0000004800020003000000202ab4000d4c2ab4000d2ab40013b600164d
2a2ab40013b5000d2a2cb500132bb0000000010020000000160005000000
0b0005000c0011000d0019000e001e000f1041001100230001001f000000
1d00010001000000052ab6001ab000000001002000000006000100000004
000200240000000200250026000000020027

Through magic methods I highlighted where we call the static method BigInteger.valueOf (2). Most of my readers won’t have my magic ball though. We need a better way to read this file.

(2) OK. I’ll reveal my trick. The invokestatic instruction opcode is b8. We call that static method twice. There are only two b8’s in our hexdump! That made the interpretation super easy. Bonus Challenge
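
For readers without a magic ball, the first ten bytes of that hexdump can still be decoded by hand. Here is a minimal sketch that reads the magic number, the version, and the constant pool count, which are the first fields of the class file format. The ClassHeader and Header names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.HexFormat;

public class ClassHeader {
  // The first four fields of a class file, in the order they appear.
  record Header(int magic, int minor, int major, int constantPoolCount) {}

  static Header parse(byte[] bytes) throws IOException {
    // DataInputStream reads big-endian values, which is what class files use.
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      int magic = in.readInt();           // 4 bytes: 0xCAFEBABE
      int minor = in.readUnsignedShort(); // 2 bytes: minor version
      int major = in.readUnsignedShort(); // 2 bytes: major version
      int count = in.readUnsignedShort(); // 2 bytes: constant pool count
      return new Header(magic, minor, major, count);
    }
  }

  public static void main(String[] args) throws IOException {
    // The first ten bytes of the xxd output above.
    byte[] prefix = HexFormat.of().parseHex("cafebabe000000430028");
    Header h = parse(prefix);
    System.out.printf("magic=%08x minor=%d major=%d constant_pool_count=%d%n",
        h.magic(), h.minor(), h.major(), h.constantPoolCount());
    // prints: magic=cafebabe minor=0 major=67 constant_pool_count=40
  }
}
```

Major version 67 corresponds to Java 23. A constant pool count of 40 means the pool holds entries #1 through #39, which is where indexes like #7 in disassembler output point.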

Javap

Javap is a class disassembler. This tool knows how to read the class file format. With the right flags (-c to print the bytecode instructions, -p to include private members, -sysinfo to show details about the file) we can use it to print the instructions we are sending to the JVM. I’ve cut out some lines and added my own comments in brackets.

[The class file path is printed because we used the -sysinfo flag.]
Classfile /Users/mathias.kools/Desktop/tacobelljavap/MyFibIterator.class
  [...]  
  Compiled from "MyFibIterator.java"
class MyFibIterator implements java.util.Iterator<java.math.BigInteger> {
  private java.math.BigInteger next;

  private java.math.BigInteger nextNext;

  MyFibIterator();
    Code:
       0: aload_0
          [...]       
          [Check it out, our static method call!]
       6: invokestatic  #7                  // Method java/math/BigInteger.valueOf:(J)Ljava/math/BigInteger;
       9: putfield      #13                 // Field next:Ljava/math/BigInteger;
          [...]
      20: return

  public boolean hasNext();
    Code: [...]

  public java.math.BigInteger next();
    Code:
          [...]
      13: invokevirtual #22                 // Method java/math/BigInteger.add:(Ljava/math/BigInteger;)Ljava/math/BigInteger;
          [...]
      31: areturn

  [We didn’t write this next method!]
  [This is a bridge method^3!]
  public java.lang.Object next();
    Code: [...]
}
(3) Read more about bridge methods here.

Living Mas

I now have a way to search bytecode for static method usage. Let’s apply it. In my situation, I wanted to search two hundred thousand class files (3). To recreate this kind of scale, I’ll download thirty popular JARs.

(3) At ChaosSearch we were researching class behavior in a Databricks runtime environment, which brings in a whole bunch of libraries.

% cd /Users/mathias.kools/Desktop/tacobelljavap/jars/
% wget https://repo1.maven.org/maven2/junit/junit/4.13.2/junit-4.13.2.jar
[...]
% wget https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar
% for jar in *.jar; do
  dir="${jar%.*}"
  mkdir -p "$dir"
  unzip -q "$jar" -d "$dir"
done
% cd ..; find jars -type f -name '*.class' | wc -l
19080

Shucks. I’m off by an order of magnitude. That’s okay, this is enough files to illustrate the challenges I faced. To look for this method, I want to resist the urge to go web scale. Instead I want to try to combine simple command line tools. This is taco bell programming.

I’m a terrible cook. There are only a few “simple” (4) ingredients here. But I’m still going to let an LLM ratatouille me through the command line arguments. That means the final result is going to be a bit of a taco bell sauce tote bag.

(4) Just like “tar” I can never remember how to use “find”. I’m not alone.

Attempt One, Just Xargs

We’ll run javap on every file. How bad could this possibly be?

% find jars -type f -name '*.class' | xargs -n1 javap -c -sysinfo

Pretty bad. Using later data, I estimate it would finish in one hour and 42 minutes. If I wanted to be efficient with work hours I’d shut up and eat my garbage. Instead, after hours, I’ve nibbled at this for something like 3 months (5).

(5) And I hope that doesn’t make me seem like a bad software engineer! Writing about it took up almost all that time. I’m a bad writer.

Attempt Two, A Little Gusteau

To measure rate of improvement, I’m going to test on a simple random sample.

One wrinkle to any measurement we’re doing is that there are some HUGE class files (histogram in kilobytes).

% find jars -type f -name '*.class' | xargs -n1 -I % du -k % | awk '
  BEGIN { binwidth = 25; max = 0 }
  # $1 is the size in kilobytes that du reports for one file.
  { if ($1 > max) max = $1; bins[int($1 / binwidth)]++ }
  END {
    for (i = 0; i <= int(max / binwidth); i++)
      if (bins[i] > 0) printf "%d-%d: %d\n", i * binwidth, (i + 1) * binwidth, bins[i]
  }'
0-25: 18828
25-50: 185
50-75: 36
75-100: 17
100-125: 2
125-150: 5
150-175: 2
300-325: 1
325-350: 3
650-675: 1

The biggest offender is jars/kotlin-stdlib-2.1.10/kotlin/collections/ArraysKt___ArraysKt.class with something like 1223 members (comes from this monster generated source file).
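
The awk one-liner is dense, so here is the same binning idea as a small Java sketch. It assumes the sizes are already collected in kilobytes. The SizeHistogram name and the sample sizes are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class SizeHistogram {
  // Maps the start of each bin (in kilobytes) to the number of files in that bin.
  static Map<Long, Integer> histogram(long[] sizesKb, long binWidth) {
    Map<Long, Integer> bins = new TreeMap<>();
    for (long size : sizesKb) {
      // Integer division rounds down, so 26 KB with width 25 lands in the bin starting at 25.
      long start = (size / binWidth) * binWidth;
      bins.merge(start, 1, Integer::sum);
    }
    return bins;
  }

  public static void main(String[] args) {
    long[] sizesKb = {4, 4, 8, 28, 52, 660}; // made-up sizes, not the real data
    long binWidth = 25;
    histogram(sizesKb, binWidth).forEach((start, count) ->
        System.out.printf("%d-%d: %d%n", start, start + binWidth, count));
  }
}
```

Empty bins never get an entry in the map, which matches how the awk version skips bins with a zero count.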

First we’ll try parallelism. That’s a good trick.

P_VALUES=(1 2 4 6 8 12 24)
for P in "${P_VALUES[@]}"; do
  # gdate is GNU's latest date,
  # OSX's is out of date.
  START_TIME=$(gdate +%s%3N)
  xargs -n1 -P"$P" javap -c -sysinfo < sample.txt > /dev/null
  END_TIME=$(gdate +%s%3N)
  DURATION=$((END_TIME - START_TIME))
  echo "$P,$DURATION"
done
P 1 2 4 6 8 12 24
dur_ms 270159 162260 113331 94999 77162 79271 84400

Looks like it is effective until around 8 processes. My MacBook Pro has 6 physical cores and 12 virtual cores. I could sip on my pipe in my study and ponder this. Or I could show you a better speedup!

Attempt Three, Zuzhing Xargs Further

Next we’ll try larger and larger batch sizes. That is we’ll pass longer and longer LISTS of files to a single invocation of javap.

N_VALUES=(4 64 256 1024)
P_VALUES=(1 2 4 8 12)
for P in "${P_VALUES[@]}"; do
  for N in "${N_VALUES[@]}"; do
    START_TIME=$(gdate +%s%3N)
    cat sample.txt | xargs -n"$N" -P"$P" javap -c -sysinfo > /dev/null
    END_TIME=$(gdate +%s%3N)
    DURATION=$((END_TIME - START_TIME))
    echo "$N,$P,$DURATION"
  done
done
N dur_ms
1 270159
4 77407
64 9594
256 4491
1024 2403

Javap gets faster every second it is running. Here I have a cool experiment to measure the rate of improvement (6).

(6) If you’re familiar with Java at scale you won’t be surprised by the result. I’ll be recreating these graphs.

S_VALUES=($(seq 1 10 2000))
for S in "${S_VALUES[@]}"; do
  START_TIME=$(gdate +%s%3N)
  find jars -type f -name '*.class' | head -n "$S" | xargs -n"$S" javap -J-XX:+UnlockDiagnosticVMOptions -J-XX:+LogCompilation -J-XX:LogFile=/tmp/compiler.log > /dev/null
  END_TIME=$(gdate +%s%3N)
  DURATION=$((END_TIME - START_TIME))
  COMPILER_LINES=$(cat /tmp/compiler.log | wc -l)
  echo "$S,$DURATION,$COMPILER_LINES" >> times.csv
done

What’s going on? We are witnessing the power of Java’s JIT: it has tiered compilation (7).

(7) How do I know? If we disable the JIT with -J-Djava.compiler=NONE, increasing batch size produces no speed improvements.

We can measure compiler activity by the number of log lines it produces (8). And we can measure the speed of javap by dividing the count of class files disassembled by the total time. We’ll graph the two together. We start with a lot of compiler activity. It tapers off as we approach 2000 class files. Mirroring this shape is our javap speed. How neat is that relationship? (9)

(8) This just so happened to produce nice graphs. I got a little lucky.

(9) R^2 of 0.898 when we graph log lines and speed together. High school me would write highly correlated on the stats test.
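
For anyone who skipped the stats test: the R^2 in (9) is the squared Pearson correlation between two series. Here is a minimal sketch of that calculation. The RSquared name is hypothetical, and the numbers in main are made up; they are not the data behind the graphs.

```java
public class RSquared {
  // Squared Pearson correlation between two equally long series.
  static double rSquared(double[] x, double[] y) {
    if (x.length != y.length || x.length < 2) {
      throw new IllegalArgumentException("need two series of equal length >= 2");
    }
    int n = x.length;
    double meanX = 0, meanY = 0;
    for (int i = 0; i < n; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      sxy += dx * dy; // co-variation of the two series
      sxx += dx * dx; // variation of x alone
      syy += dy * dy; // variation of y alone
    }
    return (sxy * sxy) / (sxx * syy);
  }

  public static void main(String[] args) {
    // Made-up values for illustration only.
    double[] compilerLines = {900, 700, 400, 250, 100};
    double[] classesPerMs = {2.0, 3.5, 6.0, 7.5, 9.0};
    System.out.printf("R^2 = %.3f%n", rSquared(compilerLines, classesPerMs));
  }
}
```

Because the correlation is squared, a strong inverse relationship (compiler activity falling while speed rises) scores just as high as a strong direct one.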

[Figure: Number of compiler lines and speed graphed together against number of classes processed]
[Figure: Number of compiler lines vs speed]

Sorry, I’ve really squeezed as many pixels as I could out of these graphs. I’m trying to keep the byte count low so I can load it later on my dinky phone plan. I’ve linked the source data here.

Attempt Four, Mixing Xargs Args

N_VALUES=(1024 2000 4000 8000 16000 20000 30000 40000)
P_VALUES=(1 2 4 6 8 12)
for N in "${N_VALUES[@]}"; do
  for P in "${P_VALUES[@]}"; do
    START_TIME=$(gdate +%s%3N)
    find jars jars jars -type f -name '*.class' | xargs -n"$N" -P"$P" javap > /dev/null
    END_TIME=$(gdate +%s%3N)
    DURATION=$((END_TIME - START_TIME))
    echo "$N,$P,$DURATION"
  done
done
N/P 1 2 4 6 8 12
1024 116408 65166 50234 45763 44337 44127
2000 94860 52757 34322 34150 30540 29927
4000 69023 37320 27250 26806 25318 26006
8000 46223 27934 21434 16113 17168 17074
16000 35381 21787 15185 15966 15903 16175
20000 35059 21458 15338 15694 15942 16214
30000 43258 27232 17966 19023 18910 21210
40000 43058 25605 18278 19514 19334 20203 (10)

(10) Out of curiosity, since this case shows up in the table: what does xargs do when batch size * P is larger than the number of arguments?

seq 1 60000 | xargs -n40000 -P12 python3 -c "import uuid
import sys
print(f\"{uuid.uuid4()} {len(sys.argv) - 1}\")
"
8172f612-a49c-42e6-bd6a-189c7e3ab80c 20000
a150ac10-0c7d-4ab0-a7e0-66659818cab6 40000

xargs completely fills one invocation before it starts giving arguments to the next. Some processes won’t be started at all.
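
That filling behavior is easy to model. The sketch below computes the batch sizes for a given argument count and -n value. It deliberately ignores the command-line length limit (the -s option), which may split batches of long file paths earlier than this model predicts. The XargsBatches name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class XargsBatches {
  // Sizes of the invocations when every batch is filled before the next one starts.
  static List<Integer> batchSizes(int totalArgs, int maxPerBatch) {
    if (maxPerBatch <= 0) {
      throw new IllegalArgumentException("maxPerBatch must be positive");
    }
    List<Integer> sizes = new ArrayList<>();
    for (int remaining = totalArgs; remaining > 0; remaining -= maxPerBatch) {
      sizes.add(Math.min(remaining, maxPerBatch));
    }
    return sizes;
  }

  public static void main(String[] args) {
    // The seq experiment from the footnote: 60000 arguments, -n40000.
    System.out.println(batchSizes(60000, 40000)); // prints [40000, 20000]
    // Three copies of the 19080 class files, -n16000.
    System.out.println(batchSizes(3 * 19080, 16000)); // prints [16000, 16000, 16000, 9240]
  }
}
```

Under this model, 57240 paths with -n16000 make only four batches, which would be consistent with the 16000 row of the table flattening out past P=4.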

Looks like batch size 16000 and parallelism 4 is the ideal crunchwrap supreme (11) combination.

(11) I’m using crunchwrap supreme as the pinnacle taco bell meal. I actually think their best item is the soft potato taco.

Attempt Five, Swapping Taco Bell for Baja Fresh

I saw all this and assumed packing 20000 files into a list of arguments would be slow. So slow that, with a single process, I figured I could outspeed xargs by reading the files in as a stream. I will just call into the methods that javap is implemented with (12). This isn’t taco bell programming anymore.

package io.github.math_ias;

import com.sun.tools.javap.Main;

import java.io.PrintWriter;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.io.IOException;

import java.util.Arrays;

/**
 * Program that takes javap flags first,
 * followed by one file path at a time through standard input.
 */
public class MyMain {
  public static void main(String[] flags) {
    int flagsLength = flags.length;
    // Reserve one extra slot at the end for the current file path.
    String[] args = Arrays.copyOf(flags, flagsLength + 1);
    PrintWriter writer = new PrintWriter(System.out);
    try (
      BufferedReader reader =
        new BufferedReader(new InputStreamReader(System.in))
    ) {
      String line;
      while ((line = reader.readLine()) != null) {
        args[flagsLength] = line;
        Main.run(args, writer);
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    // Make sure nothing is left in the buffer before exiting.
    writer.flush();
    System.exit(0);
  }
}

(12) This isn’t trivial by the way. I did it like this with an unnamed module (drop --add-modules if you already have a module graph defined in a module-info.java).

javac --add-modules jdk.jdeps --add-exports jdk.jdeps/com.sun.tools.javap=ALL-UNNAMED io/github/math_ias/MyMain.java

You still add these exports when running with java.

java --add-exports jdk.jdeps/com.sun.tools.javap=ALL-UNNAMED io.github.math_ias.MyMain
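
A possible way to avoid the --add-exports dance is the public java.util.spi.ToolProvider API. The sketch below is an untested alternative, not the program that was measured here. It assumes the running JDK registers javap as a tool provider and fails loudly if it doesn't. The ToolProviderJavap name is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.spi.ToolProvider;

/**
 * Takes javap flags as arguments,
 * followed by one file path at a time through standard input.
 */
public class ToolProviderJavap {
  // Appends one file path to the fixed list of javap flags.
  static String[] argsFor(String[] flags, String path) {
    String[] args = Arrays.copyOf(flags, flags.length + 1);
    args[flags.length] = path;
    return args;
  }

  public static void main(String[] flags) throws IOException {
    // Assumption: this JDK exposes javap through ToolProvider.
    ToolProvider javap = ToolProvider.findFirst("javap")
        .orElseThrow(() -> new IllegalStateException("javap tool not found; is this a full JDK?"));
    PrintWriter out = new PrintWriter(System.out);
    PrintWriter err = new PrintWriter(System.err);
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // One javap run per path, all inside the same warmed-up JVM.
        javap.run(out, err, argsFor(flags, line));
      }
    } finally {
      out.flush();
      err.flush();
    }
  }
}
```

Since it only touches public API, this version should compile and run without any module flags. Whether it is faster or slower than calling com.sun.tools.javap.Main directly is not something this sketch tells you.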

Check out this comparison of the best value pair of batch size and parallelism and my program.

for _ in {1..5}; do
  START_TIME=$(gdate +%s%3N)
  find jars jars jars -type f -name '*.class' | xargs -n 40000 -P 4 javap > /dev/null
  # OR: find ../../jars ../../jars ../../jars -type f -name '*.class' | java --add-exports jdk.jdeps/com.sun.tools.javap=ALL-UNNAMED io.github.math_ias.MyMain > /dev/null
  END_TIME=$(gdate +%s%3N)
  DURATION=$((END_TIME - START_TIME))
  echo "$DURATION"
done

Sample         Xargs Time (ms)   My Program Time (ms)
0              15857             14504
1              14964             14323
2              15506             15331
3              17200             15789
4              15913             14894
Average (13)   15888             14968.2

(13) pbpaste | awk '{sum+=$1} END {print sum/NR}'

It’s a small improvement. But not worth the time I spent writing the code. We will come back to the idea of calling into the libraries that power javap later.

Attempt Six, Je Ne Sais Quoi (14)

(14) Or, when it comes to me as a developer, maybe it would read better as Je Ne Sais Pas.

I googled “how to make my java programs start faster”. I found this funky feature called AppCDS. It saves “class metadata” into a “JSA” file. The JVM loads classes from the JSA format faster than through class files or a JAR. The only blood sacrifice we have to make is giving up cross-platform-ness and file size.

First, I make the JSA file.

javap -J-XX:ArchiveClassesAtExit=my.jsa -c -p MyFibIterator.class

Then I use it.

N_VALUES=(1 4 16 64 256 1024 4096 16384 20000)
for n in "${N_VALUES[@]}"; do
  START_TIME=$(gdate +%s%3N)
  find jars -type f -name '*.class' | head -n "$n" | xargs -P1 -n"$n" javap -J-XX:SharedArchiveFile=my.jsa > /dev/null
  END_TIME=$(gdate +%s%3N)
  DURATION=$((END_TIME - START_TIME))
  echo "$n,$DURATION"
done

It’s alright. On small batches I save a roughly constant 50 milliseconds or so. At 20000 files the gap grows to maybe two seconds?

Batch     No CDS (ms)   AppCDS (ms)
-n1       272           225
-n4       285           238
-n16      351           302
-n64      539           493
-n256     956           886
-n1024    2363          2272
-n4096    6694          6382
-n16384   23996         23366
-n20000   27250         25314

It reminds me of basil. It smells amazing, but when I cook with it the flavor disappears. Maybe I’m doing it wrong.

Attempt Seven, Native Image

What if we didn’t have to wait for tiered compilation? What if we compiled everything to machine code ahead of time? This is the idea behind GraalVM’s native image tool. The output looks nice.

% native-image com.sun.tools.javap.Main
========================================
GraalVM Native Image: Generating 'com.sun.tools.javap.main' (executable)...
========================================
[1/8] Initializing...
                                                                                (13.8s @ 0.09GB)
 Java version: 23.0.1+11, vendor version: GraalVM CE 23.0.1+11.1
 Graal compiler: optimization level: 2, target machine: x86-64-v3
 C compiler: cc (apple, x86_64, 16.0.0)
 Garbage collector: Serial GC (max heap size: 80% of RAM)
 1 user-specific feature(s):
 - com.oracle.svm.thirdparty.gson.GsonFeature
----------------------------------------------
Build resources:
 - 12.09GB of memory (75.6% of 16.00GB system memory, determined at start)
 - 12 thread(s) (100.0% of 12 available processor(s), determined at start)
[2/8] Performing analysis...  [*****]                                                                 (17.0s @ 0.56GB)
  5,302 reachable types   (75.0% of 7,071 total)
  6,253 reachable fields  (44.9% of   13,936 total)
   23,246 reachable methods (48.8% of   47,615 total)
  1,746 types,  15 fields, and   324 methods registered for reflection
    58 types, 57 fields, and  52 methods registered for JNI access
      4 native libraries: -framework Foundation, dl, pthread, z
[3/8] Building universe...                                                                            (2.1s @ 0.63GB)
[4/8] Parsing methods...    [*]                                                                     (1.8s @ 0.39GB)
[5/8] Inlining methods...   [***]                                                                   (1.4s @ 0.47GB)
[6/8] Compiling methods...  [****]                                                                  (16.7s @ 0.84GB)
[7/8] Laying out methods...   [**]                                                                    (3.3s @ 0.97GB)
[8/8] Creating image...     [**]                                                                    (3.2s @ 0.49GB)
   9.18MB (44.69%) for code area: 13,846 compilation units
  11.13MB (54.14%) for image heap:  140,509 objects and 60 resources
 245.58kB ( 1.17%) for other data
  20.55MB in total
----------------------------------------------
Top 10 origins of code area:                              Top 10 object types in image heap:
   6.51MB java.base                                         2.38MB byte[] for code metadata
   1.07MB svm.jar (Native Image)                            1.84MB byte[] for java.lang.String
 489.63kB jdk.compiler                                      1.32MB java.lang.String
 451.91kB jdk.jdeps                                         1.23MB java.lang.Class
 237.65kB jdk.zipfs                                       527.32kB heap alignment
 114.32kB java.logging                                    455.64kB com.oracle.svm.core.hub.DynamicHubCompanion
  69.53kB org.graalvm.nativeimage.base                    295.38kB byte[] for general heap data
  49.71kB jdk.proxy2                                      279.84kB java.util.HashMap$Node
  39.69kB jdk.proxy1                                      269.21kB java.lang.String[]
  26.75kB jdk.internal.vm.ci                              237.15kB java.lang.Object[]
  61.46kB for 8 more packages                               2.33MB for 1344 more object types
----------------------------------------------
Recommendations:
 HEAP: Set max heap for improved and more predictable memory usage.
 CPU:  Enable more CPU features with '-march=native' for improved performance.
----------------------------------------------
                      3.2s (5.2% of total time) in 877 GCs | Peak RSS: 1.44GB | CPU load: 6.81
----------------------------------------------
Build artifacts:
 [...]/com.sun.tools.javap.main (executable)
==============================================
Finished generating 'com.sun.tools.javap.main' in 1m 0s.

Unfortunately, it looks like I’ve made a bit of a “Taco Bell Beefer Burger” here. A recipe not in my comfort zone.

% ./com.sun.tools.javap.main
Exception in thread "main" java.lang.ExceptionInInitializerError
  at jdk.compiler@23.0.1/com.sun.tools.javac.file.BaseFileManager.createLocations(BaseFileManager.java:126)
  at jdk.compiler@23.0.1/com.sun.tools.javac.file.BaseFileManager.<init>(BaseFileManager.java:84)
  at jdk.compiler@23.0.1/com.sun.tools.javac.file.JavacFileManager.<init>(JavacFileManager.java:162)
  at jdk.jdeps@23.0.1/com.sun.tools.javap.JavapFileManager.<init>(JavapFileManager.java:46)
  at jdk.jdeps@23.0.1/com.sun.tools.javap.JavapFileManager.create(JavapFileManager.java:57)
  [...]

Attempt Eight, Maybe I Do Know Quoi

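I brainstormed ways to go faster.

Here’s what I came up with, straight off my ape brain: skip javap entirely, parse each class file with the java.lang.classfile API (a preview feature in Java 23), and check the constant pool for a reference to the method.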

package io.github.math_ias;

import java.io.IOException;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.InvalidPathException;

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.MethodModel;

import java.lang.classfile.attribute.CodeAttribute;

import java.lang.classfile.constantpool.PoolEntry;
import java.lang.classfile.constantpool.ClassEntry;
import java.lang.classfile.constantpool.MethodRefEntry;
import java.lang.classfile.constantpool.InterfaceMethodRefEntry;

import java.lang.constant.MethodTypeDesc;

import java.util.Iterator;

// Compile like ...
// javac --enable-preview -source 23 src/io/github/math_ias/MyMain.java -d target
// Run like ...
// java --enable-preview -cp target io.github.math_ias.MyMain jars 'java/math/BigInteger' 'valueOf' '(J)Ljava/math/BigInteger;'

/**
 * Program that does a whole lot of things.
 */
public class MyMain {
  public static void main(String[] args) {
    if (args.length != 4) {
      System.err.println("Expected exactly 4 arguments, the root directory, the class name (/'s), the method name, and the method type descriptor.");
      System.exit(-1);
    }
    Path rootPath = null;
    try {
      rootPath = Path.of(args[0]);
    } catch (InvalidPathException ipe) {
      System.err.println("Expected first arg to be a valid root path (does not parse).");
      ipe.printStackTrace(System.err);
      System.exit(-1);
    }
    if (!Files.exists(rootPath)) {
      System.err.println("Expected first arg to be a valid root path (does not exist).");
      System.exit(-1);
    }
    String classToMatch = args[1];
    String methodToMatch = args[2];
    MethodTypeDesc methodTypeDescToMatch = null;
    try {
      methodTypeDescToMatch =
        MethodTypeDesc.ofDescriptor(args[3]);
    } catch (IllegalArgumentException e) {
      System.err.println("Expected fourth arg to be a valid method type descriptor.");
      e.printStackTrace(System.err);
      System.exit(-1);
    }
    // To quiet javac on the lambda. :]
    MethodTypeDesc effectivelyFinalValue =
      methodTypeDescToMatch;

    try {
      Files.walk(rootPath)
      .parallel()
      // This lambda is likely repeating work Files.walk already does,
      // plus this file extension business is nasty,
      // but I'm betting that I can beat xargs without optimizing it.
      .filter((Path path) ->
        !Files.isDirectory(path) &&
        path.getFileName().toString().endsWith(".class")
      )
      .forEach((Path path) -> printOnPathMatch(
        path
      , classToMatch
      , methodToMatch
      , effectivelyFinalValue
      ));
    } catch (IOException ioe) {
      System.err.println("Unexpected error occurred while traversing file tree.");
      ioe.printStackTrace();
      System.exit(-1);
    }

    System.exit(0);
  }

  public static void printOnPathMatch(
    Path path
  , String classToMatch
  , String methodToMatch
  , MethodTypeDesc methodTypeDescToMatch
  ) {
    try {
      ClassModel classModel =
        ClassFile.of()
          .parse(path);
      if (classModelMatches(
        classModel, classToMatch, methodToMatch, methodTypeDescToMatch
      )) {
        System.out.println(path.toString());
      }
    } catch (IOException io) {
      System.err.println(String.format("Failed to read path %s, skipping.", path.toString()));
      io.printStackTrace(System.err);
    }
  }

  public static boolean classModelMatches(
    ClassModel classModel
  , String classToMatch
  , String methodToMatch
  , MethodTypeDesc methodTypeDescToMatch
  ) {
    Iterator<PoolEntry> iterator = classModel.constantPool().iterator();
    while (iterator.hasNext()) {
      PoolEntry poolEntry = iterator.next();
      if (poolEntry instanceof MethodRefEntry) {
        MethodRefEntry methodRefEntry = (MethodRefEntry) poolEntry;
        if (
          methodRefEntry.name().equalsString(methodToMatch) &&
          methodRefEntry.typeSymbol().equals(methodTypeDescToMatch) &&
          methodRefEntry.owner().name().equalsString(classToMatch)
        ) {
          return true;
        }
      } else if (poolEntry instanceof InterfaceMethodRefEntry) {
        InterfaceMethodRefEntry interfaceMethodRefEntry =
          (InterfaceMethodRefEntry) poolEntry;
        if (
          interfaceMethodRefEntry.name().equalsString(methodToMatch) &&
          interfaceMethodRefEntry.typeSymbol().equals(methodTypeDescToMatch) &&
          interfaceMethodRefEntry.owner().name().equalsString(classToMatch)
        ) {
          return true;
        }
      }
    }
    return false;
  }
}

Let’s measure it! (15)

for _ in {1..5}; do
  START_TIME=$(gdate +%s%3N)
  # My program only takes one root dir.
  # So for fairness I run it 3 times.
  # This will eliminate some warmup effects.
  # But as you’ll see in the results,
  # it’s so fast it doesn’t matter.
  java --enable-preview io.github.math_ias.MyMain ../../jars 'java/math/BigInteger' 'valueOf' '(J)Ljava/math/BigInteger;' > /dev/null
  java --enable-preview io.github.math_ias.MyMain ../../jars 'java/math/BigInteger' 'valueOf' '(J)Ljava/math/BigInteger;' > /dev/null
  java --enable-preview io.github.math_ias.MyMain ../../jars 'java/math/BigInteger' 'valueOf' '(J)Ljava/math/BigInteger;' > /dev/null
  END_TIME=$(gdate +%s%3N)
  DURATION=$((END_TIME - START_TIME))
  echo "$DURATION"
done

(15) We can also measure fairness. For each file in the parallel stream let’s just print the file name.

java --enable-preview io.github.math_ias.MyMain ../../jars | wc -l
19080

We’re not skipping files by accident. What about accuracy?

find . -type f -name '*.class' | xargs -n 50000 -P 4 javap -sysinfo -c | rg "java/lang/Object\.equals:\(Ljava/lang/Object;\)Z|^Classfile" | python3 -c "
import sys, itertools
a, b = itertools.tee(sys.stdin)
next(b, None)
for x, y in zip(a, b):
    if 'Classfile' in x and 'invokevirtual' in y.lower():
        print(x, end='')
" | wc -l
571

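That gives a similar count to my program: 587. I suspect the classfile spec allows you to add methods to the constant pool and not use them. It may also be that the pipeline undercounts, since it only checks whether the first matching line after each Classfile header is an invokevirtual.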

It does raise the question: what IS the most used method in all these jars? Let’s go find out in another page.

It’s FAST. It is about 10 times faster than the fastest method we’ve come up with so far (14968 versus 1533 milliseconds).

SAMPLE 0 1 2 3 4 AVG
TIME 1528 1548 1491 1536 1560 1532.6

And it only took me a couple of months to do it and then write about it!

Further Reading

There are two JVM implementations that have their own solutions to slow Java startup times. I didn’t try them, and perhaps they are another way to crack this problem.