If you're anything like me, the phrase "JIT compilation" probably evokes a sense of fear similar to the feeling others get when they hear the term "calculus". The last time this phrase was being thrown around in my life was in my Compilers module at university, along with a series of related fear-inducing terms like "intermediate languages" and "the Java Virtual Machine". Thankfully though, enough time has passed that I've dealt with my traumas, and I'm now in a position where I'm ready to go back down the rabbit hole in the context of my own interests. The goal here is to understand the concept of JIT in browsers, and how we can find bugs in it.

Background

JIT (Just-In-Time Compilation) is a compilation process in which code is translated from an intermediate representation or a higher-level language (e.g., JavaScript or Java bytecode) into machine code at runtime, rather than prior to execution. This approach combines the benefits of both interpretation and ahead-of-time (AOT) compilation. [1]

The concept of JIT compilation is actually fairly self-explanatory. A JIT compiler executes at the same time as the running program, and it will identify code sections that run frequently and compile them into machine code just in time for them to be executed again.

This concept is used aggressively in browsers with JavaScript code, since websites need to be fast. In fact, billions of users (and dollars) ride on websites being fast, and without JIT, we'd be stuck with an interpreted language running the web, and nobody wants to live in that world. That's the long and short of why JIT compilation exists in the browser, combined with the fact that we'd have far fewer (secure) websites if web developers had to write C++.

JIT Optimizations

So we know that JS code in the browser is constantly being optimized by the JIT compiler, but how exactly does this happen? Effectively, the JIT compiler constantly profiles running code, building up a sort of "hot and cold" map of its call paths. As soon as a section of code becomes "hot" (read: executed many times in the same way), it compiles that section to native code so that subsequent runs of that section are significantly faster. Consider this snippet of code:

function add(a, b) {
  return a + b;
}

function time(label, fn) {
  const t0 = performance.now();
  fn();
  const t1 = performance.now();
  console.log(`${label}: ${(t1 - t0).toFixed(3)} ms`);
}

// 1. "Cold" run – JIT hasn't had much profiling info yet
time("Cold run", () => {
  for (let i = 0; i < 5_000_000; i++) {
    add(i, i + 1);
  }
});

// 2. Warmup – we call it many times so the JIT gathers type info
for (let i = 0; i < 50; i++) {
  for (let j = 0; j < 100_000; j++) {
    add(j, j + 1);
  }
}

// 3. "Hot" run – now the function is likely optimized by the JIT
time("Hot run", () => {
  for (let i = 0; i < 5_000_000; i++) {
    add(i, i + 1);
  }
});

If you run this code in a browser, you'll likely see something like this:

[Log] Cold run: 101.000 ms
[Log] Hot run: 63.000 ms

The JIT compiler optimizes the hot run to the point where we see an almost 40% speed-up. Your mileage will vary, but the direction is consistent. Pretty cool, but how does it do this? Well, I'm going to spare you my late-teen trauma and just summarise this as best I can.

Compilers Crash Course

The JIT tries to turn "dynamic, flexible JS" into "tight, predictable machine code". The core concept is that there are a bunch of tricks that can make code run faster. Most of these tricks rely on making assumptions about the code to cut unnecessary work at various levels of abstraction. JS is a heavily dynamic language, but in practice, when you write an add function, chances are you're only going to be passing in numbers:

function add(a, b) { return a + b; }

If the engine sees add is always called with numbers, it can compile a specialized version: "always take two doubles, add them, return", allowing it to skip the generic "what type is this?" checks before each call. However, if you later call add("1", "2"), that breaks the assumption, and the JIT compiler may be forced to de-optimize that call site or fall back to a slower, more generic path.
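To make this concrete, here's a sketch you can run yourself. The engine's internal decisions aren't observable from plain JS, so the comments describe what a typical engine is likely to do, not anything guaranteed:

```javascript
function add(a, b) {
  return a + b;
}

// Monomorphic phase: the engine only ever sees two numbers, so it
// can (likely) specialize add() down to a raw numeric addition.
let sum = 0;
for (let i = 0; i < 100_000; i++) {
  sum += add(i, 1);
}

// One string call breaks the "always two numbers" assumption.
// Semantically JS is fine with this (+ becomes concatenation), but
// any specialized machine code is not, so the engine must
// de-optimize or fall back to a generic path for this call site.
const label = add("1", "2"); // "12", not 3
```

Note that the program still computes the right answers either way; the cost of breaking the assumption is performance, not correctness.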

Another example is property access. Given this code:

function getName(user) {
  return user.name;
}

// Every object has the same "shape": { name, age }
let users = [
  { name: "Alice", age: 30 },
  { name: "Bob",   age: 25 },
  { name: "Carol", age: 28 },
];

for (let i = 0; i < 1_000_000; i++) {
  getName(users[i % 3]);
}

Accessing user.name is actually quite slow in JS (since objects are dynamic and mutable), so the in-memory representation of an object is not always the same. This means that the memory offset from the user base address to the name field may change. However, if the JIT sees that getName is always called with objects of the same shape (same properties, in the same order), it can cache that offset and emit machine code that reads directly from a fixed memory position without doing a dictionary lookup. This is called inline caching, and it's one of the biggest wins in modern JS engines. If we break the assumption by passing in { age: 40, name: "Dave" }, where the properties are in a different order, the engine falls back to the slow generic path.
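As a quick illustration of that last point (the fallback itself isn't observable from JS, so the comment describes likely engine behaviour rather than a guarantee):

```javascript
function getName(user) {
  return user.name;
}

// Same properties as the "hot" objects, but declared in a different
// order, so the object gets a different hidden shape.
const dave = { age: 40, name: "Dave" };

// Still semantically correct - JS doesn't care about property order -
// but an inline cache built for the { name, age } shape no longer
// matches, so the engine likely falls back to a generic lookup here.
const daveName = getName(dave); // "Dave", just slower to fetch
```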

These are just two examples, but the pattern is always the same: the JIT bets on your code being predictable, and when it's right, everything flies. The interesting question is what happens when those bets go wrong in ways the engine doesn't catch.

Anatomy of a JIT Bug

Okay, so before I lose you - I promise we're going somewhere with this. My main message so far is that JIT compilation exists in an aggressive form in the browser to make websites faster. It does this by profiling running JS and performing tricks to speed up execution at runtime. However, these optimizations introduce complexity, and with complexity comes attack surface, and with attack surface comes exploits - and that's what we're going to be looking at.

Look at the following code:

function miscompute(n) {
  n |= 0;
  if (n < 0) {
    let v = (-n) | 0;
    return Math.abs(n); // miscomputation here
  }
}

Now, imagine there is a weird (Common-Subexpression-Elimination failure [2]) logic bug in a JIT optimizer that results in Math.abs occasionally returning a negative number at runtime. On its own this is a logic flaw, and no doubt an incredibly difficult one to find. However, it's possible to turn a logic flaw like this into memory corruption:

function oobBug(arr, n) {
  n |= 0;
  if (n < 0) {
    let v = (-n) | 0;
    // miscomputation here may return a negative number
    let idx = Math.abs(n);
    // the bounds check below can be eliminated by the JIT,
    // turning the "safe" read into an OOB access
    if (idx < arr.length) {
      arr[idx]; // safe read?
    }
  }
}

In this case, the JIT optimizer will see that Math.abs always returns a non-negative value, and since idx is already compared against arr.length, the bounds check on the array access can be eliminated: leading to an out-of-bounds read when Math.abs actually returns a negative number. This is a real bug, identified as CVE-2020-9802 in WebKit [3][4], and it captures the essence of JIT exploitation. I won't go into much more detail on this exact issue; instead, I implore you to read the original post by Samuel Groß, where he even goes into the details of how he turned the OOB read into full remote code execution. We may touch on that in the future, but baby steps first.

CVE-2020-9802 is just one bug in one engine, but every major JS engine (V8, SpiderMonkey, JavaScriptCore) ships its own JIT pipeline with its own optimization passes, and each of those passes is a fresh opportunity for the same class of mistake. New optimizations land constantly, old ones get refactored, and every change is a chance to introduce a miscompilation that turns a "safe" bounds check into a no-op. The question isn't whether more bugs like this exist, it's how we find them before someone else does.

Into the Fire

Now that we have an idea of what a JIT bug looks like and how it can become memory corruption, we should probably look at how you'd go about finding bugs like this. The trouble is, these bugs are uniquely resistant to the usual approaches.

Code review? The optimization passes in a modern JS engine span hundreds of thousands of lines of highly specialized C++, and the bugs only manifest when a particular sequence of optimizations fires on a particular pattern of input: good luck spotting that by eye. Traditional fuzzing? Most fuzzers look for crashes, but a JIT miscompilation doesn't necessarily crash. It silently produces a wrong value, which might only matter if you happen to check for it. Sanitizers like ASan or UBSan help with memory errors, but a JIT bug often sits in the sanitizer's blind spot: the generated machine code is "correct" from the compiler's perspective, it just doesn't match what the JavaScript semantics guarantee.

So we need a different signal entirely: not "did it crash?" but "did it compute the right thing?" Well, I sure as hell don't have the patience nor the sanity to do a full code review of the JIT compiler's source code. No, when it comes to this level of complexity, I'll do what all good "security professionals" do: reach for a tool!

Unfortunately there isn't quite a sqlmap for JIT exploits just yet, but there has been some exceptional academic research in the space of JIT fuzz testing in recent years. And when you think about it, fuzzing is the perfect approach here: if JIT bugs only surface under specific runtime conditions, why not automatically generate millions of those conditions and watch for anomalies? These two papers [5][6] make up the foundation for what we'll be discussing in this series: differential fuzzing, but there are many more that take different approaches.

Differential Fuzzing

If you're not aware of what fuzzing is, here's the TL;DR:

Generate millions of "random" (or more often, guided) inputs to a given program, and evaluate the result for something weird.

That's obviously not the academic definition, but it's an easy way of looking at it. The "something weird" in most cases is a crash: historically, pretty much all fuzz testing looked for cases where a memory error or segmentation fault occurred. However, in the modern age of fuzzing this has become less common, as we now deal with a lot more memory-safe languages, and in some cases memory corruption may occur without a crash. In these situations, we have to rely on some other signal of "weirdness", and these signals come from a piece of code often called a "sanitizer", or, if generic, an "oracle".

An oracle or sanitizer is literally just some code that lives either inside or adjacent to the program being tested and looks for a behaviour that is not expected. This can be "variable X should always be positive", or "array X should never be accessed beyond its size", or even "webpage X should never have access to webpage Y's data". It's some condition that should never change, often referred to as an invariant.
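As a toy sketch of the idea (the invariant, the oracle, and all names here are my own illustration, not any real sanitizer's API):

```javascript
// Invariant: every index the program computes must be a valid index
// into arr. The program under test calls report() with each index,
// and the oracle records any value that breaks the invariant.
function makeBoundsOracle(arr) {
  const violations = [];
  return {
    report(idx) {
      if (!(idx >= 0 && idx < arr.length)) {
        violations.push(idx); // invariant broken: record it
      }
    },
    violations,
  };
}

const data = [10, 20, 30];
const oracle = makeBoundsOracle(data);
oracle.report(2);  // in bounds: fine
oracle.report(-1); // would be an OOB access: flagged
```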

So now consider this: if we were to write an oracle for something weird happening inside the JIT compiler, how would we do it? Well, the most obvious way would be to:

  1. Run the exact same test case twice: once with the JIT enabled, and once with it disabled (pure interpretation).
  2. Compare the outputs of the two runs.
  3. Treat any difference as a signal that the JIT miscompiled something.

This is the fundamental concept behind differential fuzzing: we do something twice on the same inputs and see if they differ. There are lots of types of differential fuzzing, but this is what we care about. Simple right? Well, not really. There are a bunch of challenges we must first solve, many of which have already been solved before (shoulders of giants and all that).
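Here's the concept in miniature. We can't toggle the JIT from inside JavaScript itself, so in this sketch a hand-written "reference" and "optimized" pair stand in for the interpreter and the JIT; the deliberately buggy branchless abs is my own illustration:

```javascript
// The trusted, boring implementation: what the semantics say.
function absReference(n) {
  return n < 0 ? -n : n;
}

// A "clever" 32-bit branchless abs, modelling optimized code. The
// |0 keeps the result in 32-bit integer range, which makes it
// subtly wrong for exactly one input: -2147483648 has no positive
// counterpart in 32 bits.
function absOptimized(n) {
  return ((n ^ (n >> 31)) - (n >> 31)) | 0;
}

// Differential check: run both sides on the same inputs and treat
// any disagreement as a finding.
function diffFuzz(iterations) {
  const interesting = [0, -1, 2147483647, -2147483648];
  const mismatches = [];
  for (let i = 0; i < iterations; i++) {
    const n = interesting[i % interesting.length]; // crude input gen
    if (absReference(n) !== absOptimized(n)) {
      mismatches.push(n); // the oracle: outputs must agree
    }
  }
  return mismatches;
}

const findings = diffFuzz(4); // [-2147483648]
```

In the real setup, the two "implementations" are the same engine with the JIT on and off, which is what makes the oracle so general: any disagreement is a bug in one of them, and it's usually the JIT.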

JS Test Generation

This is probably the easiest part from our perspective because it's been done well for years now. Generating JS-based fuzzing inputs is a difficult problem, especially when you want to do coverage-guided fuzzing, where the fuzzer mutates and discards test cases based on what code branches it has already covered. The wonderful folk at Google Project Zero built a solution to this problem that has become the State of the Art: Fuzzilli [7]. If you want to learn how this works under the hood, you can read the original paper [8]; the concept is built on using an intermediate language (yes, that fearful term I mentioned in the intro) to produce unique test cases when those IL representations are "lifted" into the higher-level language (JavaScript). All we care about here, though, is that it allows us to generate great JS test cases for our fuzzer, and do some other fun things like intercept calls within the JS engine while it's running and force or disable JIT.

The main modification we want to make to Fuzzilli for now is the ability to run the JS engine twice: once with JIT enabled, and once without. We then want to be able to compare the outputs of those runs to check whether they differ.

Differential Checks

The really difficult part here is implementing the ability to compare outputs without affecting them: it's a bit like the observer effect. By injecting code into our tests, we may inadvertently affect the JIT's decisions, so we need to ensure a really high level of determinism, which means identifying and remediating as much non-deterministic behaviour as possible to avoid false positives.

I'll walk through JIT-Picker's approach here, because it informed my own implementation, even though the original code is no longer usable.

Engine-side hash accumulation

Here's the trick: we don't want to observe from JavaScript, because JavaScript is the thing we don't trust. That's like asking the suspect to grade their own polygraph. Instead, we patch the JS engine itself to expose a native fuzzilli_hash(id, value) function which inspects the runtime type of value, hashes the (type, value) pair, and folds the result into a global accumulator maintained inside the engine.

The hash function itself is simple, a shift-and-add accumulator over (type, value) pairs. The goal is just to hash outputs so we can easily identify when different execution traces produce different hashes.

You can think of this like a checksum: we don't care what the bytes are, we just want to know whether what we got matches what we expected. We do the same thing here with execution: we tally up every value the program produces into a single fingerprint, then compare fingerprints between the JIT and non-JIT runs. If they match, the JIT behaved correctly. If they don't, something was miscompiled.
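As a rough model of the idea (the real accumulator lives in native engine code; this JS sketch, including the hash constants and the dropped id argument, is purely illustrative):

```javascript
// Global accumulator: every observed value folds its (type, value)
// pair into this single running fingerprint.
let state = 0;

function fuzzilliHashSketch(value) {
  // Derive a small integer from the (type, value) pair.
  const repr = typeof value + ":" + String(value);
  let h = 0;
  for (let i = 0; i < repr.length; i++) {
    // Shift-and-add: h * 33 + char, kept in 32-bit range.
    h = ((h << 5) + h + repr.charCodeAt(i)) | 0;
  }
  // Fold into the accumulator.
  state = ((state << 1) + h) | 0;
}

// Two identical execution traces produce identical fingerprints...
state = 0;
[1, "two", 3.5].forEach(fuzzilliHashSketch);
const runA = state;

state = 0;
[1, "two", 3.5].forEach(fuzzilliHashSketch);
const runB = state; // equal to runA

// ...while a single diverging value changes the final hash.
state = 0;
[1, "two", -3.5].forEach(fuzzilliHashSketch);
const runC = state; // differs from runA
```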

Transparent probe injection

Now that we have a hash function, we need to figure out where to run it. We can create a post-processor that inserts fuzzilli_hash() calls into generated FuzzIL (the intermediate language of Fuzzilli) programs before they're lifted to JavaScript. It uses two strategies:

  1. End-only probing: insert a single batch of probes at the end of the program, hashing every variable that is still visible.
  2. Weave probing: weave probes throughout the program, hashing intermediate values right after they're produced.

To put this plainly: if end-only probing is checking whether a patient survived surgery, weave probing is monitoring their vitals throughout. Unsurprisingly, the latter catches more problems.
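To show roughly what the two strategies produce, here's hand-written illustrative JS (not actual Fuzzilli output), with a stub standing in for the engine's native builtin:

```javascript
// Stub standing in for the engine-side fuzzilli_hash(id, value);
// here we just record what would have been hashed.
const trace = [];
function fuzzilli_hash(id, value) { trace.push(value); }

// End-only probing: one probe once the program has finished, so
// only values still alive at the end contribute.
function endOnlyProbed() {
  let x = 1 + 2;
  x = x * 10;
  x = x - 5;
  fuzzilli_hash(0, x); // sees only the final value: 25
}

// Weave probing: probes woven in after each interesting operation,
// so an early divergence that a later step happens to cancel out
// still shows up in the fingerprint.
function weaveProbed() {
  let x = 1 + 2;
  fuzzilli_hash(0, x); // 3
  x = x * 10;
  fuzzilli_hash(1, x); // 30
  x = x - 5;
  fuzzilli_hash(2, x); // 25
}

trace.length = 0;
endOnlyProbed();
const endOnlyTrace = trace.slice(); // [25]

trace.length = 0;
weaveProbed();
const weaveTrace = trace.slice();   // [3, 30, 25]
```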

REPRL protocol extension

Once we have the hash function being called, via either end-only or weave probing, we need to figure out how to get the hashes out to our fuzzer to feed our oracle. To do this, we leverage part of Fuzzilli: the REPRL protocol.

We've not discussed what REPRL [9] is yet, so here's the short version. If you've ever typed python3 into your terminal and gotten that little >>> prompt, you've used a REPL: Read, Eval, Print, Loop. REPRL is that, but with a Reset jammed in the middle because we don't want leftover state contaminating our results. Think of it as a REPL with amnesia. Instead of launching, running one line of code and exiting, the engine process sits in a loop like so:

  1. Read: Wait for the fuzzer to send a script over a pipe/shared memory.
  2. Eval: Execute the script.
  3. Print: Send back stdout/stderr and the exit status.
  4. Reset: Wipe the global JS state (clear global variables, reset heap state as much as possible) so the next script starts clean.
  5. Loop: Go back to step 1.

In our case, the hash we computed in the last step needs to get back to the fuzzer somehow. We can extend the REPRL protocol to return extra data: status code + exec_hash. The fuzzer then:

  1. Executes each generated program twice: once with the JIT forced on, once with it disabled.
  2. Reads back the exec_hash from each run.
  3. Flags any pair of runs whose hashes differ as a potential miscompilation for triage.

Finally, we need to remove any non-determinism that could result in false positives. We lobotomize Math.random() to always return 0.5 and Date.now() to always return 0. This gives us consistent results regardless of what JS the test generator produces.
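The same idea in plain JS, assuming the patch runs before any test code (the real fuzzer pins these down inside the engine itself):

```javascript
// Pin down the obvious sources of nondeterminism so two runs of the
// same test case can't diverge by accident.
Math.random = () => 0.5;
Date.now = () => 0;

// Any generated test case that consults these now sees identical
// values on both the JIT and non-JIT runs.
const sample = [Math.random(), Math.random(), Date.now()];
```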

The Full Story

Okay, by this point all I've done is try to distill the JIT-Picker paper into an easier format, because our goal here is not actually to use JIT-Picker. Why? Because it was written almost 5 years ago, and most of its core functionality can no longer be used since Fuzzilli and the JS engines have changed so much since its release (and it hasn't been maintained). I initially tried to just do a big fat git merge, but the branches were so far apart that it wasn't a feasible solution. Instead, I took it upon myself to fully understand the team's implementation, and write it from scratch on top of the latest Fuzzilli head as of February 2026. Unfortunately, my solution still falls victim to the same problem: if it's not maintained, it loses potency and will no longer be able to find bugs in JS engines. But I'm not doing this for a one-off research paper; my goal is to actually run this fuzzer continuously to help me find bugs.

In an upcoming blog I'll release this on GitHub; right now it needs several weeks of testing, as I have no idea yet whether it will even find any bugs.

Conclusion

If you stayed until here, I'm impressed, so I'll give you my main take-aways: JIT compilers are wizardry, JIT bugs are nightmares, and differential fuzzing is our best shot at catching those nightmares before they become exploits. It's an arms race between browser vendors optimizing for speed and security researchers trying to break those optimizations before attackers do. I've built this JIT fuzzer as my contribution to this never-ending cycle of optimization and exploitation, though I'm under no illusions: it may well be outdated the moment someone refactors a critical piece of Fuzzilli or JavaScriptCore.