JVM Memory Fundamentals: Stack, Heap, and Object Headers
Every Java developer knows objects live on the heap and local variables on the stack.
But when you create a simple Point object with two integers, do you know exactly
how much memory it consumes? If you guessed “8 bytes for the two ints,”
you’re off by 200%.
Your microservice crashes with “only” a million objects. GC pauses freeze traffic at the worst moment. Java 25 promises up to 30% CPU savings and ~22% heap reduction with one flag, but only if you understand what it’s doing. JVM memory fundamentals connect all three.
This is Part 1 of our JVM Memory series. We’ll explore the fundamental concepts every Java developer should know. Part 2 dives into the bit-level details for those who want to understand the JVM as deeply as the CPU understands it.
Part I: The Two Realms - Stack and Heap
The City Beneath Your Code
Imagine your Java application as a bustling metropolis. The Heap is the residential district-sprawling, shared by all citizens, constantly being cleaned by sanitation trucks (the Garbage Collector). The Stack is the business district-each building serves exactly one purpose, is demolished the moment its work is done, and is private to its owner.
Every thread in your application gets its own Stack-vertical, efficient, self-contained. The Heap is the shared commons where all threads store their objects, references crossing between threads like citizens visiting shared facilities.
The Stack: Where Methods Live and Die
The Stack is perhaps the most elegant data structure in computing. Last In, First Out. When you call a method, a stack frame is pushed. When the method returns, it’s popped. No garbage collection, no memory leaks - just automatic, deterministic cleanup.
Each thread gets its own stack, sized at JVM startup (-Xss flag, default 1MB).
When a thread exceeds this limit? StackOverflowError.
You’ve seen it - infinite recursion, overly deep call chains.
```java
public class StackOverflowDemo {
    public static void main(String[] args) {
        recursiveMethod(1);
    }

    private static void recursiveMethod(int depth) {
        System.out.println("Depth: " + depth);
        recursiveMethod(depth + 1); // Eventually: StackOverflowError
    }
}
```
The stack stores primitives and references.
The heap stores everything those references point to.
Simple distinction, massive implications for memory and GC.
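A minimal sketch makes the distinction concrete (class and variable names here are illustrative):

```java
public class StackHeapDemo {
    static class Counter {
        int value;
    }

    public static void main(String[] args) {
        int n = 42;                 // primitive: lives in this frame's local variable array
        Counter c1 = new Counter(); // reference on the stack; the Counter object on the heap
        Counter c2 = c1;            // a second stack reference to the SAME heap object

        c2.value = 7;               // mutation through c2 is visible through c1
        System.out.println(c1.value); // 7: both references point at one heap object

        int m = n;                  // primitives are copied between slots, never shared
        m = 99;
        System.out.println(n);      // still 42
    }
}
```

Two stack slots holding the same reference observe the same heap object; two stack slots holding primitives are fully independent copies.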
The Heap: The Object Repository
If the Stack is where methods execute, the Heap is where objects reside.
Every time you write new Object(), the JVM allocates space on the heap.
Unlike the stack, heap memory isn’t automatically reclaimed when a method
returns - that’s the Garbage Collector’s job.
The heap is shared across all threads. Your main thread creates an object;
your worker thread can reference it. This sharing is powerful but requires
synchronization when mutable objects are accessed concurrently.
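As a sketch of that sharing (names illustrative): one heap object, reachable from two threads' stacks, safely mutated under synchronization:

```java
public class SharedHeapDemo {
    static class Box {
        int value;
    }

    // Two threads increment one heap-resident Box; the lock makes it safe.
    static int incrementConcurrently(int perThread) throws InterruptedException {
        Box shared = new Box(); // one heap object, referenced from both threads' stacks

        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (shared) { // required: both threads mutate the same object
                    shared.value++;
                }
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return shared.value;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(incrementConcurrently(10_000)); // 20000 with synchronization
    }
}
```

Without the synchronized block, the lost-update race would make the final count unpredictable.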
Memory Error Types:
- StackOverflowError: Stack too deep (infinite recursion, deep call chains)
- OutOfMemoryError: Java heap space: Heap exhausted (memory leaks, insufficient size)
- OutOfMemoryError: Metaspace: Class metadata exhausted (too many classes loaded)
Part II: Stack Frames - The Method’s Workspace
Anatomy of a Frame
Every method invocation creates a stack frame - a self-contained execution environment. The frame contains everything the method needs to execute: its local variables, its computation workspace, and information about where to return when done.
The key insight: The compiler (javac) knows exactly how much memory each
frame needs before the program even runs.
When javac compiles your source code, it analyzes every method to determine:
- max_locals: How many slots the local variable array needs (parameters + local variables, accounting for slot reuse)
- max_stack: The deepest the operand stack will grow during execution
These values are stored in the .class file’s Code attribute.
The JVM reads them when loading the class, so when a method is called,
it knows precisely how much memory to allocate - no guesswork, no waste.
The Local Variable Array
The local variable array is indexed storage for everything local to the method:
- Slot 0: the this reference (for instance methods only; in static methods, parameters start at slot 0)
- Slots 1-N: Method parameters, in declaration order
- Subsequent slots: Local variables declared in the method body
Important: long and double occupy two consecutive slots. Everything else
(int, Object reference, byte, short, char, boolean, float) occupies one slot.
The compiler is smart about reusing slots. If you have:
```java
public void example() {
    int x = 1; // Uses slot 1
    // ... code using x ...
    int y = 2; // Can reuse slot 1 if x is no longer needed
}
```
javac analyzes variable lifetimes and reuses slots when possible, minimizing max_locals.
The Operand Stack
Bytecode works indirectly. Want to add two numbers?
Load them onto the operand stack first. Execute iadd. Store the result back.
Local variables provide storage; the operand stack provides the workspace
where every instruction executes.
Consider this simple addition:
```java
// Java source:
public int add(int a, int b) {
    return a + b;
}

// Bytecode (simplified):
// 0: iload_1  // Push local variable 1 (a) onto stack
// 1: iload_2  // Push local variable 2 (b) onto stack
// 2: iadd     // Pop two ints, add, push result
// 3: ireturn  // Pop result, return to caller
```
The operand stack starts empty. iload_1 pushes a. iload_2 pushes b.
Now the stack is [a, b] (b on top). iadd pops both, adds them, pushes the result.
Stack is now [result]. ireturn pops and returns it.
The compiler simulates this execution during compilation to find the maximum
stack depth - the max_stack value stored in the .class file.
Dynamic Linking and Exception Handling
Every frame contains a reference to the runtime constant pool of the current class.
When your code calls another method (obj.doSomething()),
the bytecode contains a symbolic reference.
The JVM resolves this to the actual method address - dynamic linking.
Once resolved, the JVM caches the direct reference for subsequent calls.
The exception table maps ranges of bytecode (try blocks) to exception handlers (catch blocks). Each entry contains:
- Start PC, End PC: The protected code range
- Handler PC: Where to jump if exception occurs
- Catch type: The exception class to catch
When an exception is thrown, the JVM searches the current frame’s exception table. If no match, it pops the frame and checks the caller’s table - unwinding the stack until a handler is found or the thread terminates.
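A sketch of unwinding in action (names illustrative): the exception travels up two frames whose exception tables have no matching entry before a handler's protected range catches it.

```java
public class UnwindDemo {
    static void level3() {
        throw new IllegalStateException("boom"); // no handler here: frame is popped
    }

    static void level2() {
        level3(); // no matching exception-table entry: this frame is popped too
    }

    static String level1() {
        try {
            level2(); // this call site sits inside a protected [start_pc, end_pc) range
        } catch (IllegalStateException e) {
            return "caught in level1: " + e.getMessage(); // handler_pc lands here
        }
        return "no exception";
    }

    public static void main(String[] args) {
        System.out.println(level1()); // caught in level1: boom
    }
}
```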
Key Takeaway: Every method call creates a frame with dedicated space for data and computation. The compiler determines exact memory requirements (max_locals and max_stack) ahead of time, so the JVM allocates precisely what's needed.
Part III: Object Headers - The Hidden Cost
The Header Tax
Create a simple object:
```java
class Point {
    int x;
    int y;
}
```
Expected: 8 bytes
Traditional JVM: 24 bytes
Every object carries 12 bytes of header metadata:
| Field | Size | Contents |
|---|---|---|
| Mark Word | 8 bytes | Hash, lock state, GC age |
| Class Pointer | 4 bytes | Reference to class metadata |
Additionally, alignment padding may be inserted - between the header and the first field (when a field requires 8-byte alignment) or at the end of the object (instance sizes are rounded up to a multiple of 8 bytes). For small objects like Point, the 16 bytes of header plus padding cost 2× the 8 bytes of actual data.
Java 25 changes the math. By embedding a 22-bit class index into unused Mark Word bits, the JVM compresses the header from 12 bytes to 8 bytes:
| Layout | Header Size | Point Object | Savings |
|---|---|---|---|
| Traditional | 12 bytes | 24 bytes | - |
| Compact (Java 25) | 8 bytes | 16 bytes | 33% |
For arrays, the length field (4 bytes) forces 8-byte alignment regardless,
so headers stay at 16 bytes. int[10] remains 56 bytes in both cases -
compact headers mainly benefit small objects without array overhead.
Enable with: -XX:+UseCompactObjectHeaders
Seeing It In Action
Let’s use Java Object Layout (JOL) to inspect actual object sizes:
```java
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

public class ObjectHeaderDemo {
    public static void main(String[] args) {
        System.out.println(VM.current().details());

        // Simple object with two int fields
        class Point {
            int x;
            int y;
        }
        Point p = new Point();
        System.out.println("=== Point Object Layout ===");
        System.out.println(ClassLayout.parseInstance(p).toPrintable());

        // Array header with length field
        int[] intArray = new int[10];
        System.out.println("\n=== int[10] Array Layout ===");
        System.out.println(ClassLayout.parseInstance(intArray).toPrintable());

        // Demonstrate hash code storage
        Object o = new Object();
        System.out.println("\n=== Before hashCode() ===");
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        int hash = o.hashCode();
        System.out.println("\n=== After hashCode(): " + hash + " ===");
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}
```
Expected output analysis:
- Point object: 12 bytes header + 8 bytes fields + 4 bytes alignment padding = 24 bytes (traditional), 16 bytes (compact)
- Bare Object: 12 bytes header + 4 bytes padding = 16 bytes (traditional), 8 bytes (compact) - the most dramatic single-object reduction
- int[10] array: 12 bytes header + 4 bytes length + 40 bytes data = 56 bytes (already 8-byte aligned, no extra padding needed)
- Hash code: Notice how the Mark Word changes after hashCode() is called - the hash is computed once and cached in the header
The Mark Word: A Multi-Purpose Chameleon
The 64-bit Mark Word is remarkably versatile. It stores:
- Identity hash code (31 bits): Computed lazily on the first hashCode() call
- GC age (4 bits): Number of GC cycles survived (max 15 before tenuring)
- Lock state (2 bits): Unlocked, lightweight locked, heavyweight locked, marked for GC
The same 64 bits serve different purposes depending on the object’s state.
When you call hashCode(), bits that were unused suddenly store your hash.
When you synchronized(obj), the bits transform to hold lock information.
During GC, they become forwarding pointers.
Key Takeaway: Object headers (12 bytes) plus alignment padding typically impose 16 bytes of overhead - often consuming 50-67% of memory for small objects. In applications creating millions of small objects (microservices, caches, JSON parsing), this overhead compounds into gigabytes of wasted RAM.
Part IV: Garbage Collection - The Rhythm of Object Lifetimes
Most Objects Die Young
Garbage collection in the JVM is built on a simple observation:
The vast majority of objects live brief, intense lives.
Your request handlers, your temporary calculation intermediates,
your StringBuilder instances - they sparkle into existence and disappear
within milliseconds.
This is the Weak Generational Hypothesis, and it shapes every modern JVM garbage collector.
The Heap is divided into generations:
- Young Generation: Where objects are born. Contains Eden space (new allocations) and two Survivor spaces (aging objects)
- Old Generation (Tenured): Where long-lived objects retire
Young Generation Collections (Minor GC)
When Eden fills up, the JVM triggers a Minor GC:
- Stop-the-world pause (brief - typically milliseconds)
- Identify live objects: Starting from GC roots (stack references, static fields), traverse the object graph
- Copy survivors: Live objects in Eden are copied to Survivor space (S0 or S1)
- Empty Eden: Dead objects are simply abandoned - no sweeping needed
- Increment age: Objects’ GC age incremented in their Mark Word
The copying collector is efficient because most objects are already dead. If 90% of Eden is garbage, we only copy 10% - much faster than sweeping everything.
The Survivor Spaces Dance
The two Survivor spaces (S0 and S1) alternate roles:
- One holds objects from the previous collection
- The other is empty, ready to receive new survivors
- After each Minor GC, they swap identities
Objects that survive enough Minor GCs (default threshold: 15, stored in 4 Mark Word bits) are tenured - promoted to the Old Generation.
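You can watch the survivor dance and tenuring decisions directly with HotSpot's unified GC logging (the flags below are standard HotSpot options; myapp.jar is a placeholder):

```shell
# Log every GC event, plus a per-age table of survivor occupancy after each
# Minor GC; cap promotion at 10 survived collections instead of the default 15
java -Xlog:gc,gc+age=trace -XX:MaxTenuringThreshold=10 -jar myapp.jar
```

The gc+age output shows how many bytes sit at each age; when an age reaches the threshold, those objects are promoted to the Old Generation on the next Minor GC.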
Old Generation Collections (Major/Full GC)
When does Major GC trigger?
Unlike the predictable Eden-based Minor GC, Major GC starts when:
- Old Generation fills up - An allocation request in Old Gen cannot be satisfied
- Promotion fails - Survivor objects cannot be promoted to Old Gen due to insufficient space
- Explicit request - System.gc() (though modern JVMs may ignore this)
- Metaspace pressure - Class metadata exhaustion triggers Full GC to unload classes
Major GC is far less predictable than Minor GC. While Eden fills at a steady rate based on allocation velocity, Old Gen consumption depends on object lifetime patterns that vary by workload.
The Old Generation uses different algorithms because:
- Objects here are expected to live long
- Collection is infrequent but more expensive
- Fragmentation is a real concern
Different collectors handle Old Gen differently:
- Serial/Parallel: Mark-Sweep-Compact (identify live, remove dead, slide live together)
- CMS: Mostly concurrent mark-sweep (no compaction - fragmentation occurs)
- G1: Incremental mixed collections (collects some old regions with young)
- ZGC/Shenandoah: Concurrent mark-relocate (move objects while app runs)
GC Roots: Where Collection Starts
Garbage collection doesn’t start from random objects - it starts from GC Roots, objects that are definitely alive:
- Stack references: Local variables and parameters on all thread stacks
- Static fields: Class static variables
- JNI references: Native code references
- Monitors: Objects used as synchronization monitors
The GC traverses from these roots, following references, marking everything reachable as live. Unmarked objects? Garbage.
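A small sketch of reachability (System.gc() is only a hint, but HotSpot honors it by default, so the second result typically holds): a WeakReference lets us observe whether the collector still considers an object reachable from a root.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    public static void main(String[] args) {
        Object strong = new Object();                     // rooted via a stack slot
        WeakReference<Object> weak = new WeakReference<>(strong);

        System.gc();
        System.out.println(weak.get() != null);           // true: still reachable from a root

        strong = null;                                    // sever the only root path
        System.gc();
        System.out.println(weak.get() == null);           // typically true: unreachable, collected
    }
}
```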
Key Takeaway: GC leverages object lifetime patterns - most objects die young, so optimize for that with fast, copying Minor GCs. Long-lived objects graduate to Old Generation, where more sophisticated (but slower) collectors manage them.
Part V: The Evolution of Garbage Collection
From Seconds to Milliseconds
The JVM’s garbage collection story follows a clear trajectory: pause times have dropped from seconds to milliseconds over three decades.
1996-1998: The Foundation
Java 1.0 launched with simple Mark-and-Sweep - basic automatic memory management, but pause times grew linearly with heap size. Java 1.2 introduced Serial and Parallel collectors and established the generational hypothesis: most objects die young, so optimize for that with separate young and old generations. This insight remains fundamental to every modern collector.
2004-2014: Reducing Pauses
CMS (Java 5-7) was the first mostly-concurrent collector, running most work alongside the application. It reduced pause times but introduced fragmentation since it didn’t compact the heap. G1 (Java 7-9) brought region-based collection with explicit pause time targets, marking the shift toward predictable performance rather than just throughput.
2018-2025: The Low-Latency Era
ZGC and Shenandoah (Java 11-17) achieved sub-millisecond pauses through concurrent relocation, using colored pointers and Brooks pointers respectively. These collectors handle heaps up to 16TB without breaking the millisecond barrier. Java 21 introduced Generational ZGC (made the default ZGC mode in Java 23), and Java 25 brings the same optimization to Shenandoah - combining the efficiency of generational collection with concurrent low-latency operation.
The trend is unmistakable: GC pauses that once measured in seconds now complete in single-digit milliseconds.
Collector Selection Guide
Quick Reference:
| Collector | Best For | Pause Time | Heap Size | Notes |
|---|---|---|---|---|
| Serial | Single-core, embedded | High | <100MB | Default for single-core |
| Parallel | Batch processing, throughput | Medium | 1-8GB | Throughput focused |
| G1 | General purpose (Java 9+ default) | Medium (20-200ms) | 4-16GB | Balanced |
| ZGC | Large heaps, ultra-low latency | Low (<10ms) | 8GB-16TB | Generational by default (Java 23+) |
| Shenandoah | Containers, low latency | Low (<10ms) | 8GB-16TB | Lower overhead than ZGC |
For detailed tuning guidance and JVM flags, see our companion article: JVM Garbage Collectors: The Orchestra of Memory Management.
Part VI: Java 25 - The Latest Chapter
Compact Object Headers (JEP 519)
Java 25 introduces Compact Object Headers as a production-ready feature (experimental in Java 24). The innovation is elegant: compress the 12-byte header down to 8 bytes.
How it works: The JVM overlays the class pointer onto unused bits in the Mark Word. Traditional headers have a separate Mark Word (64 bits) and Class Pointer (32 bits). Compact headers embed a 22-bit class index into the Mark Word itself.
Why 22 bits? Because 2²² = 4,194,304 - enough to address over 4 million unique classes. Even the largest applications rarely exceed this.
Project Lilliput: This feature represents the first deliverable from Project Lilliput, OpenJDK’s initiative to minimize Java object memory overhead. The project roadmap targets 4-byte headers in a future release - halving the size once again. For now, the 12→8 byte reduction is just the beginning.
The Math:
| Component | Traditional | Compact (Java 25) |
|---|---|---|
| Mark Word | 64 bits | 64 bits (with embedded class) |
| Class Pointer | 32 bits | 0 bits (embedded) |
| Total | 96 bits (12 bytes) | 64 bits (8 bytes) |
33% reduction (12→8 bytes). Up to 30% CPU savings in production.
Note that the percentage savings vary by object size: a bare Object with no
fields shrinks from 16 bytes (12 header + 4 padding) to 8 bytes - a 50% reduction.
Objects with fields see proportionally less dramatic savings as the data
payload dominates.
Real-World Impact
Amazon battle-tested Compact Object Headers across hundreds of production microservices:
| Metric | Improvement |
|---|---|
| Heap usage | 22% reduction |
| CPU usage | Up to 30% reduction |
| GC cycles | 15% fewer collections |
| JSON parsing | 10% faster |
The gains are most dramatic for applications with many small objects:
- Spring Boot microservices (thousands of small objects per request)
- In-memory caches (Caffeine, Ehcache)
- JSON/Message processing pipelines
Applications with few, large objects (video processing, image manipulation) see less benefit - the header was already negligible compared to data payload.
Enabling Compact Headers
Just add one JVM flag:
```shell
# Java 24 (experimental - unlock needed)
java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -jar myapp.jar

# Java 25 (production-ready)
java -XX:+UseCompactObjectHeaders -jar myapp.jar

# Verify it's working
java -XX:+UseCompactObjectHeaders -XX:+PrintFlagsFinal -version 2>&1 | grep UseCompactObjectHeaders
```
That’s it. No code changes. No recompilation. Just flip the flag and measure.
Key Takeaway: Java 25 delivers 33% reduction in object header overhead with a single flag. It’s the most memory-efficient Java release ever - without changing a line of application code.
Part VII: Virtual Threads and Memory — A New Allocation Paradigm
The Million-Thread Challenge
Java 21 introduced Virtual Threads (JEP 444), fundamentally changing how Java applications handle concurrency. Where platform threads map 1:1 to operating system threads (typically limited to thousands), virtual threads are lightweight user-mode threads that the JVM can multiplex onto a small pool of platform threads — enabling millions of concurrent tasks.
But this concurrency revolution comes with a memory management trade-off: virtual thread stack frames (continuations) are heap-allocated when the thread is parked or suspended. When millions of virtual threads are blocked waiting for I/O, their stack state lives on the heap rather than in native memory. This creates a fundamentally different allocation pattern than traditional thread pools.
How Virtual Threads Change Memory Allocation
The Traditional Model:
- Platform threads: 1MB+ stack per thread (off-heap, in native memory)
- Limited to thousands of threads due to OS constraints
- Heap allocation comes primarily from application objects
The Virtual Thread Model:
- Virtual threads: ~1KB initial footprint when idle
- When mounted: runs on carrier thread’s native stack (like platform threads)
- When parked/suspended: continuation stack frames stored on the heap
- Millions of parked threads = millions of heap-resident continuations
- Result: Persistent old-generation pressure rather than transient young-gen spikes
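To make the model concrete, here is a minimal sketch (method names illustrative; requires Java 21+) that parks thousands of virtual threads at once - each parked thread holds only its heap-resident continuation instead of a ~1MB native stack:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    // Park `count` virtual threads simultaneously; while each one sleeps it is
    // unmounted from its carrier and its frames are stored on the heap.
    static int runSleepers(int count) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(100)); // parked: state moves to heap
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runSleepers(10_000)); // 10000 - impractical with platform threads
    }
}
```

Ten thousand platform threads would reserve on the order of 10GB of native stack space; ten thousand parked virtual threads cost a few megabytes of heap.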
Real-World Findings
Research from InfoQ’s case study (2024) reveals:
- Memory footprint varies dramatically based on workload patterns. Applications with high virtual thread churn may see increased GC pressure compared to equivalent platform thread implementations due to heap-resident continuations.
- CPU-intensive workloads may show lower throughput with virtual threads than well-tuned platform thread pools due to increased allocation overhead from managing millions of thread states.
- I/O-bound workloads benefit most - the blocking operations that make virtual threads shine don't increase heap pressure proportionally since threads are parked during I/O waits, and the heap storage of continuations is more memory-efficient than native thread stacks.
Tuning for Virtual Threads
```shell
# Monitor virtual thread pinning (when a VT cannot be unmounted)
java -Djdk.tracePinnedThreads=short ...

# ZGC for VT-heavy, latency-sensitive workloads (ZGC ignores pause-time
# targets such as MaxGCPauseMillis; its pauses are already sub-millisecond)
java -XX:+UseZGC -Xms4g -Xmx4g MyApp

# Consider a larger young generation for VT-heavy workloads with G1
java -XX:+UseG1GC -XX:NewRatio=1 -XX:MaxGCPauseMillis=50 ...
```
Key Insight: When migrating to virtual threads, re-benchmark your memory allocation and GC behavior. The allocation profile may shift enough to warrant reconsidering your collector choice or tuning parameters. The collector that worked well for your platform-threaded application may not be optimal for the same workload using virtual threads.
Reference: Chirumamilla, P. et al. “Java Virtual Threads: a Case Study.” InfoQ, July 2024. The study found that “memory footprint in Open Liberty deployments can vary greatly based on factors like application design, workload level, and garbage collection behavior, so the reduced footprint of virtual threads may not result in an overall reduction in memory used.”
Going deeper: Part 2 covers Virtual Thread internals including continuation frame capture, mount/unmount mechanics, pinning detection, and detailed heap allocation analysis.
Cross-References and Further Reading
Continue Your Journey
Part 2: Deep Dive - For those who want to understand the bit-level details:
- Stack frame internals: max_locals and max_stack calculation
- Mark Word state machines: unlocked → lightweight → heavyweight → GC marked
- Tri-color marking algorithms and SATB barriers
- ZGC colored pointers vs Shenandoah Brooks pointers
- Virtual Threads internals: continuation frame capture, mounting mechanics, pinning
- Project Panama: Foreign Memory API, MemorySegment, Arena, off-heap strategies
- Bytecode analysis with ASM
Related Articles:
- JVM: The Silent Revolution - Java 25’s complete feature set including Project Leyden (AOT profiling) and Scoped Values
- JVM Garbage Collectors Guide - Practical collector selection, tuning, and JVM flags
Summary
By the end of Part 1, you should understand:
- ✓ The Two Realms: Stack (per-thread, automatic, LIFO) vs Heap (shared, GC-managed)
- ✓ Stack Frames: max_locals and max_stack determined at compile time, frame components
- ✓ Object Headers: 12 bytes (Mark Word + Class Pointer), plus alignment padding to an 8-byte boundary
- ✓ Generational GC: Most objects die young, Minor GC (fast) vs Major GC (rare)
- ✓ GC Evolution: From simple STW to sophisticated concurrent collectors
- ✓ Java 25: Compact Object Headers reduce overhead from 12 to 8 bytes (33% reduction)
- ✓ Virtual Threads: Java 21’s lightweight threads store continuations on heap, changing allocation patterns
Once you internalize these fundamentals, you start seeing hidden costs everywhere. A cache with a million entries pays 12MB in header overhead before storing a single byte of data. A microservice processing thousands of requests per second allocates more bytes for headers than for the actual request payload.
This overhead isn’t inevitable. Java 25 cuts it by a third with a single flag - no code changes, no recompilation, no testing cycles. The JVM spent thirty years optimizing garbage collection down to millisecond pauses. Now it’s optimizing the objects themselves.
Add -XX:+UseCompactObjectHeaders and measure the difference.
References
- Java Virtual Machine Specification - Oracle. "Chapter 2: The Structure of the Java Virtual Machine." Java SE Specifications, 2025. https://docs.oracle.com/javase/specs/jvms/se25/html/jvms-2.html
- JEP 519: Compact Object Headers - Kennke, Roman. "JEP 519: Compact Object Headers." OpenJDK, 2025. https://openjdk.org/jeps/519
- Java Object Headers Deep Dive - Xesquevixos, Wanderson. "Mastering Memory Efficiency with Compact Object Headers in JDK 25." JAVAPRO International, February 2026. https://javapro.io/2026/02/10/mastering-memory-efficiency-with-compact-object-headers-in-jdk-25/
- Java 25 Compact Headers Integration - InfoQ, June 2025. https://www.infoq.com/news/2025/06/java-25-compact-object-headers/
- Understanding Object Headers - Shamaii, Kiarash. "Understanding Object Header in Java." Medium, September 2025. https://medium.com/@kiarash.shamaii/understanding-object-header-in-java-175c42329b90
- GC Evolution from Java 1.0 to 21 - Singh, Bhupendra. "The Evolution of Garbage Collection in Java: From Java 1.0 to Java 21." Medium, October 2024. https://medium.com/@rajbupendra588/the-evolution-of-garbage-collection-in-java-from-java-1-0-to-java-21-47ab179abb7c
- Java Object Layout (JOL) - Shipilev, Aleksey. "Java Object Layout." OpenJDK Project. http://openjdk.java.net/projects/code-tools/jol/
- JVM Internal Architecture - Zen, Nuwan. "Internal architecture of the Java Virtual Machine (JVM)." Medium, September 2021. https://nuwanzen.medium.com/internal-architecture-of-the-java-virtual-machine-jvm-2d3868ff528
Part 1 of the JVM Memory series. Part 2 covers bit-level internals, bytecode analysis, and concurrent GC algorithms.
Related: JVM: The Silent Revolution | JVM Garbage Collectors Guide