Michał Artur Marciniak

JVM Memory Fundamentals: Stack, Heap, and Object Headers



Every Java developer knows objects live on the heap and local variables on the stack. But when you create a simple Point object with two integers, do you know exactly how much memory it consumes? If you guessed “8 bytes for the two ints,” you’re off by 200%.

Your microservice crashes with “only” a million objects. GC pauses freeze traffic at the worst moment. Java 25 promises up to 30% CPU savings and ~22% heap reduction with one flag, but only if you understand what it’s doing. JVM memory fundamentals connect all three.

This is Part 1 of our JVM Memory series. We’ll explore the fundamental concepts every Java developer should know. Part 2 dives into the bit-level details for those who want to understand the JVM as deeply as the CPU understands it.


Part I: The Two Realms - Stack and Heap

The City Beneath Your Code

Imagine your Java application as a bustling metropolis. The Heap is the residential district - sprawling, shared by all citizens, constantly being cleaned by sanitation trucks (the Garbage Collector). The Stack is the business district - each building serves exactly one purpose, is demolished the moment its work is done, and is private to its owner.

Every thread in your application gets its own Stack - vertical, efficient, self-contained. The Heap is the shared commons where all threads store their objects, references crossing between threads like citizens visiting shared facilities.

The Stack: Where Methods Live and Die

The Stack is perhaps the most elegant data structure in computing. Last In, First Out. When you call a method, a stack frame is pushed. When the method returns, it’s popped. No garbage collection, no memory leaks - just automatic, deterministic cleanup.

Each thread gets its own stack, sized at JVM startup (-Xss flag, default 1MB). When a thread exceeds this limit? StackOverflowError. You’ve seen it - infinite recursion, overly deep call chains.

public class StackOverflowDemo {
    public static void main(String[] args) {
        recursiveMethod(1);
    }

    private static void recursiveMethod(int depth) {
        System.out.println("Depth: " + depth);
        recursiveMethod(depth + 1);  // Eventually: StackOverflowError
    }
}

The stack stores primitives and references.

The heap stores everything those references point to.

Simple distinction, massive implications for memory and GC.
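The distinction is easy to observe directly. In this small sketch (names are illustrative), mutating a primitive parameter changes only the callee's own stack slot, while mutating through a reference changes the one shared heap object both frames point at:

```java
public class StackVsHeap {
    static class Box { int value; }

    // 'n' is a copy in this frame's local variable array; the caller never sees the change
    static void mutatePrimitive(int n) { n = 99; }

    // 'box' is a copy of the reference, but both copies point at the same heap object
    static void mutateObject(Box box) { box.value = 99; }

    static int primitiveDemo() {
        int n = 1;
        mutatePrimitive(n);
        return n;          // still 1 - the callee changed its own stack slot
    }

    static int heapDemo() {
        Box box = new Box();
        box.value = 1;
        mutateObject(box);
        return box.value;  // 99 - both frames referenced the same heap object
    }

    public static void main(String[] args) {
        System.out.println(primitiveDemo() + " " + heapDemo()); // prints "1 99"
    }
}
```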

The Heap: The Object Repository

If the Stack is where methods execute, the Heap is where objects reside. Every time you write new Object(), the JVM allocates space on the heap. Unlike the stack, heap memory isn’t automatically reclaimed when a method returns - that’s the Garbage Collector’s job.

The heap is shared across all threads. Your main thread creates an object; your worker thread can reference it. This sharing is powerful but requires synchronization when mutable objects are accessed concurrently.
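A minimal sketch of that sharing: two threads mutate one heap object, with `synchronized` guarding the concurrent access (without it, increments would be lost to races):

```java
public class SharedHeapDemo {
    static class Counter {
        private long count = 0;
        synchronized void increment() { count++; }  // guards the shared mutable heap state
        synchronized long get() { return count; }
    }

    static long run() {
        Counter counter = new Counter();  // one heap object, visible to both threads
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) counter.increment(); };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        try { a.join(); b.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 200000 - no lost updates thanks to synchronization
    }
}
```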

Memory Error Types:

  - StackOverflowError - a thread exceeds its stack limit (deep or infinite recursion)
  - OutOfMemoryError: Java heap space - the heap cannot satisfy an allocation even after GC
  - OutOfMemoryError: Metaspace - class metadata space is exhausted


Part II: Stack Frames - The Method’s Workspace

Anatomy of a Frame

Every method invocation creates a stack frame - a self-contained execution environment. The frame contains everything the method needs to execute: its local variables, its computation workspace, and information about where to return when done.

The key insight: The compiler (javac) knows exactly how much memory each frame needs before the program even runs.

When javac compiles your source code, it analyzes every method to determine:

  - max_locals - the size of the local variable array
  - max_stack - the maximum operand stack depth the method can ever reach

These values are stored in the .class file’s Code attribute. The JVM reads them when loading the class, so when a method is called, it knows precisely how much memory to allocate - no guesswork, no waste.

The Local Variable Array

The local variable array is indexed storage for everything local to the method:

  - Slot 0 - the this reference (for instance methods)
  - Next slots - method parameters, in declaration order
  - Remaining slots - local variables declared in the method body

Important: long and double occupy two consecutive slots. Everything else (int, Object reference, byte, short, char, boolean, float) occupies one slot.

The compiler is smart about reusing slots. If you have:

public void example() {
    int x = 1;  // Uses slot 1
    // ... code using x ...

    int y = 2;  // Can reuse slot 1 if x is no longer needed
}

javac analyzes variable lifetimes and reuses slots when possible, minimizing max_locals.

The Operand Stack

Bytecode works indirectly. Want to add two numbers? Load them onto the operand stack first. Execute iadd. Store the result back. Local variables provide storage; the operand stack provides the workspace where every instruction executes.

Consider this simple addition:

// Java source:
public int add(int a, int b) {
    return a + b;
}

// Bytecode (simplified):
// 0: iload_1    // Push local variable 1 (a) onto stack
// 1: iload_2    // Push local variable 2 (b) onto stack
// 2: iadd       // Pop two ints, add, push result
// 3: ireturn    // Pop result, return to caller

The operand stack starts empty. iload_1 pushes a. iload_2 pushes b. Now the stack is [a, b] (b on top). iadd pops both, adds them, pushes the result. Stack is now [result]. ireturn pops and returns it.

The compiler simulates this execution during compilation to find the maximum stack depth - the max_stack value stored in the .class file.
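That simulation can be sketched in plain Java. This is a toy model (not the real JVM interpreter) with a local variable array and an operand stack, tracking the depth the compiler would record as max_stack; slot numbering assumes an instance method (slot 0 = this), matching the bytecode above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackModel {
    static int maxDepth = 0;  // what the compiler would emit as max_stack

    static void push(Deque<Integer> stack, int v) {
        stack.push(v);
        maxDepth = Math.max(maxDepth, stack.size());
    }

    static int execAdd(int a, int b) {
        int[] locals = new int[3];                 // max_locals = 3: this(0), a(1), b(2)
        locals[1] = a;
        locals[2] = b;
        Deque<Integer> operands = new ArrayDeque<>();
        push(operands, locals[1]);                 // iload_1
        push(operands, locals[2]);                 // iload_2
        int sum = operands.pop() + operands.pop(); // iadd: pop two ints, add...
        push(operands, sum);                       //       ...push the result
        return operands.pop();                     // ireturn
    }

    public static void main(String[] args) {
        System.out.println(execAdd(2, 3)); // 5
        System.out.println(maxDepth);      // 2 - the deepest the stack ever got
    }
}
```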

Dynamic Linking and Exception Handling

Every frame contains a reference to the runtime constant pool of the current class. When your code calls another method (obj.doSomething()), the bytecode contains a symbolic reference. The JVM resolves this to the actual method address - dynamic linking. Once resolved, the JVM caches the direct reference for subsequent calls.

The exception table maps ranges of bytecode (try blocks) to exception handlers (catch blocks). Each entry contains:

  - start_pc / end_pc - the bytecode range the handler covers
  - handler_pc - where execution jumps when the exception matches
  - catch_type - the exception class this handler accepts (or any exception, for finally)
When an exception is thrown, the JVM searches the current frame’s exception table. If no match, it pops the frame and checks the caller’s table - unwinding the stack until a handler is found or the thread terminates.
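The unwinding is easy to observe. In this small illustrative example, the frame for level3 has no matching handler and is popped, level2's frame is discarded mid-method, and the search ends at the caller's handler:

```java
public class UnwindDemo {
    static String trace = "";

    static void level3() {
        trace += "enter3 ";
        throw new IllegalStateException("boom"); // no handler here: this frame is popped
    }

    static void level2() {
        trace += "enter2 ";
        level3();
        trace += "exit2 ";  // never runs - the frame is discarded during unwinding
    }

    static String run() {
        trace = "";
        try {
            level2();
        } catch (IllegalStateException e) {      // matching entry in this frame's table
            trace += "caught:" + e.getMessage();
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // enter2 enter3 caught:boom
    }
}
```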

Key Takeaway: Every method call creates a frame with dedicated space for data and computation. The compiler determines exact memory requirements (max_locals and max_stack) ahead of time, so the JVM allocates precisely what’s needed.


Part III: Object Headers - The Hidden Cost

The Header Tax

Create a simple object:

class Point {
    int x;
    int y;
}

Expected: 8 bytes
Traditional JVM: 24 bytes

Every object carries 12 bytes of header metadata:

| Field         | Size    | Contents                    |
|---------------|---------|-----------------------------|
| Mark Word     | 8 bytes | Hash, lock state, GC age    |
| Class Pointer | 4 bytes | Reference to class metadata |

Additionally, alignment padding (4 bytes) may be inserted before the first field to ensure 8-byte alignment. The combined header plus padding costs 2× the actual data for small objects.

Java 25 changes the math. By embedding a 22-bit class index into unused Mark Word bits, the JVM compresses the header from 12 bytes to 8 bytes:

| Layout            | Header Size | Point Object | Savings |
|-------------------|-------------|--------------|---------|
| Traditional       | 12 bytes    | 24 bytes     | -       |
| Compact (Java 25) | 8 bytes     | 16 bytes     | 33%     |

For arrays, the length field (4 bytes) forces 8-byte alignment regardless, so headers stay at 16 bytes. int[10] remains 56 bytes in both cases - compact headers mainly benefit small objects without array overhead.

Enable with: -XX:+UseCompactObjectHeaders
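The arithmetic above fits in a small estimator. This is a back-of-the-envelope sketch assuming 8-byte alignment, compressed class pointers, and the header sizes discussed; real layouts depend on the JVM, which is exactly what the JOL demo in the next section measures:

```java
public class SizeEstimator {
    // round size up to the next multiple of alignment
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // headerBytes: 12 traditional, 8 compact
    static long instanceSize(long headerBytes, long fieldBytes) {
        return alignUp(headerBytes + fieldBytes, 8);
    }

    // array headers carry an extra 4-byte length field
    static long intArraySize(long headerBytes, int length) {
        return alignUp(headerBytes + 4 + 4L * length, 8);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize(12, 8));  // Point, traditional: 24
        System.out.println(instanceSize(8, 8));   // Point, compact:     16
        System.out.println(intArraySize(12, 10)); // int[10], traditional: 56
        System.out.println(intArraySize(8, 10));  // int[10], compact: 56 (alignment eats the gain)
    }
}
```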

Seeing It In Action

Let’s use Java Object Layout (JOL) to inspect actual object sizes:

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

public class ObjectHeaderDemo {
    public static void main(String[] args) {
        System.out.println(VM.current().details());

        // Simple object with two int fields
        class Point {
            int x;
            int y;
        }

        Point p = new Point();
        System.out.println("=== Point Object Layout ===");
        System.out.println(ClassLayout.parseInstance(p).toPrintable());

        // Array header with length field
        int[] intArray = new int[10];
        System.out.println("\n=== int[10] Array Layout ===");
        System.out.println(ClassLayout.parseInstance(intArray).toPrintable());

        // Demonstrate hash code storage
        Object o = new Object();
        System.out.println("\n=== Before hashCode() ===");
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        int hash = o.hashCode();
        System.out.println("\n=== After hashCode(): " + hash + " ===");
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}

Expected output analysis:

The Mark Word: A Multi-Purpose Chameleon

The 64-bit Mark Word is remarkably versatile. It stores:

  - Identity hash code - computed lazily on the first hashCode() call
  - GC age - 4 bits counting survived Minor GCs
  - Lock state - flags distinguishing unlocked, lightweight, and heavyweight (monitor) locking
  - Forwarding pointer - the object’s new address while GC relocates it

The same 64 bits serve different purposes depending on the object’s state. When you call hashCode(), bits that were unused suddenly store your hash. When you synchronized(obj), the bits transform to hold lock information. During GC, they become forwarding pointers.

Key Takeaway: Object headers (12 bytes) plus alignment padding create 16 bytes of overhead minimum - often consuming 50-67% of memory for small objects. In applications creating millions of small objects (microservices, caches, JSON parsing), this overhead compounds into gigabytes of wasted RAM.


Part IV: Garbage Collection - The Rhythm of Object Lifetimes

Most Objects Die Young

Garbage collection in the JVM is built on a simple observation:

The vast majority of objects live brief, intense lives.

Your request handlers, your temporary calculation intermediates, your StringBuilder instances - they sparkle into existence and disappear within milliseconds.

This is the Weak Generational Hypothesis, and it shapes every modern JVM garbage collector.

The Heap is divided into generations:

  - Young Generation - Eden plus two Survivor spaces (S0 and S1), where new objects are allocated
  - Old Generation (Tenured) - long-lived objects promoted from the Young Generation
  - Metaspace - class metadata (technically native memory, outside the heap proper)

Young Generation Collections (Minor GC)

When Eden fills up, the JVM triggers a Minor GC:

  1. Stop-the-world pause (brief - typically milliseconds)
  2. Identify live objects: Starting from GC roots (stack references, static fields), traverse the object graph
  3. Copy survivors: Live objects in Eden are copied to Survivor space (S0 or S1)
  4. Empty Eden: Dead objects are simply abandoned - no sweeping needed
  5. Increment age: Objects’ GC age incremented in their Mark Word

The copying collector is efficient because most objects are already dead. If 90% of Eden is garbage, we only copy 10% - much faster than sweeping everything.

The Survivor Spaces Dance

The two Survivor spaces (S0 and S1) alternate roles:

  - One acts as the "to" space - Minor GC copies live objects from Eden and the other survivor space into it
  - The other acts as the "from" space - emptied once its live objects are copied out
  - After each Minor GC, the two spaces swap roles

Objects that survive enough Minor GCs (default threshold: 15, stored in 4 Mark Word bits) are tenured - promoted to the Old Generation.
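The dance above can be modeled as a toy simulation - ages as integers, survivor spaces as lists, purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class SurvivorDance {
    static final int TENURING_THRESHOLD = 15;  // default; 4 Mark Word bits cap it at 15

    static List<Integer> survivors = new ArrayList<>(); // ages in the current "from" space
    static int tenuredCount = 0;                        // objects promoted to Old Gen

    // One Minor GC: copy Eden's live objects and aged survivors into the "to" space.
    static void minorGC(int edenLive) {
        List<Integer> to = new ArrayList<>();
        for (int i = 0; i < edenLive; i++) to.add(1);  // first survival: age 1
        for (int age : survivors) {
            int newAge = age + 1;
            if (newAge >= TENURING_THRESHOLD) tenuredCount++;  // promoted to Old Gen
            else to.add(newAge);                               // copied, still young
        }
        survivors = to;  // the spaces swap roles
    }

    public static void main(String[] args) {
        minorGC(3);                               // 3 objects survive their first Minor GC
        for (int i = 0; i < 14; i++) minorGC(0);  // 14 more cycles, nothing new survives
        System.out.println(tenuredCount);         // all 3 reach age 15 and tenure
    }
}
```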

Old Generation Collections (Major/Full GC)

When does Major GC trigger?

Unlike the predictable Eden-based Minor GC, Major GC starts when:

  1. Old Generation fills up - An allocation request in Old Gen cannot be satisfied
  2. Promotion fails - Survivor objects cannot be promoted to Old Gen due to insufficient space
  3. Explicit request - System.gc() (though modern JVMs may ignore this)
  4. Metaspace pressure - Class metadata exhaustion triggers Full GC to unload classes

Major GC is far less predictable than Minor GC. While Eden fills at a steady rate based on allocation velocity, Old Gen consumption depends on object lifetime patterns that vary by workload.

The Old Generation uses different algorithms because:

  - Survival rates are high - copying everything would be expensive, unlike Eden where most objects are already dead
  - Objects are long-lived - they need in-place management and periodic compaction to fight fragmentation
  - The region is large - full traversals are costly, so work is increasingly done concurrently or incrementally

Different collectors handle Old Gen differently:

  - Parallel GC - stop-the-world mark-compact
  - G1 - incremental, region-based evacuation within pause-time targets
  - ZGC / Shenandoah - fully concurrent marking and relocation

GC Roots: Where Collection Starts

Garbage collection doesn’t start from random objects - it starts from GC Roots, objects that are definitely alive:

  1. Stack references: Local variables and parameters on all thread stacks
  2. Static fields: Class static variables
  3. JNI references: Native code references
  4. Monitors: Objects used as synchronization monitors

The GC traverses from these roots, following references, marking everything reachable as live. Unmarked objects? Garbage.
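That traversal is a plain graph-reachability problem. Here is a minimal mark-phase sketch over a toy heap where objects are integer ids and references form an adjacency list:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MarkPhase {
    // heap: object id -> ids it references; roots: definitely-live starting points
    static Set<Integer> mark(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            int obj = worklist.pop();
            if (!live.add(obj)) continue;                 // already marked
            for (int ref : heap.getOrDefault(obj, List.of()))
                worklist.push(ref);                        // follow outgoing references
        }
        return live;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> heap = Map.of(
            1, List.of(2),   // root 1 -> 2
            2, List.of(3),   // 2 -> 3
            4, List.of(5),   // 4 and 5 reference each other...
            5, List.of(4)    // ...but no root reaches them: a dead cycle
        );
        // live = {1, 2, 3}; the cycle 4<->5 is unreachable and therefore garbage
        System.out.println(mark(heap, List.of(1)));
    }
}
```

Note that the dead cycle is collected even though its reference counts never reach zero - reachability from roots, not reference counting, is what defines liveness.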

Key Takeaway: GC leverages object lifetime patterns - most objects die young, so optimize for that with fast, copying Minor GCs. Long-lived objects graduate to Old Generation, where more sophisticated (but slower) collectors manage them.


Part V: The Evolution of Garbage Collection

From Seconds to Milliseconds

The JVM’s garbage collection story follows a clear trajectory: pause times have dropped from seconds to milliseconds over three decades.

1996-1998: The Foundation

Java 1.0 launched with simple Mark-and-Sweep - basic automatic memory management, but pause times grew linearly with heap size. Java 1.2 introduced Serial and Parallel collectors and established the generational hypothesis: most objects die young, so optimize for that with separate young and old generations. This insight remains fundamental to every modern collector.

2004-2014: Reducing Pauses

CMS (Java 5-7) was the first mostly-concurrent collector, running most work alongside the application. It reduced pause times but introduced fragmentation since it didn’t compact the heap. G1 (Java 7-9) brought region-based collection with explicit pause time targets, marking the shift toward predictable performance rather than just throughput.

2018-2025: The Low-Latency Era

ZGC and Shenandoah (Java 11-17) achieved sub-millisecond pauses through concurrent relocation, using colored pointers and Brooks pointers respectively. These collectors handle heaps up to 16TB without breaking the millisecond barrier. Java 21 introduced Generational ZGC (the default ZGC mode since Java 23), and Java 25 brings the same optimization to Shenandoah - combining the efficiency of generational collection with concurrent low-latency operation.

The trend is unmistakable: GC pauses that once measured in seconds now complete in single-digit milliseconds.

Collector Selection Guide

Quick Reference:

| Collector  | Best For                          | Pause Time        | Heap Size | Notes                              |
|------------|-----------------------------------|-------------------|-----------|------------------------------------|
| Serial     | Single-core, embedded             | High              | <100MB    | Default for single-core            |
| Parallel   | Batch processing, throughput      | Medium            | 1-8GB     | Throughput focused                 |
| G1         | General purpose (Java 9+ default) | Medium (20-200ms) | 4-16GB    | Balanced                           |
| ZGC        | Large heaps, ultra-low latency    | Low (<10ms)       | 8GB-16TB  | Generational by default (Java 23+) |
| Shenandoah | Containers, low latency           | Low (<10ms)       | 8GB-16TB  | Lower overhead than ZGC            |

For detailed tuning guidance and JVM flags, see our companion article: JVM Garbage Collectors: The Orchestra of Memory Management.


Part VI: Java 25 - The Latest Chapter

Compact Object Headers (JEP 519)

Java 25 introduces Compact Object Headers as a production-ready feature (experimental in Java 24). The innovation is elegant: compress the 12-byte header down to 8 bytes.

How it works: The JVM overlays the class pointer onto unused bits in the Mark Word. Traditional headers have a separate Mark Word (64 bits) and Class Pointer (32 bits). Compact headers embed a 22-bit class index into the Mark Word itself.

Why 22 bits? Because 2²² = 4,194,304 - enough to address over 4 million unique classes. Even the largest applications rarely exceed this.
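The bit-packing idea takes only a few lines to illustrate. This is a schematic model, not HotSpot's actual Mark Word layout - the shift position here is hypothetical:

```java
public class CompactHeaderModel {
    static final int CLASS_INDEX_BITS = 22;
    static final long CLASS_INDEX_MASK = (1L << CLASS_INDEX_BITS) - 1; // 0x3FFFFF
    static final int CLASS_INDEX_SHIFT = 42;  // hypothetical position in the 64-bit word

    // overlay a class index onto the mark word's upper bits
    static long embedClassIndex(long markWord, long classIndex) {
        if (classIndex > CLASS_INDEX_MASK)
            throw new IllegalArgumentException("more than 2^22 classes");
        return (markWord & ~(CLASS_INDEX_MASK << CLASS_INDEX_SHIFT))
             | (classIndex << CLASS_INDEX_SHIFT);
    }

    static long extractClassIndex(long markWord) {
        return (markWord >>> CLASS_INDEX_SHIFT) & CLASS_INDEX_MASK;
    }

    public static void main(String[] args) {
        System.out.println(1L << CLASS_INDEX_BITS);  // 4194304 addressable classes
        long mark = embedClassIndex(0L, 123_456);
        System.out.println(extractClassIndex(mark)); // 123456 - round-trips losslessly
    }
}
```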

Project Lilliput: This feature represents the first deliverable from Project Lilliput, OpenJDK’s initiative to minimize Java object memory overhead. The project roadmap targets 4-byte headers in a future release - halving the size once again. For now, the 12→8 byte reduction is just the beginning.

The Math:

| Component     | Traditional        | Compact (Java 25)             |
|---------------|--------------------|-------------------------------|
| Mark Word     | 64 bits            | 64 bits (with embedded class) |
| Class Pointer | 32 bits            | 0 bits (embedded)             |
| Total         | 96 bits (12 bytes) | 64 bits (8 bytes)             |

33% reduction (12→8 bytes). Up to 30% CPU savings in production.

Note that the percentage savings vary by object size: a bare Object with no fields shrinks from 16 bytes (12 header + 4 padding) to 8 bytes - a 50% reduction. Objects with fields see proportionally less dramatic savings as the data payload dominates.

Real-World Impact

Amazon battle-tested Compact Object Headers across hundreds of production microservices:

| Metric       | Improvement           |
|--------------|-----------------------|
| Heap usage   | 22% reduction         |
| CPU usage    | Up to 30% reduction   |
| GC cycles    | 15% fewer collections |
| JSON parsing | 10% faster            |

The gains are most dramatic for applications with many small objects:

  - Caches holding millions of entries
  - Microservices deserializing JSON into small DTOs
  - Stream pipelines producing short-lived boxed values and records

Applications with few, large objects (video processing, image manipulation) see less benefit - the header was already negligible compared to data payload.

Enabling Compact Headers

Just add one JVM flag:

# Java 24 (experimental - unlock needed)
java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -jar myapp.jar

# Java 25 (production-ready)
java -XX:+UseCompactObjectHeaders -jar myapp.jar

# Verify it's working
java -XX:+UseCompactObjectHeaders -XX:+PrintFlagsFinal 2>&1 | grep UseCompactObjectHeaders

That’s it. No code changes. No recompilation. No regression testing.

Key Takeaway: Java 25 delivers 33% reduction in object header overhead with a single flag. It’s the most memory-efficient Java release ever - without changing a line of application code.


Part VII: Virtual Threads and Memory — A New Allocation Paradigm

The Million-Thread Challenge

Java 21 introduced Virtual Threads (JEP 444), fundamentally changing how Java applications handle concurrency. Where platform threads map 1:1 to operating system threads (typically limited to thousands), virtual threads are lightweight user-mode threads that the JVM can multiplex onto a small pool of platform threads — enabling millions of concurrent tasks.

But this concurrency revolution comes with a memory management trade-off: virtual thread stack frames (continuations) are heap-allocated when the thread is parked or suspended. When millions of virtual threads are blocked waiting for I/O, their stack state lives on the heap rather than in native memory. This creates a fundamentally different allocation pattern than traditional thread pools.
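A quick taste of the scale involved (requires Java 21+): the illustrative sketch below runs thousands of briefly-blocking tasks on virtual threads via Executors.newVirtualThreadPerTaskExecutor() - far beyond what a platform-thread-per-task design could sustain cheaply:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    static int run(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10);  // parks the VT; its frames move to the heap
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        }  // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000)); // 10000 - try this with platform threads and watch RSS
    }
}
```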

How Virtual Threads Change Memory Allocation

The Traditional Model:

  - Each platform thread reserves a fixed native stack (default 1MB via -Xss) outside the heap
  - That memory is held per thread whether it is blocked or running - 10,000 threads claim ~10GB of address space
  - The GC never manages thread stacks; it only scans them for roots

The Virtual Thread Model:

  - A virtual thread’s frames live in small, resizable continuation chunks
  - When the thread parks (e.g., on blocking I/O), its frames are stored on the heap
  - Those continuation objects are ordinary heap allocations - created, resized, and collected by the GC

Real-World Findings

Research from InfoQ’s case study (2024) reveals:

  1. Memory footprint varies dramatically based on workload patterns. Applications with high virtual thread churn may see increased GC pressure compared to equivalent platform thread implementations due to heap-resident continuations.

  2. CPU-intensive workloads may show lower throughput with virtual threads than well-tuned platform thread pools due to increased allocation overhead from managing millions of thread states.

  3. I/O-bound workloads benefit most — the blocking operations that make virtual threads shine don’t increase heap pressure proportionally since threads are parked during I/O waits, and the heap storage of continuations is more memory-efficient than native thread stacks.

Tuning for Virtual Threads

# Monitor virtual thread pinning (when a VT cannot be unmounted)
java -Djdk.tracePinnedThreads=short ...

# Adjust for higher allocation rates from virtual thread stacks
# (ZGC needs no pause target - it tunes itself for sub-millisecond pauses)
java -XX:+UseZGC -Xms4g -Xmx4g MyApp

# Consider larger young generation for VT-heavy workloads with G1
java -XX:+UseG1GC -XX:NewRatio=1 -XX:MaxGCPauseMillis=50 ...

Key Insight: When migrating to virtual threads, re-benchmark your memory allocation and GC behavior. The allocation profile may shift enough to warrant reconsidering your collector choice or tuning parameters. The collector that worked well for your platform-threaded application may not be optimal for the same workload using virtual threads.

Reference: Chirumamilla, P. et al. “Java Virtual Threads: a Case Study.” InfoQ, July 2024. The study found that “memory footprint in Open Liberty deployments can vary greatly based on factors like application design, workload level, and garbage collection behavior, so the reduced footprint of virtual threads may not result in an overall reduction in memory used.”

Going deeper: Part 2 covers Virtual Thread internals including continuation frame capture, mount/unmount mechanics, pinning detection, and detailed heap allocation analysis.


Cross-References and Further Reading

Continue Your Journey

Part 2: Deep Dive - for those who want to understand the bit-level details: bytecode analysis, Mark Word bit layouts, Virtual Thread continuation internals, and concurrent GC algorithms.

Related Articles:

  - JVM Garbage Collectors: The Orchestra of Memory Management
  - JVM: The Silent Revolution


Summary

By the end of Part 1, you should understand:

  1. The Two Realms: Stack (per-thread, automatic, LIFO) vs Heap (shared, GC-managed)
  2. Stack Frames: max_locals and max_stack determined at compile time, frame components
  3. Object Headers: 12 bytes (Mark Word + Class Pointer), with alignment padding applied before fields
  4. Generational GC: Most objects die young, Minor GC (fast) vs Major GC (rare)
  5. GC Evolution: From simple STW to sophisticated concurrent collectors
  6. Java 25: Compact Object Headers reduce overhead from 12 to 8 bytes (33% reduction)
  7. Virtual Threads: Java 21’s lightweight threads store continuations on heap, changing allocation patterns

Once you internalize these fundamentals, you start seeing hidden costs everywhere. A cache with a million entries pays 12MB in header overhead before storing a single byte of data. A microservice processing thousands of requests per second allocates more bytes for headers than for the actual request payload.

This overhead isn’t inevitable. Java 25 cuts it by a third with a single flag - no code changes, no recompilation, no testing cycles. The JVM spent thirty years optimizing garbage collection down to millisecond pauses. Now it’s optimizing the objects themselves.

Add -XX:+UseCompactObjectHeaders and measure the difference.


References

  1. Java Virtual Machine Specification

  2. JEP 519: Compact Object Headers

  3. Java Object Headers Deep Dive

  4. Java 25 Compact Headers Integration

  5. Understanding Object Headers

  6. GC Evolution from Java 1.0 to 21

  7. Java Object Layout (JOL)

  8. JVM Internal Architecture


Part 1 of the JVM Memory series. Part 2 covers bit-level internals, bytecode analysis, and concurrent GC algorithms.
Related: JVM: The Silent Revolution | JVM Garbage Collectors Guide

