Using Open-Source Bytecode Instrumentation Profilers with Tomcat in Docker

Profiling a Tomcat-based Java 11 application in production requires an unobtrusive, low-overhead tool that can pinpoint CPU and memory hotspots at method or line granularity. Open-source profilers such as Async-Profiler, Glowroot, Arthas, and Apache SkyWalking fit the bill. They work either as a Java agent that instruments bytecode or as a native sampling profiler. Both approaches allow continuous profiling with modest impact. For example, Async-Profiler is a “low overhead sampling profiler” that captures CPU stacks and heap allocation samples skywalking.apache.org github.com. Glowroot is an Apache-licensed Java APM agent described as “easy to use, very low overhead” github.com glowroot.org. SkyWalking provides a full open-source APM and tracing suite and can integrate Async-Profiler for in-depth CPU/memory analysis skywalking.apache.org. Alibaba’s Arthas is a production diagnostic tool with an on-demand profiler command (based on Async-Profiler) that generates flame graphs github.com arthas.aliyun.com.

Each tool has trade-offs. Async-Profiler produces raw flame graphs (CPU/allocations) with virtually no safepoint bias skywalking.apache.org. However, it has no UI, and on Linux its default CPU mode relies on perf_events support. Glowroot provides a built-in web UI with metric charts and trace breakdowns (default port 4000) github.com github.com. Because it runs permanently inside the JVM, it adds a constant overhead, though the project claims microsecond-level impact glowroot.org. SkyWalking offers distributed tracing and centralized analysis, but you must run its OAP server and UI. Arthas can attach to a running JVM without a restart. It is meant for ad-hoc diagnostics rather than always-on monitoring, though it can output profiling results in HTML format arthas.aliyun.com. Below we focus on Async-Profiler and Glowroot as representative examples, while noting how the others could be used.


Choosing an Open-Source Profiler

Async-Profiler (Apache 2.0) – A native sampling profiler that uses Linux perf_event_open, with timer-based alternatives such as itimer. Pros: very low overhead, no safepoint bias, and support for CPU, heap allocation, and lock profiling skywalking.apache.org github.com. It produces interactive HTML flame graphs that highlight hot call paths. Cons: no built-in GUI, perf-based profiling requires extra privileges and kernel settings in Docker, and it runs only on Linux and macOS.
Glowroot (Apache 2.0) – A Java agent APM with a web UI. Pros: easy setup (just unzip and add -javaagent:glowroot.jar github.com), continuous monitoring of transactions/JVM/memory, method-level tracing, and very low claimed overhead glowroot.org. It collects response-time breakdowns and periodic stack-trace samples. Cons: primarily single-node (though a central collector exists), slightly higher overhead than pure sampling (but still small), and it requires exposing a web port (default 4000).
Apache SkyWalking (Apache 2.0) – A full-stack APM with distributed tracing and metrics. Pros: holistic view, built-in UI, can trigger Async-Profiler tasks (since v10.2) to analyze CPU and memory allocation profiles skywalking.apache.org. Cons: heavy infrastructure (OAP server, storage), more complex to set up, and higher overall overhead.
Alibaba Arthas (Apache 2.0) – A live diagnostic tool. Pros: attaches without restart, supports a profiler command that outputs flame graph HTML (it embeds async-profiler) arthas.aliyun.com arthas.aliyun.com. Cons: interactive (no always-on mode), profiling must be triggered manually, and it’s primarily CLI-based.
Java Flight Recorder (JFR) – Built into OpenJDK 11 and later. Pros: very low overhead and continuous recording of CPU samples, allocations, locks, GC, etc. Cons: output is a binary .jfr file that must be analyzed with JDK Mission Control or converted to other formats. It is not “bytecode instrumentation” per se, but it is a viable open solution for production profiling.

In practice, Async-Profiler and Glowroot are popular because they balance ease of use with detail. SkyWalking (or a similar distributed tracer such as Pinpoint) is better suited to distributed applications, while Arthas is great for quick on-the-fly diagnosis.

Setting Up Dockerized Tomcat

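Assume you have a Docker image based on an official Tomcat 9 or 10 image with Java 11 (e.g. tomcat:9-jdk11). To add a profiler, you typically modify the Dockerfile to download the profiler agent and set JVM options. Tomcat’s startup script, catalina.sh, is what the image runs by default. It reads both the JAVA_OPTS and CATALINA_OPTS environment variables, so you can append agent flags to either. CATALINA_OPTS applies only when Tomcat itself starts, not to helper invocations such as catalina.sh stop, which makes it the slightly safer choice. The examples below use JAVA_OPTS, which behaves the same way for catalina.sh run.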

Example: Adding Glowroot APM Agent

Glowroot ships as a .jar agent and some config files. In the Dockerfile, download and unzip the Glowroot distribution into /opt/glowroot, and set -javaagent to the Glowroot JAR. Also include an admin.json to allow remote UI access. For example:

FROM tomcat:9-jdk11
ENV JAVA_OPTS="-javaagent:/opt/glowroot/glowroot.jar"
# (Optional) Add any Tomcat libs or conf here
COPY lib/ /usr/local/tomcat/lib/
COPY conf/ /usr/local/tomcat/conf/

# Download and install Glowroot APM (wget and unzip may be missing from the base image)
RUN apt-get update && apt-get install -y --no-install-recommends wget unzip ca-certificates \
 && wget https://github.com/glowroot/glowroot/releases/download/v0.14.0/glowroot-0.14.0-dist.zip \
    -O /tmp/glowroot.zip \
 && unzip /tmp/glowroot.zip -d /tmp/ \
 && mv /tmp/glowroot /opt/glowroot \
 && rm -rf /tmp/glowroot.zip /var/lib/apt/lists/*

# Copy admin.json to allow UI access from any host (bindAddress 0.0.0.0)
COPY glowroot/admin.json /opt/glowroot/

The admin.json might look like:

{
  "web": {
    "bindAddress": "0.0.0.0",
    "port": 4000
  }
}

This makes the UI listen on all interfaces instead of only on localhost inside the container, so you can browse to http://<container-host>:4000 to use Glowroot’s UI github.com. The Dockerfile above is adapted from a community example github.com. After building this image, run the container normally (e.g. docker run -d -p 8080:8080 -p 4000:4000 my-tomcat-glowroot). Glowroot starts together with Tomcat and serves its UI on port 4000 by default.

Example: Adding Async-Profiler

Async-Profiler is a native library, so you must include its binaries in the image. For a Debian/Ubuntu-based Tomcat image, install the download prerequisites with apt-get and fetch the release tarball. In the Dockerfile:

FROM tomcat:9-jdk11
# Install download tools and the C++ runtime, then unpack Async-Profiler (x64 build; use the ARM build on arm64 hosts)
RUN apt-get update && apt-get install -y --no-install-recommends wget ca-certificates libstdc++6 \
 && wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v4.0/async-profiler-4.0-linux-x64.tar.gz \
    -O /tmp/async-profiler.tar.gz \
 && tar -xzf /tmp/async-profiler.tar.gz -C /opt \
 && mv /opt/async-profiler-4.0-linux-x64 /opt/async-profiler \
 && rm -rf /tmp/async-profiler.tar.gz /var/lib/apt/lists/*

# Load the async-profiler agent at JVM startup (CPU profiling from launch; /tmp/profile.html is written when the JVM exits)
ENV JAVA_OPTS="-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=/tmp/profile.html"

This adds the libasyncProfiler.so agent library and tells the JVM to start CPU profiling at launch. The results are written to /tmp/profile.html when the JVM exits bell-sw.com. (Adjust event=... and other options as needed; see the Async-Profiler docs.)
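The text after `=` in the agent flag is a comma-separated list of bare options and key=value pairs. The options can vary per environment, for example `event=cpu` where perf events are allowed and `event=itimer` elsewhere. In that case it can be simpler to generate the flag than to edit it by hand. The Python sketch below does that. `build_agentpath` is a hypothetical helper. Only the option names already used in this article (`start`, `event`, `file`) are assumed to be valid async-profiler options.

```python
from typing import Mapping, Optional


def build_agentpath(library: str, options: Mapping[str, Optional[str]]) -> str:
    """Render an async-profiler -agentpath flag from an ordered option mapping.

    A value of None renders a bare option (e.g. "start"); anything else
    renders "key=value". Options are joined with commas, which is why
    commas inside keys or values are rejected.
    """
    parts = []
    for key, value in options.items():
        if "," in key or (value is not None and "," in value):
            raise ValueError(f"commas are not allowed in agent options: {key!r}={value!r}")
        parts.append(key if value is None else f"{key}={value}")
    if not parts:
        return f"-agentpath:{library}"
    return f"-agentpath:{library}=" + ",".join(parts)


if __name__ == "__main__":
    lib = "/opt/async-profiler/lib/libasyncProfiler.so"
    # Same options as the Dockerfile above: CPU profiling from JVM start, HTML written on exit.
    print(build_agentpath(lib, {"start": None, "event": "cpu", "file": "/tmp/profile.html"}))
    # Timer-based variant for hosts where perf_event_open is blocked.
    print(build_agentpath(lib, {"start": None, "event": "itimer", "file": "/tmp/profile.html"}))
```

You can pass the generated string at run time (for example `docker run -e JAVA_OPTS=...`), which overrides the value baked into the image.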

Running the Container

For Async-Profiler’s perf_events-based CPU profiling to work, the container needs elevated privileges, because Docker’s default security settings restrict the Linux perf_event_open syscall. For example:

docker run -d \
  --name tomcat-prof \
  -p 8080:8080 \
  --cap-add=SYS_ADMIN \
  --security-opt=seccomp=unconfined \
  my-tomcat-image

The --cap-add=SYS_ADMIN and --security-opt=seccomp=unconfined flags are often required to allow perf events inside Docker bell-sw.com. You may also need to set the following on the host:

sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0

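These settings serve two purposes bell-sw.com. perf_event_paranoid=1 permits unprivileged perf events, including kernel call stacks. kptr_restrict=0 exposes kernel symbol addresses so that kernel frames can be resolved. Both are host-wide kernel settings. Without them, async-profiler may fail or produce partial results, such as stacks without kernel frames. In contrast, Glowroot’s Java agent does not require special privileges (it uses Java instrumentation only), so a normal docker run is fine for Glowroot.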
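Because these are host-wide settings, it can help to check them before rolling out a perf-based setup. Below is a minimal Python sketch, meant to run on the Docker host. It reads both values from /proc/sys/kernel and compares them with the values recommended above. The thresholds are only this article's recommendations. Processes with root or CAP_SYS_ADMIN may not be subject to perf_event_paranoid at all, so treat a warning as a hint rather than a verdict. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Maximum values recommended in the text above for perf-based async-profiler runs.
RECOMMENDED = {"perf_event_paranoid": 1, "kptr_restrict": 0}


def read_kernel_settings(proc_dir: str = "/proc/sys/kernel") -> Dict[str, Optional[int]]:
    """Read the two sysctls from procfs; None means missing or unreadable."""
    values: Dict[str, Optional[int]] = {}
    for name in RECOMMENDED:
        try:
            values[name] = int((Path(proc_dir) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def check_perf_settings(values: Dict[str, Optional[int]]) -> List[str]:
    """Return one warning per setting that is unknown or above its recommended maximum."""
    warnings = []
    for name, limit in RECOMMENDED.items():
        current = values.get(name)
        if current is None or current > limit:
            warnings.append(f"kernel.{name}={current}; recommended maximum is {limit}")
    return warnings


if __name__ == "__main__":
    problems = check_perf_settings(read_kernel_settings())
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("perf sysctls match the recommended values")
```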

Instrumenting the JVM

Most of these profilers attach to the JVM at startup in one of two ways. A Java agent (-javaagent) instruments the bytecode of loaded classes to capture metrics. A native agent (-agentpath) hooks into the JVM's native interfaces instead. For example:

Glowroot: -javaagent:/opt/glowroot/glowroot.jar. No code changes are needed; the agent automatically instruments classes (using bytecode libraries) and records transaction traces and stack-trace samples github.com github.com.
SkyWalking Agent: If using SkyWalking, add -javaagent:path/to/skywalking-agent.jar and configure agent.config to point to the OAP server. The agent auto-instruments supported frameworks (Servlets, JDBC, etc.). SkyWalking can also create async-profiler tasks, which the agent then runs inside the JVM skywalking.apache.org.
Async-Profiler: Uses -agentpath:/.../libasyncProfiler.so. It loads as a native JVMTI agent. For CPU and allocation profiling, it collects stack samples through JVMTI and perf_events rather than rewriting bytecode.
Arthas: Can be run as a Java agent (-javaagent:arthas-agent.jar) or attached at runtime. Its profiler command (e.g. profiler start/stop) uses the bundled async-profiler to sample the JVM arthas.aliyun.com.

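In Docker, you typically add these options to the ENTRYPOINT or CMD, or set environment variables as shown above. For Tomcat, setting ENV JAVA_OPTS (or CATALINA_OPTS) is the usual approach. If you prefer a script, put the flags in bin/setenv.sh, which catalina.sh sources at startup, rather than editing catalina.sh itself.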
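Glowroot and SkyWalking rewrite code. The sampling approach used by async-profiler and Arthas's profiler command instead looks periodically at what each thread is executing and counts the stacks it sees. The Python toy below applies that idea to one thread using CPython's `sys._current_frames()`. It only illustrates the principle. It is not how async-profiler works internally: async-profiler samples JVM threads from native signal handlers. `busy_work` is a made-up workload.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id: int, duration_s: float, interval_s: float = 0.005) -> collections.Counter:
    """Toy sampling profiler: periodically snapshot one thread's stack.

    Each sample is stored root-first as "outer;inner;leaf", the same
    "collapsed stack" shape that flame-graph tools aggregate.
    """
    counts: collections.Counter = collections.Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(thread_id)  # CPython-specific API
        if frame is None:  # the target thread has finished
            break
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        counts[";".join(reversed(names))] += 1
        time.sleep(interval_s)  # the sampling interval bounds the overhead
    return counts


def busy_work(stop: threading.Event) -> None:
    """Hypothetical CPU-bound workload standing in for a hot request handler."""
    while not stop.is_set():
        sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=busy_work, args=(stop,))
    worker.start()
    try:
        samples = sample_stacks(worker.ident, duration_s=0.5)
    finally:
        stop.set()
        worker.join()
    for stack, hits in samples.most_common(3):
        print(hits, stack)
```

The profiled code is never modified. The cost depends only on how often stacks are sampled, which is why sampling profilers can run in production with low overhead.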

Collecting and Viewing Profile Data

Async-Profiler Output

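Async-Profiler writes its sampling data as a flame graph. In the Dockerfile above we directed it to /tmp/profile.html. With the start agent option, the file is written when the JVM exits, so after a graceful container stop it will be in the container’s /tmp. If Docker has to kill the JVM after the stop timeout, the file may not be written. You can copy it out with docker cp, which also works on stopped containers: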

# Stop the container (profile stops and file is flushed)
docker stop tomcat-prof
# Copy the flamegraph HTML to host
docker cp tomcat-prof:/tmp/profile.html .

Alternatively, you can profile interactively without stopping Tomcat: exec into the container and run asprof, the command-line launcher in Async-Profiler’s bin directory bell-sw.com. For example:

docker exec -it tomcat-prof sh
# Inside the container shell, find the Java PID (usually 1, because catalina.sh run execs java)
ps ax
# e.g. PID=1
/opt/async-profiler/bin/asprof dump -f /tmp/profile.html 1
exit
docker cp tomcat-prof:/tmp/profile.html .

This “dumps” the profile collected so far to HTML without stopping the profiler or Tomcat bell-sw.com. The resulting profile.html is an interactive flame graph; open it in a browser to see call stacks with hot paths. (Async-Profiler can also write other formats, such as collapsed stacks with -o collapsed or JFR with a .jfr file name github.com.) For example, running something like

asprof -d 30 -f flamegraph.html <PID>

profiles the process for 30 seconds and writes flamegraph.html, which you can open in a browser github.com.
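The commands above need the JVM's PID. If `ps` is not installed in the image, you can find the PID by scanning /proc for a command line that runs Tomcat's launcher class, org.apache.catalina.startup.Bootstrap, which catalina.sh starts. The Python sketch below assumes Python 3 is available wherever you run it, which is not the case in the official Tomcat image. Run on the Docker host, the same scan returns the JVM's host-namespace PID. The helper names are hypothetical.

```python
import os
from typing import Dict, List

TOMCAT_MAIN_CLASS = "org.apache.catalina.startup.Bootstrap"  # class launched by catalina.sh


def read_cmdlines(proc_root: str = "/proc") -> Dict[int, str]:
    """Map PID -> command line, with procfs NUL separators turned into spaces."""
    cmdlines: Dict[int, str] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:  # the process exited or is not readable
            continue
        cmdlines[int(entry)] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return cmdlines


def find_tomcat_pids(cmdlines: Dict[int, str]) -> List[int]:
    """Return the PIDs of JVMs running Tomcat's Bootstrap class, lowest first."""
    return sorted(
        pid for pid, cmd in cmdlines.items()
        if "java" in cmd and TOMCAT_MAIN_CLASS in cmd
    )


if __name__ == "__main__" and os.path.isdir("/proc"):
    print(find_tomcat_pids(read_cmdlines()))
```

On a host running several Tomcat containers, the list contains one PID per JVM. You still have to pick the right one, for example by comparing against `docker inspect` output.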

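In the flame graph, each frame’s width is proportional to the number of samples in which it appears (CPU time or allocations), including time spent in its callees. Wide frames with little or nothing stacked on them are where time is spent directly (self time). You can click on a frame to zoom in, or search for method names.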
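Async-profiler's collapsed output (-o collapsed) makes the width-versus-self-time distinction concrete. It has one line per distinct stack, root first and leaf last, followed by a sample count. A flame graph is essentially a drawing of those lines. The Python sketch below aggregates such lines into total and self sample counts per frame. The frame names in the example are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def frame_weights(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int], int]:
    """Aggregate collapsed stacks ("root;...;leaf COUNT") into per-frame sample counts.

    Returns (self_counts, total_counts, total_samples):
      self_counts  - samples in which the frame is the leaf (top of the stack);
      total_counts - samples in which the frame appears anywhere on the stack,
                     counted once per stack so recursion is not double-counted.
    """
    self_counts: Counter = Counter()
    total_counts: Counter = Counter()
    total_samples = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        frames = stack.split(";")
        total_samples += count
        self_counts[frames[-1]] += count
        for frame in set(frames):
            total_counts[frame] += count
    return dict(self_counts), dict(total_counts), total_samples


if __name__ == "__main__":
    # Made-up frames; real async-profiler output uses fully qualified method names.
    sample = [
        "Thread.run;Servlet.service;Dao.query 30",
        "Thread.run;Servlet.service;Json.render 10",
        "Thread.run;Servlet.service 5",
        "Thread.run;Cache.refresh 5",
    ]
    self_c, total_c, n = frame_weights(sample)
    for frame, hits in sorted(total_c.items(), key=lambda kv: -kv[1]):
        print(f"{frame:16} total {100 * hits / n:5.1f}%  self {100 * self_c.get(frame, 0) / n:5.1f}%")
```

In this example, Servlet.service spans 90% of the graph's width but has only 10% self time. Dao.query is where the time actually goes. To use real data, write collapsed stacks (e.g. `asprof -d 30 -o collapsed -f /tmp/profile.collapsed <PID>`), copy the file out, and pass an open file object to `frame_weights`.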

Glowroot Output

Glowroot provides a web UI instead of raw files. After the agent starts, point your browser to http://<host-ip>:4000 (or the port you configured). The “Transactions” view shows response-time breakdowns and slow traces broken down by method. It also shows the stack-trace samples Glowroot collects continuously (its thread profile), which can be viewed as a flame graph. The “JVM” view shows heap usage, GC activity, and thread information. Because Glowroot records traces automatically, you can see where a slow request spent its time without extra setup, and you can export individual traces for offline analysis. Menu names may differ slightly between Glowroot versions.

(Glowroot bundles its own UI and needs no external setup. The example Dockerfile above github.com serves it on port 4000, so make sure that port is published by Docker or exposed by your Kubernetes service. If needed, you can require logins by configuring users and roles in Glowroot’s settings.)

Arthas Profiling Output

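To use Arthas, attach it first, e.g. run java -jar arthas-boot.jar inside the container and select the Tomcat process. Then run the profiler start and profiler stop commands. Arthas is already attached to the chosen JVM, so profiler takes no PID. By default Arthas writes an HTML flame graph file (format “flamegraph”) arthas.aliyun.com. For example: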

# In arthas shell
$ profiler start
# [run workload...]
$ profiler stop
# The output path is printed; by default it is under arthas-output/ in the application's working directory

This produces an HTML flame graph similar to Async-Profiler’s. You can configure output formats (flat, collapsed, jfr) with profiler -o.

Example Dockerfile Snippets

Below are example modifications to a Tomcat Dockerfile for the profilers discussed:

# 1. Glowroot example
FROM tomcat:9-jdk11
ENV JAVA_OPTS="-javaagent:/opt/glowroot/glowroot.jar"
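# The zip contains a top-level glowroot/ directory, so unzipping into /opt yields /opt/glowroot
RUN apt-get update && apt-get install -y --no-install-recommends wget unzip ca-certificates && \
    wget https://github.com/glowroot/glowroot/releases/download/v0.14.0/glowroot-0.14.0-dist.zip \
    -O /tmp/glowroot.zip && \
    unzip /tmp/glowroot.zip -d /opt && rm /tmp/glowroot.zip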
COPY glowroot/admin.json /opt/glowroot/
# (Add WAR files to webapps, etc.)

(This follows the pattern from a community example github.com.)

# 2. Async-Profiler example (Debian-based Tomcat)
FROM tomcat:9-jdk11
RUN apt-get update && apt-get install -y --no-install-recommends wget ca-certificates libstdc++6
RUN wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v4.0/async-profiler-4.0-linux-x64.tar.gz \
    -O /tmp/async-profiler.tgz && \
    tar -xzf /tmp/async-profiler.tgz -C /opt && mv /opt/async-profiler-4.0-linux-x64 /opt/async-profiler
ENV JAVA_OPTS="-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=/tmp/profile.html"
# (Copy WARs and other configs as needed)

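Each snippet sets -javaagent or -agentpath through JAVA_OPTS. You could also append these flags in a custom ENTRYPOINT script. Make sure any JSON config (like Glowroot’s admin.json) is copied into the agent directory. Without admin.json, Glowroot still starts, but its UI listens only on localhost inside the container and is unreachable from outside.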

Best Practices for Production Profiling

Sampling, Not Instrumentation: Prefer sampling profilers (like Async-Profiler), which interrupt threads periodically, over instrumenting every method call; this keeps overhead very low. For memory analysis, enable allocation profiling only when needed, since even sampled allocation tracking adds its own overhead.
Minimal Overhead Settings: Use conservative sampling settings (e.g. a 10 ms interval, or event=itimer if the perf syscall is restricted). Glowroot and Async-Profiler are both designed for low overhead glowroot.org skywalking.apache.org, but any profiling adds some cost.
Controlled Duration: Instead of profiling continuously, consider periodic snapshots or on-demand triggers. For example, run asprof for 30–60 seconds during suspected slow periods github.com. Always-on profiling adds ongoing overhead atlassian.com, so schedule it (e.g. 10 seconds every 5 minutes) or trigger it from alerts.
Resource Limits: Profiling can generate large outputs. Store flame graphs or JFR files outside the container (e.g. mount a volume for /tmp or configure S3 upload) to avoid filling up disk.
Security: Do not expose profiler UIs or agent endpoints publicly without authentication. Bind Glowroot or SkyWalking UIs to localhost or put them behind authentication. In Docker, publish management ports only on the loopback interface (e.g. -p 127.0.0.1:4000:4000) or restrict them with firewall rules; on Kubernetes, use network policies.
Privilege Management: Only use --cap-add=SYS_ADMIN --security-opt seccomp=unconfined on trusted nodes, because these flags relax container isolation. Alternatively, profile from the host: make the profiler available to the container (e.g. through a mount) and run asprof on the host against the JVM’s host PID. The production container itself then needs no extra privileges bell-sw.com.
Test in Staging: Always measure profiler overhead in a staging environment first. Check that jstat or similar shows normal GC/throughput when profiling. (Glowroot’s docs note its overhead is negligible in most cases glowroot.org, but it’s best to verify with your workload.)
Monitoring Overhead: Keep an eye on the profiler’s own resource use. For example, Async-Profiler allocates a call stack buffer; ensure your container has enough CPU (profiling is not CPU-free) and memory for that.
Use Latest Versions: Newer profiler releases fix bugs and add capabilities (for example, async-profiler can write JFR-format output skywalking.apache.org).
Automate Collection: In production, you might script the profiling. For example, a cron job could run asprof through docker exec during low-traffic periods and copy out the result. You can also trigger profiling from a CI/CD pipeline or through an API (SkyWalking profiling tasks, for example, can be created via its API).
Interpret Carefully: Remember that profiling samples are statistical. A flame graph shows where time is spent proportionally. Short-lived spikes or background threads may not appear. Always correlate with logs, metrics, and multiple runs.
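The scheduling advice above recommends short snapshots, such as 10 seconds every 5 minutes, collected by a cron job. That could be scripted along the lines of the Python sketch below. It assumes the container name tomcat-prof, the asprof path from the Dockerfile earlier, and JVM PID 1. It also assumes the JVM was not started with the agent's start option, since a new asprof session may be rejected while profiling is already running. `snapshot_commands`, `duty_cycle`, and `take_snapshot` are hypothetical names.

```python
import subprocess
import sys
from datetime import datetime, timezone
from typing import List

# Assumptions taken from the examples in this article; adjust for your setup.
CONTAINER = "tomcat-prof"
ASPROF = "/opt/async-profiler/bin/asprof"
JAVA_PID = "1"  # Tomcat's JVM is usually PID 1 in the official image


def snapshot_commands(stamp: str, duration_s: int = 10) -> List[List[str]]:
    """Commands for one short profile: record inside the container, copy out, clean up."""
    remote = f"/tmp/profile-{stamp}.html"
    return [
        ["docker", "exec", CONTAINER, ASPROF, "-d", str(duration_s), "-f", remote, JAVA_PID],
        ["docker", "cp", f"{CONTAINER}:{remote}", f"profile-{stamp}.html"],
        ["docker", "exec", CONTAINER, "rm", "-f", remote],
    ]


def duty_cycle(duration_s: float, period_s: float) -> float:
    """Fraction of wall-clock time spent profiling, e.g. 10 s every 300 s."""
    return duration_s / period_s


def take_snapshot(duration_s: int = 10) -> None:
    """Run one snapshot; stops at the first failing command."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    for cmd in snapshot_commands(stamp, duration_s):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and "--run" in sys.argv:
    take_snapshot()
```

With 10-second snapshots every 5 minutes, `duty_cycle(10, 300)` is about 0.033, so the JVM is profiled roughly 3% of the time. A crontab entry such as `*/5 * * * * python3 /path/to/snapshot.py --run` would take one snapshot every five minutes. The script path is a placeholder, and the copied HTML files land in cron's working directory.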

In summary, to profile a Dockerized Tomcat in production, install the chosen profiler agent in the image (as shown above), run the container with appropriate flags, and then collect profiles via flame graphs or a web UI. Async-Profiler and Glowroot are especially popular choices: async-profiler for low-level CPU/alloc sampling skywalking.apache.org github.com, and Glowroot for continuous monitoring with an easy UI github.com glowroot.org. Both can pinpoint slow methods and memory hotspots at the code level. Proper configuration (privileges, sampling interval, secure UI) will keep overhead minimal while giving you the deep insights needed to optimize performance and troubleshoot bottlenecks.

References: We’ve drawn on official docs and examples for these tools, such as the Async-Profiler README github.com github.com, BellSoft’s guide to profiling in Docker bell-sw.com bell-sw.com, Glowroot’s installation notes github.com github.com, and the SkyWalking documentation on async-profiler skywalking.apache.org skywalking.apache.org, as well as community guidance on instrumentation and best practices. These resources can be consulted for further details.