OpenTelemetry Java Instrumentation Metadata Project
A new elephant
There’s something about large, long-term projects that I enjoy. Thinking about a problem regularly for months allows my understanding of it, and of the potential solutions, to evolve several times over. I’m in the middle of one such project right now, building a system for tracking metadata associated with the 250+ instrumentations in the OpenTelemetry Java instrumentation codebase. As I work through this problem, I’m going to write blog posts like this one to serve as snapshots of the various stages of my own evolution with the problem, as well as the progress of the work.
The end goal is much better documentation around what users can expect from the Java instrumentation project. Right now, it can be ambiguous what will be emitted by the instrumentation libraries or the Java agent. While many of the modules do have some sort of readme, they aren’t very thorough or consistent. So after a couple of discussions, I ended up opening an issue: #13468 - Instrumentation Metadata System.
The issue was opened on March 6, 2025, about a month after I had started hacking on some ideas. At the time, the repo had 250 individual instrumentations: 242 javaagent instrumentations (15 of which had readmes) and 59 library instrumentations (35 of which had readmes). The readmes were mostly generic instructions for configuration options and how to instantiate the libraries.
I am now about 3 months into the project and recently hit some exciting milestones, so it feels like a good point to pause, review what’s been done and what’s next, and spend a little time sketching out some thoughts on the final state of things to work towards.
As I work through this long project, I’m reminded of the old proverb:
How do you eat an elephant? One bite at a time!
What data is relevant?
So when thinking about an individual instrumentation, what is it? What are the properties that would be useful to have available, and how can we identify that information for the ~250 modules we have in place today?
For the first pass at brainstorming this, I considered:
- Some description of what an instrumentation does and what telemetry is produced
- Some form of classification of the instrumentation
- library - instrumentation for particular libraries (AWS, ClickHouse, etc.)
- internal - instrumentation used internally within the agent (OpenTelemetry API helpers)
- custom - instrumentation associated with supporting or generating custom instrumentation (annotations, methods)
- The instrumented library versions that are supported, broken down by javaagent vs library support
- Whether the instrumentation is enabled or disabled by default, and how to enable/disable it
- The configuration options available, what they do, and their default values
- Scope information - name, schemaUrl, attributes
- The semantic conventions the instrumentation follows
- Detailed telemetry information
- What kind of spans are generated, and the various attributes/semantic conventions
- The metrics generated and their attributes/semantic conventions
In a future post I can talk a bit more about why each of these pieces of information would be useful, but let’s use these as a starting point. This set of information would provide a heck of a lot more than is currently easily available without digging through code.
The next step was to start thinking about how to obtain all of this information.
Gatherers
There were some “easy” aspects of this project that I decided to tackle first, which would set the baseline “template” for the work that followed. Some of the data I was confident we could gather through automation in some way; other data we would need to handle manually. So I started building out “gatherers”, each solving for a particular data source or way of obtaining the information.
Baseline information - Instrumentation list and targets
I broke ground by first writing code that parses the file structure in the instrumentation directory to identify each instrumentation, which “group” or “namespace” it belongs to, and whether it has library or javaagent support. This is done by making some assumptions based on the directory paths and structure of each module.
For example, analyzing these paths:
```
├── instrumentation
│   ├── clickhouse-client-05
│   ├── jaxrs
│   │   ├── jaxrs-1.0
│   │   ├── jaxrs-2.0
│   ├── spring
│   │   ├── spring-cloud-gateway
│   │   │   ├── spring-cloud-gateway-2.0
│   │   │   ├── spring-cloud-gateway-2.2
│   │   │   └── spring-cloud-gateway-common
```
Results in the following:
- Name - the full name of the instrumentation module: clickhouse-client-05, jaxrs-1.0, spring-cloud-gateway-2.0
- Namespace - the direct parent of the instrumentation module; if there is none, use the module name with the version stripped: clickhouse-client, jaxrs, spring-cloud-gateway
- Group - the topmost parent in the directory structure, used to group instrumentations together: clickhouse-client, jaxrs, spring
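To make those rules concrete, here is a minimal sketch of the derivation in Java. The class and method names are illustrative only; this is not the actual gatherer code from the repo:

```java
import java.nio.file.Path;

// Minimal sketch of the naming rules described above (illustrative, not the
// real gatherer). Paths are assumed relative, e.g.
// instrumentation/spring/spring-cloud-gateway/spring-cloud-gateway-2.0
class ModuleNames {

  // Name: the module directory itself, e.g. spring-cloud-gateway-2.0
  static String name(Path relative) {
    return relative.getFileName().toString();
  }

  // Namespace: the direct parent of the module, e.g. spring-cloud-gateway;
  // for top-level modules, the module name with the trailing version stripped,
  // e.g. clickhouse-client-05 -> clickhouse-client
  static String namespace(Path relative) {
    if (relative.getNameCount() > 2) {
      return relative.getName(relative.getNameCount() - 2).toString();
    }
    return name(relative).replaceAll("-[0-9.]+$", "");
  }

  // Group: the topmost directory under instrumentation/, e.g. spring, jaxrs
  static String group(Path relative) {
    if (relative.getNameCount() > 2) {
      return relative.getName(1).toString();
    }
    return namespace(relative);
  }
}
```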
Slightly more involved was parsing Gradle files to extract muzzle and dependency configurations, in order to identify which versions of the instrumented libraries are supported.
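For a sense of what is being parsed: a javaagent module’s build file declares muzzle checks with coordinates and version ranges, via calls like group.set("io.activej"), module.set("activej-http"), and versions.set("[6.0,)"). Here is a toy sketch of scraping those coordinates; it assumes the common ordering of the three calls, and the real gatherer is considerably more thorough than a regex:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch: pull group/module/versions coordinates out of a module's
// build.gradle.kts muzzle block. Assumes the calls appear in this order.
class MuzzleScraper {
  private static final Pattern COORDS = Pattern.compile(
      "group\\.set\\(\"([^\"]+)\"\\)\\s*"
          + "module\\.set\\(\"([^\"]+)\"\\)\\s*"
          + "versions\\.set\\(\"([^\"]+)\"\\)");

  static void printTargets(Path buildFile) throws Exception {
    String contents = Files.readString(buildFile);
    Matcher m = COORDS.matcher(contents);
    while (m.find()) {
      // prints e.g. io.activej:activej-http:[6.0,)
      System.out.println(m.group(1) + ":" + m.group(2) + ":" + m.group(3));
    }
  }
}
```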
The system then outputs results to a new documentation file in YAML format at docs/instrumentation-list.yaml, and the initial implementation looked something like:
```yaml
activej:
  instrumentations:
  - name: activej-http-6.0
    srcPath: instrumentation/activej-http-6.0
    target_versions:
      javaagent:
      - io.activej:activej-http:[6.0,)
akka:
  instrumentations:
  - name: akka-http-10.0
    srcPath: instrumentation/akka/akka-http-10.0
    target_versions:
      javaagent:
      - com.typesafe.akka:akka-http_2.12:[10,)
      - com.typesafe.akka:akka-http_2.13:[10,)
      - com.typesafe.akka:akka-http_2.11:[10,)
```
This is the PR where this was implemented, for anyone interested.
Metadata.yaml files
It became clear that some aspects of this information couldn’t be inferred by automation alone, at least at this point in time. Therefore, I introduced metadata.yaml files that can be created within the directory of each module, providing a way to add to and augment the information we gather through automation. After a few iterations, the file structure currently supports a handful of possible attributes.
Example metadata.yaml:
```yaml
description: "Instruments the x library and provides y"
disabled_by_default: true
classification: internal
configurations:
- name: otel.instrumentation.common.db-statement-sanitizer.enabled
  description: Enables statement sanitization for database queries.
  type: boolean
  default: true
```
See the docs for the latest schema.
This is the initial implementation PR, although it has evolved a bit since then.
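For a sense of how lightweight consuming these files can be, here’s a rough sketch of loading a module’s metadata.yaml into a plain map. It assumes the SnakeYAML library, and the class name is illustrative rather than taken from the actual implementation:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.yaml.snakeyaml.Yaml;

// Rough sketch: read a module's metadata.yaml into a map whose keys follow
// the example above (description, classification, configurations, ...).
class MetadataReader {
  static Map<String, Object> read(Path moduleDir) throws Exception {
    Path file = moduleDir.resolve("metadata.yaml");
    try (InputStream in = Files.newInputStream(file)) {
      Map<String, Object> metadata = new Yaml().load(in);
      // e.g. metadata.get("description"), metadata.get("classification")
      return metadata;
    }
  }
}
```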
Metric Interceptors
Identifying the metrics and associated attributes emitted by an instrumentation can be tricky, and I iterated through several approaches, including some static code analysis and manually reviewing the code of each module. In the end, I decided to try an experiment: leverage some of our integration tests, where the instrumentation is exercised, to intercept any metrics emitted and analyze them. This ended up working pretty well. I added some configuration flags to the agent test runners to keep track of all metrics that go through the assertion flow and, at the end of the test run, write this data into YAML files within a .telemetry directory in each instrumentation module. Then I scrape those files and add the information to our instrumentation model.
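The test-runner changes themselves live in the PR linked at the end of this section, but the underlying idea can be sketched with the SDK’s in-memory metric reader from the opentelemetry-sdk-testing artifact. This standalone example is an approximation of the approach, not the actual runner code:

```java
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.data.MetricData;
import io.opentelemetry.sdk.testing.exporter.InMemoryMetricReader;
import java.util.Collection;

// Sketch: capture metrics produced during a test run, then dump
// name/type/unit for each one in a YAML-like shape.
class MetricCapture {
  public static void main(String[] args) {
    InMemoryMetricReader reader = InMemoryMetricReader.create();
    SdkMeterProvider meterProvider =
        SdkMeterProvider.builder().registerMetricReader(reader).build();
    OpenTelemetrySdk sdk =
        OpenTelemetrySdk.builder().setMeterProvider(meterProvider).build();

    // ... exercise the instrumented library under test ...
    sdk.getMeter("demo").counterBuilder("demo.counter").build().add(1);

    // After the run, inspect everything that was recorded.
    Collection<MetricData> metrics = reader.collectAllMetrics();
    for (MetricData metric : metrics) {
      System.out.printf("- name: %s%n  type: %s%n  unit: %s%n",
          metric.getName(), metric.getType(), metric.getUnit());
    }
  }
}
```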
Example .telemetry file:
```yaml
metrics:
- name: db.client.connections.usage
  description: The number of connections that are currently in state described by the state attribute.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
  - name: state
    type: STRING
- name: db.client.connections.max
  description: The maximum number of open connections allowed.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
- name: db.client.connections.idle.min
  description: The minimum number of idle open connections allowed.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
- name: db.client.connections.idle.max
  description: The maximum number of idle open connections allowed.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
```
This is the PR where this was initially implemented.
Output
The output of this system is getting pretty close to being useful.
Some example snippets from the current output:
```yaml
alibaba:
- name: alibaba-druid-1.0
  description: |
    The Alibaba Druid instrumentation generates database connection pool metrics for druid data sources.
  source_path: instrumentation/alibaba-druid-1.0
  scope:
    name: io.opentelemetry.alibaba-druid-1.0
  target_versions:
    javaagent:
    - com.alibaba:druid:(,)
    library:
    - com.alibaba:druid:1.0.0
  metrics:
  - name: db.client.connections.usage
    description: The number of connections that are currently in state described by the state attribute.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
    - name: state
      type: STRING
  - name: db.client.connections.pending_requests
    description: The number of pending requests for an open connection, cumulative for the entire pool.
    type: LONG_SUM
    unit: requests
    attributes:
    - name: pool.name
      type: STRING
  - name: db.client.connections.max
    description: The maximum number of open connections allowed.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
  - name: db.client.connections.idle.min
    description: The minimum number of idle open connections allowed.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
  - name: db.client.connections.idle.max
    description: The maximum number of idle open connections allowed.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
apache:
- name: apache-dubbo-2.7
  description: The Apache Dubbo instrumentation provides client and server spans for Apache Dubbo RPC calls. Each call produces a span named after the Dubbo method, enriched with standard RPC attributes (system, service, method), network attributes, and error details if an exception occurs.
  source_path: instrumentation/apache-dubbo-2.7
  scope:
    name: io.opentelemetry.apache-dubbo-2.7
  target_versions:
    javaagent:
    - org.apache.dubbo:dubbo:[2.7,)
  configurations:
  - name: otel.instrumentation.common.peer-service-mapping
    description: Used to specify a mapping from host names or IP addresses to peer services.
    type: map
    default: ''
```
Vision
There are all kinds of ideas for leveraging this information once it is more complete. At some point, the plan is to generate documentation within the repo: individual readme files for each library, with more consistency in formatting and contents, as well as replacements for some of the manually maintained lists of supported libraries.
It will also be useful to diff the contents of the metadata between releases, to understand what telemetry was emitted or which configurations were supported by a specific version of the OpenTelemetry Java agent or libraries, or to identify drift and ensure changes are intentional and documented.
At some point we would also like to automate this information being published in the opentelemetry.io registry.
Current Progress
As of today (June 6, 2025), this is the state of the Java agent codebase, including some stats on how many modules have metadata.yaml files with some contents around descriptions/configurations:
- Total Modules: 257
- By classification:
- library: 230
- custom: 5
- internal: 22
- metadata.yaml contents:
- descriptions: 27 (10.51%)
- configurations: 19 (7.39%)
Next steps
This post doesn’t cover everything, as the project is fairly large and I’m antsy to get back into the weeds, but up-to-date progress and notes are available in the GitHub issue.
As of now, I am working on:
- Reviewing more modules in order to manually create metadata.yaml files with configuration options and descriptions (help wanted!).
- Experimenting with how to convey telemetry emitted by default vs emitted when certain configuration options are enabled.
- Building a span interceptor into the test runner, along with a way to identify and present the useful information collected.
To be continued!