OpenTelemetry Java Instrumentation Metadata Project
A new elephant
There’s something about large, long-term projects that I enjoy. Thinking about a problem regularly for months allows my understanding of it, and of the potential solutions, to evolve several times over. I’m in the middle of one such project right now, building a system for tracking metadata associated with the 250+ instrumentations in the OpenTelemetry Java instrumentation codebase. As I work through this problem, I’m going to write blog posts like this one to serve as snapshots of the various stages of my own evolution with the problem, as well as the progress of the work.
The end goal is much better documentation around what users can expect from the Java instrumentation project. Right now, it can be ambiguous what will be emitted by the instrumentation libraries or the Java agent. While many of the modules do have some sort of readme, they aren’t very thorough or consistent. So after a couple of discussions, I ended up opening an issue: #13468 - Instrumentation Metadata System.
The issue was opened on March 6, 2025, about a month after I had started hacking on some ideas. At the time, the repo had 250 individual instrumentations: 242 javaagent instrumentations (15 of which had readmes) and 59 library instrumentations (35 of which had readmes). The readmes were mostly generic instructions for configuration options and how to instantiate the libraries.
I am now about 3 months into the project and recently hit some exciting milestones, so it feels like a good point to pause, review what’s been done and what’s next, and spend a little time sketching out some thoughts on the final state of things to work towards.
As I work through this long project, I’m reminded of the old proverb:
How do you eat an elephant? One bite at a time!
What data is relevant?
So when thinking about an individual instrumentation, what is it? What are the properties that would be useful to have available, and how can we identify that information for the ~250 modules we have in place today?
For the first pass at brainstorming this, I considered:
- Some description of what an instrumentation does and what telemetry is produced
- Some form of classification of the instrumentation
- library - instrumentation for particular libraries (AWS, ClickHouse, etc.)
- internal - instrumentation used internally within the agent (OpenTelemetry API helpers)
- custom - instrumentation associated with supporting or generating custom instrumentation (annotations, methods)
- The instrumented library versions that are supported, broken down by javaagent vs library support
- Whether the instrumentation is enabled or disabled by default, and how to enable/disable it
- The configuration options available, what they do, and their default values
- Scope information - name, schemaUrl, attributes
- The semantic conventions the instrumentation follows
- Detailed telemetry information
- What kind of spans are generated, and the various attributes/semantic conventions
- The metrics generated and their attributes/semantic conventions
In a future post I can talk a bit more about why each of these pieces of information would be useful, but let’s use these as a starting point. This set of information would provide a heck of a lot more than is currently easily available without digging through code.
The next step was to start thinking about how to obtain all of this information.
Gatherers
There were some “easy” aspects of this project that I decided to tackle first, which would set the baseline “template” for the work that followed. Some of the data I was confident we could gather through automation in some way; other data we would need to handle manually. So I started building out “gatherers”, each solving for a particular data source or way of obtaining the information.
Baseline information - Instrumentation list and targets
I broke ground by first writing code that parses the file structure in the instrumentation directory to identify each instrumentation, which “group” or “namespace” it belongs to, and whether it has library or javaagent support. This is done by making some assumptions based on the directory paths and structure of each module.
For example, analyzing these paths:
```
├── instrumentation
│   ├── clickhouse-client-05
│   ├── jaxrs
│   │   ├── jaxrs-1.0
│   │   ├── jaxrs-2.0
│   ├── spring
│   │   ├── spring-cloud-gateway
│   │   │   ├── spring-cloud-gateway-2.0
│   │   │   ├── spring-cloud-gateway-2.2
│   │   │   └── spring-cloud-gateway-common
```
Results in the following:
- Name - the full name of the instrumentation module: clickhouse-client-05, jaxrs-1.0, spring-cloud-gateway-2.0
- Namespace - the direct parent of the instrumentation module; if there is none, use the module name with the version stripped: clickhouse-client, jaxrs, spring-cloud-gateway
- Group - the topmost parent in the directory structure, used to group instrumentations together: clickhouse-client, jaxrs, spring
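To make those rules concrete, here is a minimal sketch of the derivation in Java. The class and method names are illustrative only; this is not the actual gatherer code from the repo:

```java
import java.nio.file.Path;

// Minimal sketch of the naming rules described above (illustrative, not the
// real gatherer). Paths are assumed relative, e.g.
// instrumentation/spring/spring-cloud-gateway/spring-cloud-gateway-2.0
class ModuleNames {

  // Name: the module directory itself, e.g. spring-cloud-gateway-2.0
  static String name(Path relative) {
    return relative.getFileName().toString();
  }

  // Namespace: the direct parent of the module, e.g. spring-cloud-gateway;
  // for top-level modules, the module name with the trailing version stripped,
  // e.g. clickhouse-client-05 -> clickhouse-client
  static String namespace(Path relative) {
    if (relative.getNameCount() > 2) {
      return relative.getName(relative.getNameCount() - 2).toString();
    }
    return name(relative).replaceAll("-[0-9.]+$", "");
  }

  // Group: the topmost directory under instrumentation/, e.g. spring, jaxrs
  static String group(Path relative) {
    if (relative.getNameCount() > 2) {
      return relative.getName(1).toString();
    }
    return namespace(relative);
  }
}
```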
Slightly more involved was parsing Gradle files to extract muzzle and dependency configurations, in order to identify which versions of the instrumented libraries are supported.
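For a sense of what is being parsed: a javaagent module’s build file declares muzzle checks with coordinates and version ranges, via calls like group.set("io.activej"), module.set("activej-http"), and versions.set("[6.0,)"). Here is a toy sketch of scraping those coordinates; it assumes the common ordering of the three calls, and the real gatherer is considerably more thorough than a regex:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch: pull group/module/versions coordinates out of a module's
// build.gradle.kts muzzle block. Assumes the calls appear in this order.
class MuzzleScraper {
  private static final Pattern COORDS = Pattern.compile(
      "group\\.set\\(\"([^\"]+)\"\\)\\s*"
          + "module\\.set\\(\"([^\"]+)\"\\)\\s*"
          + "versions\\.set\\(\"([^\"]+)\"\\)");

  static void printTargets(Path buildFile) throws Exception {
    String contents = Files.readString(buildFile);
    Matcher m = COORDS.matcher(contents);
    while (m.find()) {
      // prints e.g. io.activej:activej-http:[6.0,)
      System.out.println(m.group(1) + ":" + m.group(2) + ":" + m.group(3));
    }
  }
}
```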
The system then outputs results to a new documentation file in YAML format at docs/instrumentation-list.yaml, and the initial implementation looked something like:
```yaml
activej:
  instrumentations:
  - name: activej-http-6.0
    srcPath: instrumentation/activej-http-6.0
    target_versions:
      javaagent:
      - io.activej:activej-http:[6.0,)
akka:
  instrumentations:
  - name: akka-http-10.0
    srcPath: instrumentation/akka/akka-http-10.0
    target_versions:
      javaagent:
      - com.typesafe.akka:akka-http_2.12:[10,)
      - com.typesafe.akka:akka-http_2.13:[10,)
      - com.typesafe.akka:akka-http_2.11:[10,)
```
This is the PR where this was implemented, for anyone interested.
Metadata.yaml files
It became clear that some aspects of this information couldn’t be inferred by automation alone, at least at this point in time. Therefore, I introduced metadata.yaml files that can be created within the directory of each module, providing a way to add to and augment the information we gather through automation. After a few iterations, the file structure currently supports a handful of possible attributes.
Example metadata.yaml:
```yaml
description: "Instruments the x library and provides y"
disabled_by_default: true
classification: internal
configurations:
- name: otel.instrumentation.common.db-statement-sanitizer.enabled
  description: Enables statement sanitization for database queries.
  type: boolean
  default: true
```
See the docs for the latest schema.
This is the initial implementation PR, although it has evolved a bit since then.
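For a sense of how lightweight consuming these files can be, here’s a rough sketch of loading a module’s metadata.yaml into a plain map. It assumes the SnakeYAML library, and the class name is illustrative rather than taken from the actual implementation:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.yaml.snakeyaml.Yaml;

// Rough sketch: read a module's metadata.yaml into a map whose keys follow
// the example above (description, classification, configurations, ...).
class MetadataReader {
  static Map<String, Object> read(Path moduleDir) throws Exception {
    Path file = moduleDir.resolve("metadata.yaml");
    try (InputStream in = Files.newInputStream(file)) {
      Map<String, Object> metadata = new Yaml().load(in);
      // e.g. metadata.get("description"), metadata.get("classification")
      return metadata;
    }
  }
}
```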
Metric Interceptors
Identifying the metrics and associated attributes emitted by an instrumentation can be tricky, and I iterated through several approaches, including some static code analysis and manually reviewing the code of each module. In the end, I decided to try an experiment: leverage some of our integration tests, where the instrumentation is exercised, to intercept any metrics emitted and analyze them. This ended up working pretty well. I added some configuration flags to the agent test runners to keep track of all metrics that go through the assertion flow and, at the end of the test run, write this data into YAML files within a .telemetry directory in each instrumentation module. Then I scrape those files and add the information to our instrumentation model.
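The test-runner changes themselves live in the PR linked at the end of this section, but the underlying idea can be sketched with the SDK’s in-memory metric reader from the opentelemetry-sdk-testing artifact. This standalone example is an approximation of the approach, not the actual runner code:

```java
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.data.MetricData;
import io.opentelemetry.sdk.testing.exporter.InMemoryMetricReader;
import java.util.Collection;

// Sketch: capture metrics produced during a test run, then dump
// name/type/unit for each one in a YAML-like shape.
class MetricCapture {
  public static void main(String[] args) {
    InMemoryMetricReader reader = InMemoryMetricReader.create();
    SdkMeterProvider meterProvider =
        SdkMeterProvider.builder().registerMetricReader(reader).build();
    OpenTelemetrySdk sdk =
        OpenTelemetrySdk.builder().setMeterProvider(meterProvider).build();

    // ... exercise the instrumented library under test ...
    sdk.getMeter("demo").counterBuilder("demo.counter").build().add(1);

    // After the run, inspect everything that was recorded.
    Collection<MetricData> metrics = reader.collectAllMetrics();
    for (MetricData metric : metrics) {
      System.out.printf("- name: %s%n  type: %s%n  unit: %s%n",
          metric.getName(), metric.getType(), metric.getUnit());
    }
  }
}
```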
Example .telemetry file:
```yaml
metrics:
- name: db.client.connections.usage
  description: The number of connections that are currently in state described by the state attribute.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
  - name: state
    type: STRING
- name: db.client.connections.max
  description: The maximum number of open connections allowed.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
- name: db.client.connections.idle.min
  description: The minimum number of idle open connections allowed.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
- name: db.client.connections.idle.max
  description: The maximum number of idle open connections allowed.
  type: LONG_SUM
  unit: connections
  attributes:
  - name: pool.name
    type: STRING
```
This is the PR where this was initially implemented.
Output
The output of this system is getting pretty close to being useful.
Some example snippets from the current output:
```yaml
alibaba:
- name: alibaba-druid-1.0
  description: |
    The Alibaba Druid instrumentation generates database connection pool metrics for druid data sources.
  source_path: instrumentation/alibaba-druid-1.0
  scope:
    name: io.opentelemetry.alibaba-druid-1.0
  target_versions:
    javaagent:
    - com.alibaba:druid:(,)
    library:
    - com.alibaba:druid:1.0.0
  metrics:
  - name: db.client.connections.usage
    description: The number of connections that are currently in state described by the state attribute.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
    - name: state
      type: STRING
  - name: db.client.connections.pending_requests
    description: The number of pending requests for an open connection, cumulative for the entire pool.
    type: LONG_SUM
    unit: requests
    attributes:
    - name: pool.name
      type: STRING
  - name: db.client.connections.max
    description: The maximum number of open connections allowed.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
  - name: db.client.connections.idle.min
    description: The minimum number of idle open connections allowed.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
  - name: db.client.connections.idle.max
    description: The maximum number of idle open connections allowed.
    type: LONG_SUM
    unit: connections
    attributes:
    - name: pool.name
      type: STRING
apache:
- name: apache-dubbo-2.7
  description: The Apache Dubbo instrumentation provides client and server spans for Apache Dubbo RPC calls. Each call produces a span named after the Dubbo method, enriched with standard RPC attributes (system, service, method), network attributes, and error details if an exception occurs.
  source_path: instrumentation/apache-dubbo-2.7
  scope:
    name: io.opentelemetry.apache-dubbo-2.7
  target_versions:
    javaagent:
    - org.apache.dubbo:dubbo:[2.7,)
  configurations:
  - name: otel.instrumentation.common.peer-service-mapping
    description: Used to specify a mapping from host names or IP addresses to peer services.
    type: map
    default: ''
```
Vision
There are all kinds of ideas for leveraging this information once it is more complete. At some point, the plan is to generate documentation within the repo: individual readme files for each library, with more consistency in formatting and contents, as well as replacements for some of the manually maintained lists of supported libraries.
It will also be useful to diff the contents of the metadata between releases, to understand what telemetry was emitted or which configurations were supported by a specific version of the OpenTelemetry Java agent or libraries, or to identify drift and ensure changes are intentional and documented.
At some point we would also like to automate this information being published in the opentelemetry.io registry.
Current Progress
As of today (June 6, 2025), this is the state of the Java agent codebase, including some stats on how many modules have metadata.yaml files with some contents around descriptions/configurations:
- Total Modules: 257
- By classification:
- library: 230
- custom: 5
- internal: 22
- metadata.yaml contents:
- descriptions: 27 (10.51%)
- configurations: 19 (7.39%)
Next steps
This post doesn’t cover everything, as the project is fairly large and I’m antsy to get back into the weeds, but up-to-date progress and notes are available in the GitHub issue.
As of now, I am working on:
- Reviewing more modules in order to manually create metadata.yaml files with configuration options and descriptions (help wanted!).
- Experimenting with how to convey telemetry emitted by default vs emitted when certain configuration options are enabled.
- Building a span interceptor into the test runner, along with a way to identify and present the useful information collected.
To be continued!