OpenTelemetry Java Instrumentation Metadata - Telemetry Variations

(This is a continuation of the first post about this project)

One challenge with documenting metadata associated with instrumentations is how to present the variations of the telemetry data emitted when different configuration options are enabled.

Previously we simply iterated over a list of modules and ran the generation on each one. We now need to run each module with different configuration options enabled, and then attribute the resulting data to the configuration values that were set.

Tagging Telemetry

There is already a pattern for running test suites with different variations. A common example is running tests with stable semantic conventions enabled, which is done by registering a new test task and adding a JVM argument in the Gradle configuration:

tasks {
  val testStableSemconv by registering(Test::class) {
    jvmArgs("-Dotel.semconv-stability.opt-in=database")
  }

  check {
    dependsOn(testStableSemconv)
  }
}

To run this particular Gradle task, instead of targeting the standard test suite (./gradlew :instrumentation:<module>:javaagent:test) we would run ./gradlew :instrumentation:<module>:javaagent:testStableSemconv.

We could augment this pattern by adding a system property, named something like metaDataConfig, that describes the configuration options used to generate the data.

val collectMetadata = findProperty("collectMetadata")?.toString() ?: "false"

tasks {
  val testStableSemconv by registering(Test::class) {
    jvmArgs("-Dotel.semconv-stability.opt-in=database")

    systemProperty("collectMetadata", collectMetadata)
    systemProperty("metaDataConfig", "otel.semconv-stability.opt-in=database")
  }

  test {
    systemProperty("collectMetadata", collectMetadata)
  }

  check {
    dependsOn(testStableSemconv)
  }
}

Then, in our file writer, we can add that description to the output, or mark the telemetry as emitted by default when the property is not set:

String config = System.getProperty("metaDataConfig");
String when = "default";
if (config != null && !config.isEmpty()) {
  when = config;
}

writer.write("when: " + when + "\n");
writer.write("metrics:\n");
...

For a database client that emits different metrics depending on the semantic convention flag, the result is multiple files, one per configuration. For example, the default instrumentation might output:

- when: default
  metrics:
    - name: db.client.connections.idle.max
      description: The maximum number of idle open connections allowed.
      type: LONG_SUM
      unit: connections
      attributes:
        - name: pool.name
          type: STRING
    - name: db.client.connections.idle.min
      description: The minimum number of idle open connections allowed.
      type: LONG_SUM
      unit: connections
      attributes:
        - name: pool.name
          type: STRING
    - name: db.client.connections.max
      description: The maximum number of open connections allowed.
      type: LONG_SUM
      unit: connections
      attributes:
        - name: pool.name
          type: STRING
    - name: db.client.connections.pending_requests
      description: The number of pending requests for an open connection, cumulative
        for the entire pool.
      type: LONG_SUM
      unit: requests
      attributes:
        - name: pool.name
          type: STRING
    - name: db.client.connections.usage
      description: The number of connections that are currently in state described
        by the state attribute.
      type: LONG_SUM
      unit: connections
      attributes:
        - name: pool.name
          type: STRING
        - name: state
          type: STRING

And another file for the same instrumentation with the semantic convention flag enabled:

  - when: otel.semconv-stability.opt-in=database
    metrics:
      - name: db.client.connection.count
        description: The number of connections that are currently in state described
          by the state attribute.
        type: LONG_SUM
        unit: connection
        attributes:
          - name: db.client.connection.pool.name
            type: STRING
          - name: db.client.connection.state
            type: STRING
      - name: db.client.connection.idle.max
        description: The maximum number of idle open connections allowed.
        type: LONG_SUM
        unit: connection
        attributes:
          - name: db.client.connection.pool.name
            type: STRING
      - name: db.client.connection.idle.min
        description: The minimum number of idle open connections allowed.
        type: LONG_SUM
        unit: connection
        attributes:
          - name: db.client.connection.pool.name
            type: STRING
      - name: db.client.connection.max
        description: The maximum number of open connections allowed.
        type: LONG_SUM
        unit: connection
        attributes:
          - name: db.client.connection.pool.name
            type: STRING
      - name: db.client.connection.pending_requests
        description: The number of current pending requests for an open connection.
        type: LONG_SUM
        unit: request
        attributes:
          - name: db.client.connection.pool.name
            type: STRING

This approach was implemented in this PR.

Span Data

When it comes to characterizing the telemetry emitted by an instrumentation, metrics are more straightforward than spans. For one thing, most of the instrumentation in the agent focuses on spans rather than metrics, so there is a lot more data to collect and analyze. Additionally, span names are typically at least partially dynamic, and depending on the span type, there might not be an easy way to enumerate and describe all the variations that an instrumentation emits.

Another thing to consider when writing our gatherer is that not all spans emitted by our tests come from the instrumentation under test. For example, when testing a database client instrumentation, we might also see spans emitted by the database server or by an underlying HTTP client library. We need to filter those out and keep only the spans emitted by the instrumentation under test.

Due to these considerations, my initial approach is to summarize the span data by the attributes emitted for each span kind. Luckily for us, each span also has an instrumentation scope attached to it, which we can use to filter out spans that are not relevant to the instrumentation we are testing.
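To make the scope-based filtering concrete, here is a minimal sketch. It is not the project's actual code: the ScopeFilter and SpanRecord names are hypothetical, and it assumes that the scope name is "io.opentelemetry." followed by the module name, as in the clickhouse-client-0.5 example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScopeFilter {

  // Hypothetical minimal model of an intercepted span: only scope name and span kind.
  public static final class SpanRecord {
    public final String scope;
    public final String kind;

    public SpanRecord(String scope, String kind) {
      this.scope = scope;
      this.kind = kind;
    }
  }

  // Assumption: the scope name is "io.opentelemetry." followed by the module name.
  public static boolean isFromModule(SpanRecord span, String moduleName) {
    return span.scope.equals("io.opentelemetry." + moduleName);
  }

  // Keeps only the spans whose scope matches the module under test.
  public static List<SpanRecord> keepRelevant(List<SpanRecord> spans, String moduleName) {
    List<SpanRecord> kept = new ArrayList<>();
    for (SpanRecord span : spans) {
      if (isFromModule(span, moduleName)) {
        kept.add(span);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<SpanRecord> spans = Arrays.asList(
        new SpanRecord("test", "INTERNAL"),
        new SpanRecord("io.opentelemetry.clickhouse-client-0.5", "CLIENT"));

    // The span with scope "test" comes from the test harness and is dropped.
    for (SpanRecord span : keepRelevant(spans, "clickhouse-client-0.5")) {
      System.out.println(span.scope + " " + span.kind);
    }
  }
}
```

An exact match on the scope name is the strictest choice. If a module were to emit spans under several scopes, the match would need to be loosened accordingly.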

Our resulting raw output for intercepting spans from a test might look like this:

when: default
spans:
  - scope: test
    spans:
      - span_kind: INTERNAL
        attributes:
  - scope: io.opentelemetry.clickhouse-client-0.5
    spans:
      - span_kind: CLIENT
        attributes:
          - name: db.operation
            type: STRING
          - name: db.name
            type: STRING
          - name: server.address
            type: STRING
          - name: server.port
            type: LONG
          - name: db.statement
            type: STRING
          - name: db.system
            type: STRING

Span Parser

We can generate these span files the same way we did for the metrics, so all of that plumbing is already in place. Next we need to focus on parsing this data and then incorporating it into our instrumentation list output. Since the test runners don’t have a great way to know which scopes we are interested in, we will delegate the responsibility of filtering out that data to our parser.

Similar to what we needed to do with metrics, we also need a way to deduplicate and clean up all the span data, as there will be a lot of duplicate spans and attributes emitted by different tests.
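As a rough illustration of the deduplication step, the sketch below merges the attributes observed by several tests into one set per span kind. The SpanAttributeMerger class and the "name:TYPE" string encoding are assumptions made for this example, not the parser's real data model.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SpanAttributeMerger {

  // Hypothetical model: each test contributes a map of span kind -> attributes,
  // where every attribute is encoded as a "name:TYPE" string.
  // The result holds each attribute once per span kind, in first-seen order.
  public static Map<String, Set<String>> merge(List<Map<String, List<String>>> observations) {
    Map<String, Set<String>> merged = new LinkedHashMap<>();
    for (Map<String, List<String>> observation : observations) {
      for (Map.Entry<String, List<String>> entry : observation.entrySet()) {
        merged
            .computeIfAbsent(entry.getKey(), kind -> new LinkedHashSet<>())
            .addAll(entry.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, List<String>> testA = new LinkedHashMap<>();
    testA.put("CLIENT", Arrays.asList("db.operation:STRING", "server.port:LONG"));

    Map<String, List<String>> testB = new LinkedHashMap<>();
    testB.put("CLIENT", Arrays.asList("server.port:LONG", "db.name:STRING"));

    // server.port appears in both tests but only once in the merged result.
    System.out.println(merge(Arrays.asList(testA, testB)));
  }
}
```

A LinkedHashSet is used so the merged output is stable between runs, which keeps diffs of the generated files small.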

After implementing all of this, we end up with something like:

telemetry:
- when: default
  spans:
  - span_kind: CLIENT
    attributes:
    - name: db.operation
      type: STRING
    - name: server.address
      type: STRING
    - name: server.port
      type: LONG
    - name: db.name
      type: STRING
    - name: db.system
      type: STRING
    - name: db.statement
      type: STRING
- when: otel.semconv-stability.opt-in=database
  metrics:
  - name: db.client.operation.duration
    description: Duration of database client operations.
    type: HISTOGRAM
    unit: s
    attributes:
    - name: db.namespace
      type: STRING
    - name: db.operation.name
      type: STRING
    - name: db.system.name
      type: STRING
    - name: server.address
      type: STRING
    - name: server.port
      type: LONG
  spans:
  - span_kind: CLIENT
    attributes:
    - name: db.system.name
      type: STRING
    - name: db.namespace
      type: STRING
    - name: error.type
      type: STRING
    - name: db.operation.name
      type: STRING
    - name: server.port
      type: LONG
    - name: db.query.text
      type: STRING
    - name: db.response.status_code
      type: STRING
    - name: server.address
      type: STRING

This approach was implemented in this PR.

Longer term, we want to run this across all modules as part of a nightly run, without maintaining a list of explicit Gradle tasks to execute. For now, this approach lets us roll the feature out gradually, in smaller iterations that are easier to review.

One thing I’m now thinking about is whether we should also distinguish telemetry based on whether it is produced by the javaagent instrumentation or by the library instrumentation. To be continued…