Skip to content

feat: Add client request duration metric.#4132

Open
blakeli0 wants to merge 8 commits intomainfrom
add-basic-metrics
Open

feat: Add client request duration metric.#4132
blakeli0 wants to merge 8 commits intomainfrom
add-basic-metrics

Conversation

@blakeli0
Copy link
Contributor

@blakeli0 blakeli0 commented Mar 7, 2026

This PR adds basic implementation for client request duration metric.

The following attributes are also added:

  • gcp.client.artifact. Added as instrumentation scope name of the meter.
  • rpc.response.status_code. Added as a String representation of StatusCode.Code.

The overall design follows ApiTracer/ApiTracerFactory pattern, creates GoldenSignalMetricsTracer/GoldenSignalMetricsTracerFactory that implement ApiTracer/ApiTracerFactory respectively.
GoldenSignalMetricsTracerFactory is expected to be passed in during client initialization. GoldenSignalMetricsTracer tracks the lifecycle of a request and calculates the duration of client requests.

@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Mar 7, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new client request duration metric, gcp.client.request.duration, to provide better observability into the total time taken for logical client requests. It leverages OpenTelemetry for metric instrumentation and includes attributes for the client artifact and RPC response status. The changes also involve refactoring common observability utilities into a dedicated class for improved code organization.

Highlights

  • New Client Request Duration Metric: Introduced GoldenTelemetryMetricsTracer and GoldenTelemetryMetricsTracerFactory to implement a new gcp.client.request.duration metric, measuring the total time for logical client requests.
  • Enhanced Metric Attributes: The new metric includes gcp.client.artifact as the instrumentation scope name and rpc.response.status_code to capture the outcome of client requests (e.g., OK, CANCELLED, INTERNAL).
  • Observability Utilities Refactoring: Common utility methods for extracting error status and converting attributes to OpenTelemetry format were extracted into a new ObservabilityUtils class, improving code organization and reusability.
  • Test Dependency Addition: Added opentelemetry-sdk-testing as a test-scoped dependency to support comprehensive testing of OpenTelemetry-related components.
Changelog
  • gax-java/gax/pom.xml
    • Added opentelemetry-sdk-testing dependency.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/GoldenTelemetryMetricsTracer.java
    • Added a new class to implement the client request duration metric.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/GoldenTelemetryMetricsTracerFactory.java
    • Added a new factory class for creating GoldenTelemetryMetricsTracer instances.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/MetricsTracer.java
    • Removed internal utility methods and updated calls to use the new ObservabilityUtils.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityAttributes.java
    • Added a new constant for the rpc.response.status_code attribute.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityUtils.java
    • Added a new utility class containing shared methods for extracting status and converting attributes to OpenTelemetry format.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/OpenTelemetryMetricsRecorder.java
    • Removed internal utility methods and updated calls to use the new ObservabilityUtils.
  • gax-java/gax/src/test/java/com/google/api/gax/tracing/GoldenTelemetryMetricsTracerFactoryTest.java
    • Added unit tests for the GoldenTelemetryMetricsTracerFactory.
  • gax-java/gax/src/test/java/com/google/api/gax/tracing/GoldenTelemetryMetricsTracerTest.java
    • Added unit tests for the GoldenTelemetryMetricsTracer.
  • gax-java/gax/src/test/java/com/google/api/gax/tracing/MetricsTracerTest.java
    • Removed tests for the extractStatus method, which was moved.
  • gax-java/gax/src/test/java/com/google/api/gax/tracing/ObservabilityUtilsTest.java
    • Added unit tests for the new ObservabilityUtils class.
Activity
  • No human activity has occurred on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new metric for client request duration, gcp.client.request.duration, implemented through a new GoldenTelemetryMetricsTracer. The changes also include a valuable refactoring that centralizes utility methods like extractStatus and toOtelAttributes into a new ObservabilityUtils class, improving code organization. The new functionality is well-tested. My review includes a couple of suggestions to improve robustness and error handling in the new tracer and its factory, which are still valid. Overall, this is a solid contribution.

Comment on lines +53 to +55
public ApiTracer newTracer(ApiTracer parent, SpanName spanName, OperationType operationType) {
return new GoldenTelemetryMetricsTracer(openTelemetry, apiTracerContext);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The apiTracerContext field is not initialized in the constructor and can be null if newTracer is called before withContext. This would lead to a NullPointerException when apiTracerContext.libraryMetadata() is called inside the GoldenTelemetryMetricsTracer constructor. To make the factory more robust against misuse, please add a null-check for apiTracerContext and throw an IllegalStateException if it hasn't been set.

  public ApiTracer newTracer(ApiTracer parent, SpanName spanName, OperationType operationType) {
    if (apiTracerContext == null) {
      throw new IllegalStateException(
          "ApiTracerContext has not been set. Call withContext() before creating a new tracer.");
    }
    return new GoldenTelemetryMetricsTracer(openTelemetry, apiTracerContext);
  }
References
  1. Explicit if checks for null handling are preferred for better readability in simple cases, aligning with the preference over java.util.Optional for such scenarios.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not happen with the current setup but +1 for the extra safety "against misuse" a null check brings.

@Override
public void operationSucceeded() {
if (clientRequestFinished.getAndSet(true)) {
throw new IllegalStateException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better diagnostics, the IllegalStateException should include a descriptive message, for example: "Operation has already been completed". This feedback also applies to the exceptions thrown in operationCancelled() (line 92) and operationFailed() (line 102). Consider defining this message as a shared constant for reuse.

Suggested change
throw new IllegalStateException();
throw new IllegalStateException("Operation has already been completed");

Copy link
Contributor

@diegomarquezp diegomarquezp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

external/com_google_api_gax_java/gax/src/test/java/com/google/api/gax/tracing/GoldenSignalMetricsTracerTest.java:59: error: cannot find symbol
  private InMemoryMetricReader metricReader;
          ^
  symbol:   class InMemoryMetricReader
  location: class GoldenSignalMetricsTracerTest
external/com_google_api_gax_java/gax/src/test/java/com/google/api/gax/tracing/GoldenSignalMetricsTracerTest.java:133: error: cannot find symbol
  private void verifyMetricDataContainsMeterInfo(MetricData metricData) {

I think we need to add an entry to gax-java/dependencies.properties and related BUILD files.


@Override
public void operationSucceeded() {
if (clientRequestFinished.getAndSet(true)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember at some point we mentioned that we don't expect race conditions when operating at attempt level. Specifically, we may process separate attempts in separate threads, but attempts will always be sequential.

Could you explain why we need an atomic boolean in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This is mostly copied from MetricsTracer. I confirmed with @lqiu96 as well that we don't need this.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class GoldenSignalMetricsTracer implements ApiTracer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the GoldenSignal prefix may require a justification to represent a direct implementation of the OpenTelemetry API, since the connection between these two may be a bit obscure at least to my understanding.

Maybe we can add a Javadoc in the class to explain its purpose?

Copy link
Contributor

@diegomarquezp diegomarquezp Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'd like to reintroduce the discussion of naming in this effort. The Google Cloud Documentaiton for Observability defines this effort as Application Centric Observability. AppCentric is then a valid public-facing convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a Javadoc in the class to explain its purpose?

Added. Also I extracted the metric recording logic to a new class GoldenSignalMetricsRecorder to

  1. Wrap the OpenTelemetry recording logic.
  2. Make sure the meter and histogram are only created once per client.

"Operation has already been completed";

private final Stopwatch clientRequestTimer = Stopwatch.createStarted();
private final AtomicBoolean clientRequestFinished;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit on public vs internal semantics: public documentation refers to operations as "client requests", but our implementation historically defines these as operations. Maybe operationFinished is more consistent with the tracing structure for such a private variable?

Comment on lines +53 to +55
public ApiTracer newTracer(ApiTracer parent, SpanName spanName, OperationType operationType) {
return new GoldenTelemetryMetricsTracer(openTelemetry, apiTracerContext);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not happen with the current setup but +1 for the extra safety "against misuse" a null check brings.

}

@Test
void operationSucceeded_recordsDuration() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we separate operationSucceeded_recordsDuration from something like operationSucceeded_recordsOKStatus?

Copy link
Contributor Author

@blakeli0 blakeli0 Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Updated.

I also extracted the unit testing of meter to the recorder's unit tests.

clientRequestTimer.elapsed(TimeUnit.SECONDS), toOtelAttributes(attributes));
}

@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I believe showcase cannot emulate all error behavior out of the box. I'm planning to add unit tests with a local server in order to emulate possible error codes and other behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a great idea! On the other hand, I would still prioritize adding more attributes. Because we are not changing any existing functionality, we are relying on them to add traces.

}
attributes.put(RPC_RESPONSE_STATUS_ATTRIBUTE, StatusCode.Code.OK.toString());
clientRequestDurationRecorder.record(
clientRequestTimer.elapsed(TimeUnit.SECONDS), toOtelAttributes(attributes));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elapsed returns a long, we need to do something to make this a double fraction like:

elapsed(NANOSECONDS)/1_000_000_000.0

Otherwise we won't record things below 1s

meter
.histogramBuilder(CLIENT_REQUEST_DURATION_METRIC_NAME)
.setDescription(CLIENT_REQUEST_DURATION_METRIC_DESCRIPTION)
.setUnit("s")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs .setExplicitBucketBoundariesAdvice()

Boundaries:

[0, 0.0001, 0.0005, 0.0010, 0.005, 0.010, 0.050, 0.100, 0.5, 1.0, 5.0, 10.0, 60.0, 300.0, 900.0, 3600.0]


@Override
public ApiTracer newTracer(ApiTracer parent, SpanName spanName, OperationType operationType) {
return new GoldenSignalMetricsTracer(openTelemetry, apiTracerContext);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we capture things like rpc.method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be populated in apiTracerContext. Then the tracer will get it from apiTracerContext and record it.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 9, 2026

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 9, 2026

Quality Gate Failed Quality Gate failed for 'java_showcase_integration_tests'

Failed conditions
30.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants