Skip to content

fix(om2.0): MUST timestamp exemptions#2860

Open
krajorama wants to merge 1 commit intomainfrom
krajo/om2.0-consistency3
Open

fix(om2.0): MUST timestamp exemptions#2860
krajorama wants to merge 1 commit intomainfrom
krajo/om2.0-consistency3

Conversation

@krajorama
Copy link
Member

Due to classic and native buckets we now have a special case for the timestamps rule.

Due to classic and native buckets we now have a special case for
the timestamps rule.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Two Histogram MetricPoints where one has Classic Buckets and the other has Native Buckets.
* Two GaugeHistogram MetricPoints where one has Classic Buckets and the other has Native Buckets.

Note: these exemptions allow exposing both Classic and Native Buckets for the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this is a can of worms...

Did we already discuss having classic + native on the same line?

If we keep them on separate lines, I think we need more updates to the spec. E.g.

MetricFamily
A MetricFamily MAY have zero or more Metrics. A MetricFamily MUST have a name, HELP, TYPE, and UNIT metadata. Every Metric within a MetricFamily MUST have a unique LabelSet.

We need to carve out an exception from "MUST have a unique LabelSet" for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this is a can of worms...

Did we already discuss having classic + native on the same line?

No, we have not. I think the fallout from that would be worse than having two lines.

  • Readability would be worse.
  • Native and classic bucketing has different exemplars so where do you put those?
  • One composite value per line no longer true.
  • probably more...

If we keep them on separate lines, I think we need more updates to the spec. E.g.

MetricFamily
A MetricFamily MAY have zero or more Metrics. A MetricFamily MUST have a name, HELP, TYPE, and UNIT metadata. Every Metric within a MetricFamily MUST have a unique LabelSet.

We need to carve out an exception from "MUST have a unique LabelSet" for this.

True.

Copy link
Contributor

@dashpole dashpole Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability would be worse.

Agreed. This is also my biggest concern.

Native and classic bucketing has different exemplars so where do you put those?

They aren't required to have different exemplars (both just need to be evenly distributed now), so I think they could use either set of exemplars, and be compliant.

One composite value per line no longer true.

I guess it depends on how you define composite value, but you could consider it a single composite value. The fact that we currently store it as two separate things (or keep one or the other) seems like an implementation detail of the current storage mechanism.

If we put both in a single line, it would actually match the protobuf format more closely. https://github.com/prometheus/client_golang/blob/4b86739a4e19a9d33b78ab94fc9d0c54bfc6ebf6/prometheus/histogram.go#L810. The protobuf format sends both classic and native buckets in a single composite value.

Example text format (just to make the proposal clear):

# TYPE acme_http_request_seconds histogram
# UNIT acme_http_request_seconds seconds
# HELP acme_http_request_seconds Latency histogram of all of ACME's HTTP requests.
-acme_http_request_seconds{path="/api/v1",method="GET"} {count:2,sum:1.2e2,schema:0,zero_threshold:1e-4,zero_count:0,positive_spans:[1:2],positive_buckets:[1,1]}
-acme_http_request_seconds{path="/api/v1",method="GET"} {count:2,sum:1.2e2,bucket:[0.5:1,1:2,+Inf:2]}
+acme_http_request_seconds{path="/api/v1",method="GET"} {count:2,sum:1.2e2,bucket:[0.5:1,1:2,+Inf:2],schema:0,zero_threshold:1e-4,zero_count:0,positive_spans:[1:2],positive_buckets:[1,1]}

I'll try to lay out benefits for the sake of discussion. I'm not sold that it is the right direction, but would like us to have the discussion.

  • It probably makes parsing more efficient and simpler.
    • E.g. If we want to drop classic buckets only if the native buckets exist, we need to keep track of the series we've gotten native buckets for. We also can't easily implement the inverse: "Keep native buckets only if the classic buckets don't exist" because we require the native buckets to come first.
  • It keeps label uniqueness within a metric.
    • It might be confusing for the average user to see two lines with the same name and labels if they aren't familiar with the protocol.
    • The number of lines in the text format is always equal to the number of unique label sets (cardinality).
  • Makes the spec more consistent, and requires fewer "carve-outs". Maybe less of a practical issue, but "MUST do X, except when A, B, C" ... makes it harder to know if something is allowed or not and tends to make the spec more complex to implement.
  • Eliminates "invalid" states: Sum, count, start timestamps need to be the same (I assume?) between classic and native buckets. Having them on the same line enforces this.

Copy link
Member

@bwplotka bwplotka Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Lol, @dashpole comment didn't load for me so I duplicate a bit.

Oh yes, it would be nice to discuss this case, which I assume exists only for migration purposes?

I assume SDK and Prometheus protobuf allows mixed bucketing on the same histogram for migration purposes. This assumed on both text and ingestion there will be classic and native histograms with different names. This assumption breaks with both composite values AND full NHCB mode.

I think the fallout from that would be worse than having two lines.

For the text, I wonder if two lines is actually not more complicated than combined line, so the opposite being true 🤔

One composite value per line no longer true.

I wonder if you took it wrongly. I think what @dashpole proposes is NOT

mixed-histogram {<native sample>} {<classic sample>}

I think what I would do is combined sample that has both schema etc and buckets fields.

mixed-histogram {count:59,sum:1.2e2, bucket:[0.0:0,1e-05:0,0.0001:5,0.1:8,1.0:10,10.0:11,100000.0:11,1e+06:15,1e+23:16,1.1e+23:17,+Inf:17],schema:7,zero_threshold:1e-4,zero_count:0,negative_spans:[1:2],negative_buckets:[5,7],positive_spans:[-1:2,3:4],positive_buckets:[5,7,10,9,8,8]} st@1520430000.123

Benefits of combining sample (single line):

  • No need for non trivial exceptions on unique MetricFamily Metric labels per line. This use case feels like a temporary migration case, so impacting the whole spec with non-trivial exceptions might be not worth it.
  • No need to specify ordering of duplicate labels lines.
  • Strong, format enforced, requirements for sample to be a single histogram (!!!), similar to Prometheus Protobuf and our SDKs logic. Separate line would allow different sum/count etc unless we add many, easy to miss, MUST words.
  • Consistency with PrometheusProto?

nits:

  • Less data
  • Parsing might be actually simpler

Downsides:

  • Native and classic bucketing has different exemplars so where do you put those?
    • That's the biggest challenge. Perhaps it's fine to mix those for this temporary, migration case?
  • Readability could be worse.
  • Arguable as long JSON like lines are just a bit longer and they might be already less readable
  • Also it's relatively rare case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it is tempting.

For the exemplars, there's an easy solution, which is to only expose the exemplars for the classic buckets. Since we think this is a migration case and exemplars aren't 100% guaranteed anyway (they are inherently sampled).

Another upside would be that we can get rid of the rule to have native histograms first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cons:

  • this is a format that's only really needed for a temporary situation when someone is migrating from one system to another.
  • It requires SDK maintainers to implement yet another format

pros:

  • This fits the philosophy we've been moving towards which is removing the need for multiline state -- one line says everything you need to know
  • This removes ambiguity

I think preserving the one-line/stateless approach is worth the extra effort, especially if the format is not too onerous / verbose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONSENSUS: Try single line for the mixed case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants