Conversation
Due to classic and native buckets we now have a special case for the timestamps rule. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
| * Two Histogram MetricPoints where one has Classic Buckets and the other has Native Buckets. | ||
| * Two GaugeHistogram MetricPoints where one has Classic Buckets and the other has Native Buckets. | ||
|
|
||
| Note: these exemptions allow exposing both Classic and Native Buckets for the |
There was a problem hiding this comment.
Sorry if this is a can of worms...
Did we already discuss having classic + native on the same line?
If we keep them on separate lines, I think we need more updates to the spec. E.g.
MetricFamily
A MetricFamily MAY have zero or more Metrics. A MetricFamily MUST have a name, HELP, TYPE, and UNIT metadata. Every Metric within a MetricFamily MUST have a unique LabelSet.
We need to carve out an exception from "MUST have a unique LabelSet" for this.
There was a problem hiding this comment.
Sorry if this is a can of worms...
Did we already discuss having classic + native on the same line?
No, we have not. I think the fallout from that would be worse than having two lines.
- Readability would be worse.
- Native and classic bucketing has different exemplars so where do you put those?
- One composite value per line no longer true.
- probably more...
If we keep them on separate lines, I think we need more updates to the spec. E.g.
MetricFamily
A MetricFamily MAY have zero or more Metrics. A MetricFamily MUST have a name, HELP, TYPE, and UNIT metadata. Every Metric within a MetricFamily MUST have a unique LabelSet.We need to carve out an exception from "MUST have a unique LabelSet" for this.
True.
There was a problem hiding this comment.
Readability would be worse.
Agreed. This is also my biggest concern.
Native and classic bucketing has different exemplars so where do you put those?
They aren't required to have different exemplars (both just need to be evenly distributed now), so I think they could use either set of exemplars, and be compliant.
One composite value per line no longer true.
I guess it depends on how you define composite value, but you could consider it a single composite value. The fact that we currently store it as two separate things (or keep one or the other) seems like an implementation detail of the current storage mechanism.
If we put both in a single line, it would actually match the protobuf format more closely. https://github.com/prometheus/client_golang/blob/4b86739a4e19a9d33b78ab94fc9d0c54bfc6ebf6/prometheus/histogram.go#L810. The protobuf format sends both classic and native buckets in a single composite value.
Example text format (just to make the proposal clear):
# TYPE acme_http_request_seconds histogram
# UNIT acme_http_request_seconds seconds
# HELP acme_http_request_seconds Latency histogram of all of ACME's HTTP requests.
-acme_http_request_seconds{path="/api/v1",method="GET"} {count:2,sum:1.2e2,schema:0,zero_threshold:1e-4,zero_count:0,positive_spans:[1:2],positive_buckets:[1,1]}
-acme_http_request_seconds{path="/api/v1",method="GET"} {count:2,sum:1.2e2,bucket:[0.5:1,1:2,+Inf:2]}
+acme_http_request_seconds{path="/api/v1",method="GET"} {count:2,sum:1.2e2,bucket:[0.5:1,1:2,+Inf:2],schema:0,zero_threshold:1e-4,zero_count:0,positive_spans:[1:2],positive_buckets:[1,1]}I'll try to lay out benefits for the sake of discussion. I'm not sold that it is the right direction, but would like us to have the discussion.
- It probably makes parsing more efficient and simpler.
- E.g. If we want to drop classic buckets only if the native buckets exist, we need to keep track of the series we've gotten native buckets for. We also can't easily implement the inverse: "Keep native buckets only if the classic buckets don't exist" because we require the native buckets to come first.
- It keeps label uniqueness within a metric.
- It might be confusing for the average user to see two lines with the same name and labels if they aren't familiar with the protocol.
- The number of lines in the text format is always equal to the number of unique label sets (cardinality).
- Makes the spec more consistent, and requires fewer "carve-outs". Maybe less of a practical issue, but "MUST do X, except when A, B, C" ... makes it harder to know if something is allowed or not and tends to make the spec more complex to implement.
- Eliminates "invalid" states: Sum, count, start timestamps need to be the same (I assume?) between classic and native buckets. Having them on the same line enforces this.
There was a problem hiding this comment.
EDIT: Lol, @dashpole comment didn't load for me so I duplicate a bit.
Oh yes, it would be nice to discuss this case, which I assume exists only for migration purposes?
I assume SDK and Prometheus protobuf allows mixed bucketing on the same histogram for migration purposes. This assumed on both text and ingestion there will be classic and native histograms with different names. This assumption breaks with both composite values AND full NHCB mode.
I think the fallout from that would be worse than having two lines.
For the text, I wonder if two lines is actually not more complicated than combined line, so the opposite being true 🤔
One composite value per line no longer true.
I wonder if you took it wrongly. I think what @dashpole proposes is NOT
mixed-histogram {<native sample>} {<classic sample>}
I think what I would do is combined sample that has both schema etc and buckets fields.
mixed-histogram {count:59,sum:1.2e2, bucket:[0.0:0,1e-05:0,0.0001:5,0.1:8,1.0:10,10.0:11,100000.0:11,1e+06:15,1e+23:16,1.1e+23:17,+Inf:17],schema:7,zero_threshold:1e-4,zero_count:0,negative_spans:[1:2],negative_buckets:[5,7],positive_spans:[-1:2,3:4],positive_buckets:[5,7,10,9,8,8]} st@1520430000.123
Benefits of combining sample (single line):
- No need for non trivial exceptions on unique MetricFamily Metric labels per line. This use case feels like a temporary migration case, so impacting the whole spec with non-trivial exceptions might be not worth it.
- No need to specify ordering of duplicate labels lines.
- Strong, format enforced, requirements for sample to be a single histogram (!!!), similar to Prometheus Protobuf and our SDKs logic. Separate line would allow different sum/count etc unless we add many, easy to miss, MUST words.
- Consistency with PrometheusProto?
nits:
- Less data
- Parsing might be actually simpler
Downsides:
- Native and classic bucketing has different exemplars so where do you put those?
- That's the biggest challenge. Perhaps it's fine to mix those for this temporary, migration case?
- Readability could be worse.
- Arguable as long JSON like lines are just a bit longer and they might be already less readable
- Also it's relatively rare case
There was a problem hiding this comment.
Hmm, it is tempting.
For the exemplars, there's an easy solution, which is to only expose the exemplars for the classic buckets. Since we think this is a migration case and exemplars aren't 100% guaranteed anyway (they are inherently sampled).
Another upside would be that we can get rid of the rule to have native histograms first.
There was a problem hiding this comment.
FYI: Mention of the mix case in SDK: https://prometheus.io/docs/specs/native_histograms/#scraping-both-classic-and-native-histograms
There was a problem hiding this comment.
cons:
- this is a format that's only really needed for a temporary situation when someone is migrating from one system to another.
- It requires SDK maintainers to implement yet another format
pros:
- This fits the philosophy we've been moving towards which is removing the need for multiline state -- one line says everything you need to know
- This removes ambiguity
I think preserving the one-line/stateless approach is worth the extra effort, especially if the format is not too onerous / verbose.
There was a problem hiding this comment.
CONSENSUS: Try single line for the mixed case
Due to classic and native buckets we now have a special case for the timestamps rule.