MB-65018 add custom_filter/custom_score query nodes with context-driven callback hooks#2289
MB-65018 add custom_filter/custom_score query nodes with context-driven callback hooks#2289maneuvertomars wants to merge 3 commits intomasterfrom
Conversation
…en callback hooks
search/query/custom_filter.go
Outdated
| childSearcher, err := q.QueryVal.Searcher(ctx, i, m, options) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| // Resolve the request-scoped callback builder injected by the embedder. | ||
| factory, _ := ctx.Value(CustomFilterContextKey).(CustomFilterFactory) | ||
| if factory == nil { | ||
| return nil, fmt.Errorf("no custom filter factory registered in context") | ||
| } | ||
|
|
||
| // Build the per-hit filter callback from query-provided source/params/fields. | ||
| filterFunc, err := factory(q.Source, q.Params, q.Fields) | ||
| if err != nil { |
There was a problem hiding this comment.
I would argue that passing a callback function in the context object is very flaky, and difficult for library users to add directly to their application.
Also if no FilterHook is defined we should not error out i feel and just do unfiltered search.
Why not try this?
func (q *CustomFilterQuery) Searcher(ctx context.Context, i index.IndexReader, m mapping.IndexMapping, options search.SearcherOptions) (search.Searcher, error) {
// Build the inner searcher first; custom filtering wraps its output.
childSearcher, err := q.QueryVal.Searcher(ctx, i, m, options)
if err != nil {
return nil, err
}
// Check if the factor hook is registered
if FilterHook == nil {
// No filtering hook registered => no filtering to be done
// Just return the child searcher itself
return childSearcher, nil
}
filterFunc, err := filterFactory(q.Source, q.Params, q.Fields)
if err != nil {
return nil, err
}
}There was a problem hiding this comment.
I think we should fail fast here rather than silently bypass custom_filter/custom_score. If a query explicitly contains a custom node and no hook is registered, returning the child searcher would change query semantics and produce incorrect results. The context-carried hook is intentional so the execution runtime stays request-scoped and embedder-defined, while Bleve remains generic.
search/query/custom_filter.go
Outdated
|
|
||
| type CustomFilterQuery struct { | ||
| // QueryVal is the child query whose candidate matches are filtered. | ||
| QueryVal Query `json:"query"` |
There was a problem hiding this comment.
Query as the param name perhaps?
| } | ||
| if q.Source == "" { | ||
| return fmt.Errorf("custom filter query must have source") | ||
| } |
There was a problem hiding this comment.
Is a nil Fields param value valid?
There was a problem hiding this comment.
Yes, nil/empty Fields is valid by design. It means no stored fields are loaded into doc.fields; I clarified that in the field comments.
search/query/custom_filter.go
Outdated
| // Params carries caller-provided values passed as the second UDF argument. | ||
| Params map[string]interface{} `json:"params,omitempty"` | ||
| // Source carries embedding-defined callback source that travels with the query. | ||
| Source string `json:"source,omitempty"` |
There was a problem hiding this comment.
Since a nil or empty source is invalid we should not have omitempty
search/query/custom_filter.go
Outdated
| } | ||
|
|
||
| func (q *CustomFilterQuery) MarshalJSON() ([]byte, error) { | ||
| inner := map[string]interface{}{ |
There was a problem hiding this comment.
to allow for easier extensibility of the struct, we can have a local copy of the struct here right?
type inner struct {
Query Query `json:"query"`
Fields []string `json:"fields,omitempty"`
Params map[string]interface{} `json:"params,omitempty"`
Source string `json:"source"`
}
return util.MarshalJSON(struct {
CustomFilter inner `json:"custom_filter"`
}{
CustomFilter: inner{
Query: q.QueryVal,
Fields: q.Fields,
Params: q.Params,
Source: q.Source,
},
})There was a problem hiding this comment.
Makes sense. I switched MarshalJSON() to use a typed local wrapper so the JSON shape stays explicit and easier to extend.
|
|
||
| // CustomScoreFactory lets the embedding application provide request-scoped | ||
| // score callbacks created from query-provided source/params/fields. | ||
| type CustomScoreFactory func(source string, params map[string]interface{}, fields []string) (searcher.ScoreFunc, error) |
There was a problem hiding this comment.
Instead of having these types and having library users pass in a function in context, consider this.
var (
filterFactory CustomFilterFactory
scoreFactory CustomScoreFactory
)
func SetFilterFactory(f CustomFilterFactory) {
filterFactory = f
}
func SetScoreFactory(f CustomScoreFactory) {
scoreFactory = f
}
// usage from a bleve API POV
```go
bleve.SetFilterFactory(f)
bleve.SetFilterFactory(f)
// can avoid context overhead entirelyThere was a problem hiding this comment.
That would make the hook registration package-global in Bleve, whereas the current design keeps the execution hook request-scoped and explicit at search time. I’d prefer avoiding Bleve-level global mutable state here; the context handoff is small, aligns with existing search-time context usage, and keeps the embedder control per request.
| } | ||
|
|
||
| // Resolve the request-scoped callback builder injected by the embedder. | ||
| factory, _ := ctx.Value(CustomScoreContextKey).(CustomScoreFactory) |
There was a problem hiding this comment.
ditto, same as before - can avoid context ops by a simple global var init.
There was a problem hiding this comment.
Same reasoning here. I’d prefer keeping this request-scoped rather than introducing Bleve-level global mutable hooks. The context lookup cost is negligible relative to search/execution, and the current design keeps the embedder control explicit per request.
This PR adds two new query node types in Bleve: custom_filter and custom_score, both wrapping an inner query.
Execution is embedding-driven via request-scoped context hooks (CustomFilterFactory / CustomScoreFactory) so Bleve remains engine-agnostic.
custom_filter applies per-hit keep/drop logic through FilteringSearcher; custom_score applies per-hit score mutation through ScoreMutatingSearcher.