SearchLogic

Example of usage

In controller

# Create an instance of search logic.
# It assumes that params[:q] (by default) contains a JSON string
# that represents a query state tree.
@search = SearchLogic.mentions params: params

# Extract 'keyword_ids' from the search logic tree.
# Since the Mention ES class has no 'keyword_ids' attribute,
# we should use 'xterms_values' (eXtended terms values)
# and remove the values BEFORE the "real" query JSON for ES is built.
keyword_ids = @search.xterms_values('keyword_ids')
wordsmaster_ids = Keyword.where(id: keyword_ids).map(&:wordsmaster_id).uniq

# Specify a terms filter for 'wordsmaster_id'.
# Note that #with_terms_values returns a new search logic instance.
# This lets us keep using the previous instance (@search) in views
# to generate links, while the new instance is "enhanced" with
# facets and filters and used for the actual requests
# to Elasticsearch.
search = @search.with_terms_values 'wordsmaster_id', wordsmaster_ids

# Add facets. Note that the added facets do not affect the @search instance.
query = ::SearchLogic::Facets.new(search.query).
  add_facet('sources', "terms" => {"field" => 'source'}).
  query

# Send the request to Elasticsearch.
@search_response = Connectors::Elasticsearch.client_search(
  Mention.index_name,
  query
)

In view

<%# Generate a link to a similar page that does not contain "twitter" related results. %>
<%= link_to 'without twitter results',
             project_volume_index_path(@project,
                                       @search.without_terms_value("source", 'twitter').url_params) %>

Pseudo-filters (eXtended)

In a degenerate case the search logic query tree might already be a valid Elasticsearch request body. In the general case, however, it contains pseudo-filters, which are not valid parts of the Elasticsearch query DSL.

To turn a 'search' into a 'query', the pseudo-filters must be processed. For example, for Mention the "keyword tags" pseudo-filter must be removed, and at the same time the 'wordsmaster id' terms filter must be added or modified to represent the "keyword tags" values.

Each pseudo-filter represents a "virtual" filter that exists only in the UI and must be transformed into a "real" Elasticsearch filter.

See the example section above for an additional example of pseudo-filters for 'keyword_ids'.
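The transformation described above can be sketched in plain Ruby. The tree layout used here (a flat list of single-key filter hashes) and the helper name resolve_keyword_tags are assumptions made for illustration only; the real SearchLogic tree and API may look different.

```ruby
# Hypothetical tree layout: a list of single-key filter hashes.
# An "x-terms" entry is a pseudo-filter; a "terms" entry is a real ES filter.
def resolve_keyword_tags(filters, wordsmaster_ids)
  # Drop the 'keyword_ids' pseudo-filter: it is not valid ES query DSL.
  real = filters.reject { |f| f.key?("x-terms") && f["x-terms"].key?("keyword_ids") }
  # Drop any existing 'wordsmaster_id' terms filter so it can be replaced.
  real = real.reject { |f| f.key?("terms") && f["terms"].key?("wordsmaster_id") }
  # Add the real filter that stands in for the pseudo-filter.
  real + [{ "terms" => { "wordsmaster_id" => wordsmaster_ids } }]
end

filters = [
  { "x-terms" => { "keyword_ids" => [1, 2] } },
  { "terms"   => { "source" => ["twitter"] } }
]

resolved = resolve_keyword_tags(filters, [10, 11])
```

After this step no "x-terms" entry remains in the list, and unrelated real filters such as the 'source' terms filter are left untouched.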

Some terminological details

query

'query' is a hash representation of a valid ES request body. It cannot contain pseudo-filters.

'search' is an instance of a 'search logic' tree. It might contain pseudo-filters.

tree

A low-level API for manipulating the tree structure. It should not be directly exposed to client code.

Already implemented filters and TODO

SearchLogic has:

  • terms_filter

  • xterms_filter

  • range_filter

We expect to add further filter kinds as we need them.

FAQ

Q: Were there any thoughts on how to apply pseudo filters to the ES results in a generic way?

A: There is no generic way to deal with pseudo filters (at least at the moment).

Q: Do we know what the possible pseudo filters are at this point at all, or was this not explored at all?

A: A search logic tree is just a Ruby hash or a JSON string. So, you can easily recognize a pseudo filter by its "x-" prefix. For example, for an xTerms filter the key in the Ruby hash will be "x-terms".
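As a rough illustration of recognizing pseudo filters by prefix, the sketch below walks a parsed tree and collects every key that starts with "x-". The helper name pseudo_filter_keys and the "and" wrapper in the sample tree are hypothetical; only the "x-" prefix convention comes from the answer above.

```ruby
require "json"

# Recursively collect every hash key that starts with "x-".
def pseudo_filter_keys(node)
  case node
  when Hash
    node.flat_map do |key, value|
      own = key.to_s.start_with?("x-") ? [key.to_s] : []
      own + pseudo_filter_keys(value)
    end
  when Array
    node.flat_map { |item| pseudo_filter_keys(item) }
  else
    []
  end
end

# The sample tree shape is an assumption made for this example.
json = '{"and": [{"x-terms": {"keyword_ids": [1, 2]}}, {"terms": {"source": ["twitter"]}}]}'
keys = pseudo_filter_keys(JSON.parse(json))
```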

Q: Does the view (url, whatever) pass explicit information on what type of ES query we are going to make, or is the query type somehow inferred from all the other params passed in?

A: The whole page state should be stored in the 'q' URL param. Thus, in the near future the search logic tree will contain a "page" key to deal with pagination. There is a subtle point here: how to handle the parts of the state that live in Postgres or Redis. Currently SearchLogic ignores this problem.
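To make the idea of keeping the state in 'q' concrete, here is a minimal round trip using only the Ruby standard library. The sample state hash is made up for the example, and SearchLogic's own url_params may serialize the tree differently.

```ruby
require "json"
require "uri"

# A made-up piece of page state; the real tree may have a different shape.
state = { "terms" => { "source" => ["twitter"] } }

# Serialize the state into a single 'q' URL parameter.
query_string = URI.encode_www_form("q" => JSON.generate(state))

# Later, restore the state from the query string.
params = URI.decode_www_form(query_string).to_h
restored = JSON.parse(params.fetch("q"))
```

Because the entire state travels in one parameter, a link that changes a filter only needs to emit a new 'q' value.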

Q: Let's say you want a range filter in ES. Would you have that in the URL explicitly, or would you just use 'from' and 'to' and let SearchLogic infer that it needs to make a range filter?

A: A search logic tree should be as close to a valid ES query tree as possible. So, if your case allows "hardcoded" range boundaries, you should just use the range filter as is: search.with_range_value "created_at", 'gte', "now-1d". But if your particular case requires re-computing the range boundaries each time, you will need to add an "x-range" filter kind and use it.
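The two cases can be contrasted in a short sketch. The first hash is an ordinary Elasticsearch range filter that relies on ES date math. The "x-range" part is purely hypothetical: that filter kind does not exist yet, and both the 'last_days' option and the resolve_x_range helper are invented here to show what re-computing boundaries on each request could look like.

```ruby
require "time"

# A real ES range filter with a "hardcoded" boundary; ES evaluates "now-1d".
hardcoded = { "range" => { "created_at" => { "gte" => "now-1d" } } }

# Hypothetical pseudo-filter resolver: turns "x-range" into a real range
# filter with boundaries computed at request time.
def resolve_x_range(filter, now: Time.now.utc)
  field, opts = filter.fetch("x-range").first
  from = now - opts.fetch("last_days") * 24 * 60 * 60
  { "range" => { field => { "gte" => from.iso8601, "lte" => now.iso8601 } } }
end

now = Time.utc(2020, 1, 8, 12, 0, 0)
pseudo = { "x-range" => { "created_at" => { "last_days" => 7 } } }
resolved = resolve_x_range(pseudo, now: now)
```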
