Autocomplete With Elasticsearch and Tire

JUN 16TH, 2013

We’ve recently seen a need to introduce an autocomplete feature to Tipter. Tipter allows its users to search for Trips (a.k.a. Travel Blogs) and Tips (the building blocks of Trips). I spent a couple of days on this feature, integrating multiple resources until I got the desired result.

This list of resources includes: this great blog post, which refers to an excellent Stack Overflow answer, and this issue on GitHub, which explains how to use multi_field in Tire. It also includes this blog post, which I only found at a later stage and which definitely could have saved me some time. And of course the Elasticsearch guide, the Tire documentation, and the Railscast.

Let’s start by looking at some code. I’ve listed the relevant code snippets of our Trip model. Each trip has_many countries, and each country has_one sovereign, the model that holds the country’s name. Our target (for this post at least…) is to be able to perform autocomplete on the country name, and offer the user relevant travel blogs that include that country.

class Trip < ActiveRecord::Base
  has_many :countries, dependent: :destroy

  # Comma-separated sovereign names of this trip's countries,
  # skipping countries that have no sovereign.
  def countries_names
    countries.map { |c| c.sovereign && c.sovereign.name }.compact.join(", ")
  end


  include Tire::Model::Search
  include Tire::Model::Callbacks

  settings :analysis => { 
             :filter => {
               :trip_ngram  => {
                 "type"     => "edgeNGram",
                 "max_gram" => 15,
                 "min_gram" => 2 }
             },
             :analyzer => {
               :index_ngram_analyzer => {
                 "type" => "custom",
                 "tokenizer" => "standard",
                 "filter" => [ "standard", "lowercase", "trip_ngram" ] 
               }, 
               :search_ngram_analyzer => {
                 "type" => "custom",
                 "tokenizer" => "standard",
                 "filter" => [ "standard", "lowercase"] 
               }, 
             }     
          } 


  mapping do 
   indexes :countries_names, :type => 'multi_field', :fields => {
      :countries_names => { :type => "string"},
      :"countries_names.autocomplete" => { :search_analyzer => "search_ngram_analyzer", :index_analyzer => "index_ngram_analyzer", :type => "string"}
   }
  end

  def to_indexed_json 
    to_json(methods: [:countries_names])
  end

  def self.regular_search(params) 
    tire.search(load: true) do
       query {string 'countries_names:' + params }
    end
  end


  def self.autocomplete(params)
    tire.search(load: true) do
       query {string 'countries_names.autocomplete:' + params }
    end
  end
end

So, for a start, we index countries_names using the mapping block. We use the Elasticsearch multi_field type to define a “regular” countries_names field and an autocomplete one. This lets us declare two different search methods and use whichever fits our needs. For the countries_names.autocomplete variant, we declare two analyzers: :search_analyzer and :index_analyzer.

A great overview of analyzers’ structure and how they work in Elasticsearch can be found here, and I don’t want to repeat that (and probably introduce errors… 🙂 ). However, one of the modifications I made was to use the Edge NGram token filter, so that n-grams are only generated from the beginning of each word.

In addition, I used different analyzers for search and indexing: the search analyzer doesn’t include the Edge NGram filter.
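
To make the edge n-gram idea concrete, here is a minimal plain-Ruby sketch of what such a filter produces for a single lowercased token. It is only an illustration: the edge_ngrams helper is hypothetical and is not part of Tire or Elasticsearch, and its defaults simply mirror the min_gram and max_gram values from the settings above.

```ruby
# Hypothetical helper, for illustration only: the real work is done by the
# Elasticsearch token filter, not by Ruby code in the application.
# Returns the prefixes of +token+ whose length is between min_gram and
# max_gram (capped at the token's own length).
def edge_ngrams(token, min_gram = 2, max_gram = 15)
  longest = [token.length, max_gram].min
  (min_gram..longest).map { |n| token[0, n] }
end

p edge_ngrams("malaysia")
# => ["ma", "mal", "mala", "malay", "malays", "malaysi", "malaysia"]
```

Only prefixes are generated, so typing the start of a country name can match, while a fragment from the middle of the word (such as "lays") cannot.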

My initial version of the mapping used a single analyzer:

 mapping do 
   indexes :countries_names, :type => 'multi_field', :fields => {
      :countries_names => { :type => "string"},
      :"countries_names.autocomplete" => { :analyzer => "index_ngram_analyzer", :type => "string"}
   }
 end

While playing with several search queries on the console, I ran this query:

1.9.3-p327 :159 > Trip.autocomplete("mar").map { |t| t.countries_names }
 ....
 => ["Malaysia", "Malaysia"] 

It found Malaysia because the index contains the following tokens:

[ma], [mal], [mala], [malay], [malays], [malaysi], [malaysia]

Notice I used a min_gram of two in order to reduce false positives. Because the same analyzer (and therefore the same token filter) is also applied to the search query, “mar” translates to:

[ma], [mar]

So there’s a match on the “ma”. This can be avoided by using a different search analyzer, as in the first code snippet. Running the same query now gets the expected result:

1.9.3-p327 :163 > Trip.autocomplete("mar").map { |t| t.countries_names }
 => [] 

1.9.3-p327 :164 > Trip.autocomplete("ma").map { |t| t.countries_names }
 ...
 => ["Malaysia", "Malaysia"] 

1.9.3-p327 :167 > Trip.regular_search("ma").map { |t| t.countries_names }
=> [] 
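
The console results above can be reproduced with a small plain-Ruby simulation. This is a rough sketch under stated assumptions, not how Elasticsearch works internally: all helper names are hypothetical, the regular expression only approximates the standard tokenizer plus the lowercase filter, and a query is assumed to match when at least one of its tokens appears among the indexed tokens.

```ruby
# Hypothetical stand-ins for the analyzers; real analysis happens inside
# Elasticsearch. Names and matching rule are assumptions for illustration.

# Prefixes of a token between min_gram and max_gram characters long.
def edge_ngrams(token, min_gram = 2, max_gram = 15)
  (min_gram..[token.length, max_gram].min).map { |n| token[0, n] }
end

# Approximates search_ngram_analyzer: split into words and lowercase them.
def plain_tokens(text)
  text.downcase.scan(/\w+/)
end

# Approximates index_ngram_analyzer: plain tokens expanded to edge n-grams.
def ngram_tokens(text)
  plain_tokens(text).flat_map { |t| edge_ngrams(t) }
end

# Assumed matching rule: any query token present among the indexed tokens.
def match?(indexed_tokens, query_tokens)
  (indexed_tokens & query_tokens).any?
end

autocomplete_index = ngram_tokens("Malaysia")
regular_index      = plain_tokens("Malaysia")

# Single analyzer: the query is n-grammed too, so "mar" matches via "ma".
puts match?(autocomplete_index, ngram_tokens("mar"))  # prints true

# Separate search analyzer: the query stays whole.
puts match?(autocomplete_index, plain_tokens("mar"))  # prints false
puts match?(autocomplete_index, plain_tokens("ma"))   # prints true

# Regular field: only the whole word "malaysia" is indexed.
puts match?(regular_index, plain_tokens("ma"))        # prints false
```

The last line shows why regular_search("ma") comes back empty: the plain countries_names field holds only whole words, so a prefix has nothing to match against.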

There you go. I hope this will save some of you the time I spent on wiring everything together.
