The Ultimate Search Tool
DSet is a search tool for the board game Dominion. It allows players to filter sets of cards via chainable, partial queries to allow users to generate a set of cards that fit all their supplied criteria.
Originally conceived as a randomizer to speed up setup of a game session, my recent efforts have been focused on refining the core search functionality. The following are the changes I have made since my last post on this project.
Tools
Rails, Rspec, jQuery, AJAX, Arel, Bootstrap, Atom, Ubuntu
Overview
DSet is designed to use a minimalist, single-input field to accept user queries. It combines the functionality of a fuzzy search, chainable search, multi-category search, and live search into a single system.
Fuzzy Search
Each query is pattern matched against strings in eligible fields.
Example: “Vlg” matches with “Village”
Chainable search
Each space-separated query is treated as a chained filter, with each additional subquery further refining the result set.
Example: “v 3” will return a match of “Village” (“v” found in the name, with “3” as its cost).
Multi-Category Search
When results are presented to the user, they are grouped by the “categories” (ie database fields) on which they were matched. In other words, for any given query, it is compared against any possible matches across all indexable record fields.
Example: A query of “v 3” would return an array of results containing hashes like this:
["Name", "Terminality", "Cost"]=>[["Village", {:card=>#<Card id: 5, name: "Village", image_url: "http://wiki.dominions
trategy.com/images/thumb/5/5a...", cost: 3, created_at: "2018-03-23 02:08:14", updated_at: "2018-03-23 02:08:14">, :columns=>["Name", "Terminality", "Cost"], :terms=>["Village", "Splitter / Village", "3"]}]]
Each result is assigned an array as a key matching the columns on which it was matched (for multi-category display), followed by values for the card record, matched columns array, and term array (ie if the “column” is the key, the “term” is it’s value).
Process
Continuing from my previous work, I made some updates to lift the limitations of my previous solution while cleaning up the code to make it more readable.
Database Update
In earlier versions, DSet had a single “Card” model that contained all data corresponding to a particular card. In my newer version, I split this data into a “Descriptor” model, which not only helped with lookup but allowed for metadata about each descriptor to be included within each descriptor instance. As a result, the seed.rb
file was refactored to create Card instances alongside the appropriate Descriptors like so:
card = Card.create!(name: "Cellar",
image_url: base_url + "1/1c/Cellar.jpg/200px-Cellar.jpg",
cost: 2)
assign_descriptors(%w(Base Action Sifter Non-Terminal), card)
...
def assign_descriptors(name, card)
...
case name.downcase
...
when 'Sifter'.downcase
Descriptor.create!({
name: 'Sifter',
category: 'Archetype',
description: 'Enables players to cycle through junk cards in their deck faster.',
card: card
})
Actualizing Full Search System Functionality
By far the most time-consuming aspect of this project has been tweaking the card search algorithm to behave as I intended. After many different refactor attempts, I eventually landed on the following code for the top-level method call:
def self.search(queries_string, slot)
return [] if queries_string.blank?
query_array = queries_string.to_s.split
formatted_queries = format_multi_char_queries(query_array)
matches = get_matches(formatted_queries)
matches = matches.select{|_,v| v[:terms].length >= query_array.length}
matches.group_by { |_, v| v[:columns] }
end
Starting with the search function itself, I broke down each piece into separate functions for readability. In brief, the search method:
- Splits the user’s query into subqueries delineated by a space.
- Formats each query by inserting the wildcard operator between each character in each subquery string.
- The formatted query array is passed to
get_matches
to perform the search against each subquery. - Matches that do not match against ALL supplied queries are discarded.
- Matches are grouped by columns to format output to be displayed to the user.
Getting #4 took the most time to get right. A frequent issue with my refactors was that my search function would return too few or too many results. In the latter case, it would return results that matched against one supplied query, but not against the rest (when it should only have results against all chained queries).
The Search System Itself
Given the complexity of the code, I will discuss it one piece at a time.
private_class_method def self.get_matches(subqueries)
match_data = {}
subqueries.each do |query|
result = get_card_subset(query, match_data)
match_data = result.nil? ? match_data : result
end
match_data
end
At this level, each subquery is evaluated sequentially and returns a result hash containing matching cards. With each iteration, match_data
is passed back into get_card_subset
to limit subsequent searches to only search within the previous result set.
For the next part, I will use in-line annotation for readability.
private_class_method def self.get_card_subset(query, match_data)
new_match_data = {}
# If match data from a previous iteration is not available, test against all cards in the database
if(match_data.empty?)
card_set = Card.all
else
card_set = Card.where(name: match_data.keys)
end
# If the supplied subquery is numeric, test it against the "Cost" column
if numeric?(query)
matched_cards = card_set.where(cost: query)
# For each match found, concatenate the results into the existing result hash (new_match_data). Explained further below.
matched_cards.each do |card|
merge_match_data(match_data, new_match_data, card, 'Cost', query)
end
else
# If the supplied subquery is a string, test against the "name" column
# of the Card model via fuzzy matching
matched_cards = card_set.where('name ~* :pat', pat: query).distinct
# Concatenate and format results into existing results
matched_cards.each do |card|
merge_match_data(match_data, new_match_data, card, 'Name', card[:name])
end
# This line allows for both Card name matches and Descriptor matches
# to be returned within the same result hash
match_data = (new_match_data.any? ? new_match_data : match_data)
# Retrieves all Descriptor instances associated with a card within the current card_set (ie not filtered out by earlier subqueries)
descriptor_set = Descriptor.where(Descriptor.arel_table[:card_id].in card_set.pluck(:id) ).distinct
# Fuzzy match the subquery against the names from the filtered array of
# Descriptor instances
descriptor_matches = descriptor_set.where('name ~* :pat', pat: query).distinct
# Concatenate and format results into existing results
descriptor_matches.each do |kw|
merge_match_data(match_data, new_match_data, kw.card, kw.category, kw.name)
end
end
return new_match_data
end
The next two functions are a little less heavy in terms of complexity.
private_class_method def self.merge_match_data(match_data, new_match_data, card, column, term)
new_match_data[card.name] = new_match_data[card.name].nil? ? new_match_data[card.name] : {}
new_match_data[card.name][:card] = card
new_match_data[card.name][:columns] = merge_result_hash(match_data, card.name, :columns, column)
new_match_data[card.name][:terms] = merge_result_hash(match_data, card.name, :terms, term)
end
This method creates result hashes, or concatenates new data into existing result hash elements. It generates a hash containing the card (as an ActiveRecord object), columns (ie what descriptor of the model was matched, such as name, cost, etc), and terms (the actual word matched against the supplied subquery).
private_class_method def self.merge_result_hash(hsh, key, sub_key, new_elem)
result = {}
if hsh[key].nil?
result = [new_elem]
else
if hsh[key][sub_key].nil?
result = [new_elem]
else
result = (hsh[key][sub_key].include?(new_elem) ?
hsh[key][sub_key] : hsh[key][sub_key].push(new_elem))
end
end
result
end
Finally, merge_result_hash
concatenates new term or category matches into the result hash, omitting duplicates. The key
refers to the name of the card stored as a key in the result hash, and the subkey
refers either to the column
or term
key of the nested result hash element.
Example: Given this result hash:
# Before
result_hsh = {
'Village':{
columns: ['Name'],
terms: ['Village']
}
}
...
# Methods invoked
merge_result_hash(result_hsh, "Village", :columns, "Cost")
merge_result_hash(result_hsh, "Village", :terms, 3)
...
# After
result_hsh = {
'Village':{
columns: ['Name', 'Cost'],
terms: ['Village', 3]
}
}
Next Steps
I am currently toying with porting this search system for use as a product lookup system for a local hardware store. My goal is to convert this over into a reusable gem to be shared with the open source community.