New index metadata field and plans for the future

In this post we want to introduce a new field, “updated_time”.
“updated_time” is a string holding the time when the index was last modified, formatted according to ISO 8601.

To see the metadata, use “GET /v1/indexes” or “GET /v1/index/name”, which return the metadata for an index.
The metadata comes as a JSON object that includes the following fields:

{
    "started": true,
    "code": "y4ary7ubus",
    "creation_time": "2012-08-16T11:31:44+01:00",
    "size": "507",
    "updated_time": "2013-04-14T16:26:39+01:00",
    "public_search": false,
    "status": "STARTED"
}
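As a minimal sketch of how a client might read these fields, the Python snippet below parses the sample object shown above and turns the two ISO 8601 strings into datetime values. It works on a hard-coded copy of the response rather than calling the API, so no endpoint behaviour is assumed; the variable names are ours.

```python
import json
from datetime import datetime

# Sample response body copied from the example above.
raw = """{
    "started": true,
    "code": "y4ary7ubus",
    "creation_time": "2012-08-16T11:31:44+01:00",
    "size": "507",
    "updated_time": "2013-04-14T16:26:39+01:00",
    "public_search": false,
    "status": "STARTED"
}"""

meta = json.loads(raw)

# ISO 8601 strings with a UTC offset parse directly (Python 3.7+).
created = datetime.fromisoformat(meta["creation_time"])
updated = datetime.fromisoformat(meta["updated_time"])

# Time elapsed between creation and the last modification.
age = updated - created
print(updated.isoformat(), age.days)
```

Note that "size" arrives as a string in the sample, so convert it with int() before doing arithmetic on it.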

This is a tiny addition to the current metadata, but we are also developing a very cool statistics dashboard.
We hope it will help you better understand the effect of search requests and
give you good insight into the usage patterns of your indexes.

Big thanks to Easy store hosting for insights!

Feel free to try IndexDen – we are open to everyone!

New tool “Search UI” released

“Search UI” helps you test queries and see results without writing any code.
The results are a JSON-formatted message that contains standard fields such as:

  • matches – total number of found documents
  • results – array of documents with relevance score
  • facets (optional) – if the index and results contain categories, the facets field shows how many documents fall into each category
  • query – current search query
  • search_time – time spent finding results on the server side
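To make the field list concrete, here is a small Python sketch that reads such a message. The response body is hypothetical: the top-level field names follow the list above, while the values, the "score" key inside each result and the string type of search_time are assumptions made for illustration.

```python
import json

# Hypothetical response body; only the top-level field names come from the post.
raw = """{
    "matches": 2,
    "results": [
        {"docid": "doc1", "score": 1.5},
        {"docid": "doc2", "score": 0.7}
    ],
    "facets": {"articleType": {"camera": 2}},
    "query": "camera",
    "search_time": "0.003"
}"""


def summarize(response):
    """Collect the interesting parts of a search response dictionary."""
    facets = response.get("facets", {})  # facets are optional
    return {
        "matches": response["matches"],
        "docids": [result["docid"] for result in response["results"]],
        "facet_categories": sorted(facets),
        # float() accepts both a numeric string and a number.
        "search_time": float(response["search_time"]),
    }


summary = summarize(json.loads(raw))
print(summary)
```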

Right now its functionality is very limited: you can only test a search query.

In the future we plan to extend the search testing tool with:

  • good error handling that gives you tips about the type of error
  • more search options such as fetching fields, generating snippets or using scoring functions
  • better rendering of the JSON result string

If you like or dislike this tool, please spend 2 minutes and send us feedback.

Special thanks to Yosh Schulman, who gave us good insight for building this tool!

IndexDen E-commerce cheat sheet

In this tutorial I will collect best practices and the most useful information about creating an e-commerce application with IndexDen. I will talk about entities such as Products, Categories and Custom fields, which are the basic data of any e-commerce application.

Getting started

Download an IndexDen API client library:
IndexDen API’s client libraries overview

Documentation and tutorials for the most common libraries:
PHP, Python, Ruby, Java, .NET and Rails

Must read before you start programming:

This is only the first, “Getting started!” part of the tutorial. In the next part I will describe how to organize data such as Products, Categories and Custom fields in indexes to get the most out of IndexDen.

 

 

How does IndexDen support languages?

Language support is very important nowadays. Because we operate globally, we have to make sure that IndexDen supports any language.
What kind of language support do we provide?
1) Search in Unicode
2) Morphology
3) Phonetic or Soundex support
4) “Did you mean” feature

This article explains what we did and what we will do to support different languages.

Unicode

The two most common text encodings in use today are single-byte encodings and UTF-8.
IndexDen search supports UTF-8 encoding, which means it covers all kinds of languages,
including variations of Chinese, Japanese, Korean and Vietnamese.

Morphology

Morphology preprocessors can be applied to the words being indexed to replace different forms of the same word with the base, normalized form. For instance, an English stemmer will normalize both “dogs” and “dog” to “dog”, making search results for both searches the same.
For the time being IndexDen supports only English and Russian stemmers.

However, on custom request we can also provide support for the following languages:

  • French
  • Spanish
  • Portuguese
  • Italian
  • German
  • Dutch
  • Swedish
  • Norwegian
  • Danish
  • Finnish
  • Arabic
  • Czech
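The normalization described above for “dogs” and “dog” can be illustrated with a toy Python sketch. This is a deliberately naive plural stripper written for this post, not the stemmer IndexDen runs; real stemmers apply many more rules per language.

```python
def naive_stem(word):
    """Strip a trailing plural "s"; a toy rule, not a real stemmer."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(text):
    """Split text on whitespace and reduce every token to its base form."""
    return [naive_stem(token) for token in text.split()]


# Both forms end up as the same index term, so both queries
# would find the same documents.
print(normalize("Dogs dog"))
```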

Phonetic or Soundex support

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

Soundex

We get a lot of requests to support Soundex analysis of words. It is very useful when you need to search by first name, last name, street and other genealogical information.
As most of our customers do not need it, it will be supported only on a custom-request basis.
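For readers who have not met the algorithm, here is a short Python sketch of classic American Soundex: keep the first letter, map the remaining consonants to digits, skip repeated codes, and pad to four characters. It is a textbook version written for illustration and says nothing about how IndexDen implements the feature.

```python
# Digit assigned to each consonant group; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


# Names that sound alike share a code despite different spellings.
print(soundex("Robert"), soundex("Rupert"))
```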

Did you mean … ?

Many of our customers have requested “did you mean … ?” functionality from IndexDen.
You probably already know it from Google search. In short, it is based on comparing the words in the current query with words from a dictionary and measuring how much they differ.
We already have a plan for how to implement it, but it is not as easy as it sounds.
This feature will also be enabled on custom request for our customers.
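The idea of comparing query words with dictionary words can be sketched in Python with the Levenshtein edit distance. This is only one common way to measure the difference and is our assumption, not the planned IndexDen implementation; the dictionary contents and the max_distance cut-off are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimal number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def did_you_mean(query, dictionary, max_distance=2):
    """Replace unknown query words with the closest dictionary word."""
    suggestion = []
    for word in query.lower().split():
        if not dictionary or word in dictionary:
            suggestion.append(word)
            continue
        # Ties are broken alphabetically to keep the result deterministic.
        best = min(dictionary, key=lambda cand: (edit_distance(word, cand), cand))
        close_enough = edit_distance(word, best) <= max_distance
        suggestion.append(best if close_enough else word)
    return " ".join(suggestion)


WORDS = {"camera", "laptop", "electronics"}  # hypothetical dictionary
print(did_you_mean("camra laptp", WORDS))
```

A production version would also have to rank candidates by popularity and cope with large dictionaries, which is part of why the feature is harder than it sounds.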

Conclusion:
Search in Unicode – works for any language, including Chinese, Japanese, Korean and Vietnamese.
Morphology – works for English and Russian by default. Other languages can be enabled on request; see the list above.
Phonetic or Soundex support – will be enabled on request.
“Did you mean ..?” feature – will be available on all tariff plans except the Free plan.

If you want to try any of these features in your application, you have to sign up for a paid plan.
See plans and pricing to find the one that fits you best.

Feel free to ask any questions or send a support request.

How do Facets work in IndexDen?

What are Facets in search?

Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters.

http://en.wikipedia.org/wiki/Faceted_search

Adding Facets to documents

Documents that have already been added can be categorized. Categories are a way to partition your index along different dimensions. For every category (the dimension), every document can have multiple values.

Each category is defined by a string, and its values are also defined by strings. So for instance, you can define a category named “articleType” and its values can be “camera”, “laptop”, etc. You can have another category called “priceRange” and its values can be “$0 to $49”, “$50 to $100”, etc.

$categories = array('priceRange' => '$0 to $299',
                   'articleType' => array('camera', 'electronics'));
$index->update_categories($docid, $categories);

Facets in search results

When searching, you will get an attribute in the results called “facets”, and it will contain a dictionary with categories for keys. For each category the value will be another map, with category value as key and occurrences as values. So for instance:

{
    'matches': 8,
    'results': [ {'docid': 'doc1'}, ... ],
    'facets': {
        'articleType': {
            'camera': 5,
            'laptop': 3,
            'electronics': 8
        },
        'priceRange': {
            '$0 to $299': 4,
            '$300 to $599': 4
        }
    }
}

This means that, of the matches, 5 are of the “camera” articleType, 3 are “laptop” and 8 are “electronics”. Also, 4 of them are in the “$0 to $299” priceRange and 4 are in the “$300 to $599” one.
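To show where such counts come from, the following Python sketch tallies category values over a handful of matched documents. It runs locally on made-up documents and only mimics the shape of the “facets” attribute; the server-side computation may differ.

```python
from collections import Counter, defaultdict


def count_facets(docs):
    """Count, per category, how many documents carry each value."""
    facets = defaultdict(Counter)
    for categories in docs.values():
        for category, values in categories.items():
            if isinstance(values, str):  # a single value is allowed too
                values = [values]
            for value in values:
                facets[category][value] += 1
    return {category: dict(counts) for category, counts in facets.items()}


# Hypothetical matched documents with their categories.
matched = {
    "doc1": {"articleType": ["camera", "electronics"], "priceRange": "$0 to $299"},
    "doc2": {"articleType": ["laptop", "electronics"], "priceRange": "$300 to $599"},
    "doc3": {"articleType": ["camera", "electronics"], "priceRange": "$300 to $599"},
}

facets = count_facets(matched)
print(facets)
```

Because a document can carry several values of one category, the counts inside a category (here “electronics”) can add up to more than the number of matches.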

Facets as filters in search query

You can also filter a query by restricting it to a particular set of category values. For instance, the following will only return results that are of the “camera” articleType and are in either the “$0 to $299” or the “$300 to $599” price range.

// Category filters are passed as the seventh argument.
$index->search($query, NULL, NULL, NULL, NULL, NULL,
        array('priceRange' => array('$0 to $299', '$300 to $599'),
              'articleType' => array('camera'))
        );
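The filter semantics described above (any of the listed values within one category, and all categories together) can be written down as a small Python predicate. This is a local illustration of the logic with hypothetical helper names, not client library code.

```python
def matches_filters(doc_categories, category_filters):
    """True if the document satisfies every category filter.

    Within one category any listed value is enough (OR);
    across categories all filters must hold (AND).
    """
    for category, allowed in category_filters.items():
        values = doc_categories.get(category, [])
        if isinstance(values, str):
            values = [values]
        if not set(values) & set(allowed):
            return False
    return True


filters = {
    "priceRange": ["$0 to $299", "$300 to $599"],
    "articleType": ["camera"],
}

cheap_camera = {"articleType": ["camera", "electronics"], "priceRange": "$0 to $299"}
cheap_laptop = {"articleType": ["laptop", "electronics"], "priceRange": "$0 to $299"}

print(matches_filters(cheap_camera, filters), matches_filters(cheap_laptop, filters))
```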

To see how it works, sign up to IndexDen for free.

IndexDen for E-commerce

We recently added several new features to IndexDen (IndexTank API).
The key goal was to integrate e-commerce solutions with IndexDen.

What we have done:

  1. New search options
  2. Improved internal storage for Categories and Variables
  3. Improved client API (right now only for PHP)

New search options

Category Rollup

The first search option is “category_rollup”, a comma-separated list of categories. Use category_rollup to generate facets regardless of the current filters.

For example, you have a Brands category for cell phones with the values Palm, Samsung and Motorola.
When a user chooses one of the brands, e.g. Samsung, the facets list will look like:
Samsung(50)

That is good, but it doesn’t say how many entries there are for the other values, Palm and Motorola.

With category_rollup it is now possible to see them. Use category_rollup=brand in the search request and you will get the following results:
Samsung(50)
Palm(12)
Motorola(30)

So, even when a filter is applied on the Brand, you can see facets for the other Brands too.
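The brand example can be reproduced with a small local simulation in Python. It assumes that facets for a rolled-up category are counted over all query matches, ignoring the active filters, which is our reading of the description above; the function names and the flat one-value-per-category documents are hypothetical.

```python
from collections import Counter


def facet_counts(docs, category):
    """Count the values of one category over a list of documents."""
    return dict(Counter(doc[category] for doc in docs if category in doc))


def search_facets(docs, filters, category, rollup=()):
    """Facets for one category, with or without respect to the filters."""
    if category in rollup:
        selected = docs  # assumption: rolled-up categories ignore the filters
    else:
        selected = [doc for doc in docs
                    if all(doc.get(cat) in allowed
                           for cat, allowed in filters.items())]
    return facet_counts(selected, category)


# Documents matching the query, using the numbers from the example.
phones = ([{"brand": "Samsung"}] * 50
          + [{"brand": "Palm"}] * 12
          + [{"brand": "Motorola"}] * 30)
brand_filter = {"brand": ["Samsung"]}

print(search_facets(phones, brand_filter, "brand"))
print(search_facets(phones, brand_filter, "brand", rollup=("brand",)))
```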

Match any field

The second option, “match_any_field”, allows searching inside all fields in the index.

For example, we have two fields: text and title. With match_any_field=true IndexDen will search inside both fields.
So it is no longer necessary to specify each field in the search query, like: title:some query OR text:some query.
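The difference between the two modes can be sketched locally in Python. The sketch assumes that without the option only a single default field (here called “text”) is searched and that a document matches when one field contains all query words; both points are assumptions for illustration, as are the sample documents.

```python
def field_matches(text, terms):
    """True if every query term occurs as a word in the field text."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in terms)


def search(docs, query, match_any_field=False, default_field="text"):
    """Return the sorted ids of documents matching the query."""
    terms = query.split()
    hits = []
    for docid, fields in docs.items():
        if match_any_field:
            searched = list(fields.values())
        else:
            searched = [fields.get(default_field, "")]
        if any(field_matches(text, terms) for text in searched):
            hits.append(docid)
    return sorted(hits)


docs = {
    "doc1": {"title": "Digital camera", "text": "Compact body with a zoom lens"},
    "doc2": {"title": "Laptop bag", "text": "Fits a camera and a laptop"},
}

print(search(docs, "camera"))
print(search(docs, "camera", match_any_field=True))
```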

 

Improved internal storage for Categories and Variables

We optimized the internal storage and search system to support hundreds of Categories and Variables.
This means you can add hundreds of product “custom fields” and product categories to your product index.

Improved client API

To support the new functions we also improved the existing PHP API: we added support for the new parameters to the search function.
You can check it out: IndexTank PHP API

We are inviting Heroku users to an alpha test

We are testing our full-text search API add-on for Heroku and would like to request some testers. If you are interested in helping us out, please contact me at indexden@indexden.com and I will send out an invite. We have a couple of spots left that we’d like to fill.

Thanks in advance to anyone who can help us out! We’d love to hear feedback.

How to find related or duplicate items with IndexDen

Many applications and websites face the challenge of finding “related items” like:

  • Related articles in a blog
  • Related products in a shop
  • And so on

How do you determine a related item?
When comparing text objects, related items are determined by the percentage of text similarity. For instance, if text A is more than 80% similar to text B, the two text objects are related to each other.

How IndexDen can help?

IndexDen recently added a new feature to the API: the quorum operator. With the quorum operator you can match documents that contain at least a given number of the given words.
For example, a search query like “the world is a wonderful place”/3 will match all documents that have at least 3 of the 6 specified words.
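The matching rule behind the quorum operator can be sketched in a few lines of Python. This is a local illustration with naive whitespace tokenization and a hypothetical function name; it does not call IndexDen or reproduce its tokenizer.

```python
def quorum_match(query, document_text, threshold):
    """True if the document contains at least `threshold` distinct query words."""
    query_words = set(query.lower().split())
    doc_words = set(document_text.lower().split())
    return len(query_words & doc_words) >= threshold


query = "the world is a wonderful place"

# "a", "wonderful" and "world" are present: 3 of 6 words, so it matches.
print(quorum_match(query, "What a wonderful world", 3))
# Only "the" and "place" are present: 2 of 6 words, so it does not.
print(quorum_match(query, "The place to be", 3))
```

Raising the threshold towards the number of query words makes the match stricter, which is how the operator can approximate a similarity percentage between two texts.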

OK, let’s try a real example with IndexDen.

Continue reading