Using a Secondary Index

Ideal for routine queries, secondary indexes use MapReduce to build indexes over large amounts of data.

Provision the IBM Cloudant Service in Bluemix

If you do not already have the IBM Cloudant service provisioned in Bluemix, follow these steps to provision the service.

Replicate the sample database

You'll be working with a sample database in this tutorial. Follow these steps to replicate the sample database.

Write a secondary index

Before we dive into using the API, let's first take a look at how to define a secondary index using MapReduce.

Secondary indexes, or views, are defined by a map function, which pulls data out of your documents, and an optional reduce function, which aggregates the data emitted by the map.

These functions are written in JavaScript and held in "design documents", special documents that the database knows contain these - and other - functions. We'll go into more detail about design documents in another tutorial; for now, we'll just think of them as documents that define our secondary indexes.

A sample design document with MapReduce functions

{
  "_id": "_design/name",
  "views": {
    "view1": {
      "map":"function(doc){emit(doc.field, 1)}",
      "reduce": "function(keys, values, rereduce){return sum(values)}"
    }
  }
}

By convention, a design document's _id is _design/ followed by the document's name. This code defines view1 in the design document called name. Design documents can contain multiple views; each is added to the views object.

A map function is required for a view; a reduce function is optional.
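Because the map and reduce functions are stored as strings inside JSON, it can be convenient to write them as ordinary JavaScript and serialize them afterwards. The sketch below shows one possible way to do that; the helper name buildDesignDoc is hypothetical and is not part of any Cloudant library.

```javascript
// Hypothetical helper: converts real functions into the string form
// that a design document stores.
function buildDesignDoc(name, views) {
  const doc = { _id: "_design/" + name, views: {} };
  for (const [viewName, fns] of Object.entries(views)) {
    const view = { map: fns.map.toString() };
    // The reduce is optional, so only add it when one was supplied.
    if (fns.reduce !== undefined) {
      view.reduce = typeof fns.reduce === "function"
        ? fns.reduce.toString()
        : fns.reduce;
    }
    doc.views[viewName] = view;
  }
  return doc;
}

// emit() and sum() are provided by the database when the view runs;
// here the functions are only serialized, never called.
const designDoc = buildDesignDoc("name", {
  view1: {
    map: function (doc) { emit(doc.field, 1); },
    reduce: function (keys, values, rereduce) { return sum(values); }
  }
});

console.log(JSON.stringify(designDoc, null, 2));
```

Writing the functions as real code rather than hand-typed strings lets an editor or linter catch mistakes such as a misspelled parameter name before the document is saved.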

A sample Cloudant API call

Here's what an API call to this sample view would look like, where [username] is your username and [db_name] is the name of your database:

https://[username].cloudant.com/[db_name]/_design/name/_view/view1
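The URL pattern above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named viewUrl and a made-up account name (example-user); query options are passed as a plain object.

```javascript
// Hypothetical helper that assembles a view URL from its parts.
function viewUrl(username, dbName, designDoc, viewName, params = {}) {
  const base = "https://" + username + ".cloudant.com/" +
    encodeURIComponent(dbName) +
    "/_design/" + encodeURIComponent(designDoc) +
    "/_view/" + encodeURIComponent(viewName);
  // URLSearchParams turns { limit: 2, skip: 3 } into "limit=2&skip=3".
  const query = new URLSearchParams(params).toString();
  return query ? base + "?" + query : base;
}

// "example-user" is a placeholder account name, not a real one.
console.log(viewUrl("example-user", "animaldb", "views101", "diet"));
console.log(viewUrl("example-user", "animaldb", "views101", "latin_name", { limit: 2, skip: 3 }));
```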

Review Map functions

As you probably saw in the primary index tutorial, our small sample database is filled with animals. Our first map function will render the diets of the animals in the database.

This index emits the animal's diet as the key, and 1 as the value.

function(doc) {
  if(doc.diet){
    emit(doc.diet, 1);
  }
}

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/diet
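To see what this map function produces without touching a database, you can run it locally against a few documents. The sketch below supplies its own stand-in for emit() and uses hypothetical sample documents, not the real contents of animaldb.

```javascript
// Local stand-in for the database's emit(); it collects rows in an array.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the text, unchanged.
function map(doc) {
  if (doc.diet) {
    emit(doc.diet, 1);
  }
}

// Hypothetical sample documents.
const docs = [
  { _id: "a", diet: "herbivore" },
  { _id: "b", diet: "carnivore" },
  { _id: "c" },                     // no diet field, so nothing is emitted
  { _id: "d", diet: "omnivore" }
];

docs.forEach(function (doc) { map(doc); });
console.log(rows);
```

The document without a diet field produces no row, which is why the guard in the map function matters: it keeps documents that lack the field out of the index.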

Complex keys

A view's key can be any valid JSON data structure. We'll cover why this is particularly useful in the API section below; for now, it's enough to know that arrays (lists) and objects (dictionaries) can be emitted and that they sort after numbers and strings.

This index emits the class and diet as a complex key, and one as the value.

function(doc){
  if(doc.class && doc.diet){
    emit([doc.class, doc.diet], 1)
  }
}

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/complex_count?reduce=false

Review Reduce functions

Reduces are where things get really interesting.

Let's say we want to sum up all the values the map function emitted; that operation is done in the reduce function. Reduces are called with three parameters: keys, values, and rereduce. keys is a list of keys as emitted by the map (each paired with the ID of the document that emitted it), or null; values is a list of values, one for each element in keys; and rereduce is true or false.

The map emits the animal's diet as the key, and 1 as the value.

function(doc) {
  if(doc.diet){
    emit(doc.diet, 1);
  }
}

A simplistic reduce function. This reduce function should return the number of rows, but it is broken. Can you see how?

function (keys, values, rereduce){
  return values.length;
}

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/diet_jscount

There may be cases where you want only the results of the map function, even though you've added a reduce function to your view. (I.e., you don't want a reduced result.) You don't need to write another view for that. Add reduce=false to the query to turn off the reduce function. (Try it, above.)

ReReduce

One common source of confusion when writing a reduce function is dealing with the rereduce=true case. When the view is built, the database arbitrarily divides the documents into batches to process. It then merges the results of these batches to form the complete view result. It is during this merging that the database calls the reduce function with rereduce=true. This means the database calls the function with output from an intermediate run of the reduce function.

You need to be careful when writing reduce functions that you take the rereduce case into account correctly. The example above didn't take this into account, which is why it is broken. Well done if you spotted that! Let's look at the code in more detail:

function (keys, values, rereduce) {
  return values.length;
}

When rereduce=false the reduce function might be called with:

keys: [[key1, idA], [key1, idB], [key1, idC], [key2, idA], [key2, idD], [key3, idA]]
values: [key1value1, key1value2, key1value3, key2value1, key2value2, key3value1]

The above function would correctly return 6 (the length of the values array).

In the rereduce=true case the function will get called with an array of counts from previous invocations:

keys: null
values: [6, 3, 7]

and will return 3, which is not the correct count; it should be 6 + 3 + 7 = 16.

The function above would be reasonable for the rereduce=false invocation but incorrect when it's true. The reduce function needs to explicitly take into account the times it is called with the result of a previous reduce:

function (keys, values, rereduce) {
  if (rereduce){
    // Get an array of counts, count == sum
    return sum(values);
  } else {
    // Get a list of values, count == length
    return values.length;
  }
}

You'll get the same result in the rereduce=false case but in the rereduce=true case you'll correctly return the sum of the values.
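The difference between the two versions can be checked locally by simulating what the database does: reduce each batch, then rereduce the partial results. This is only a sketch of the batching behaviour described above; the batch sizes (6, 3 and 7), the runReduce helper, and the local sum() stand-in are assumptions made for illustration.

```javascript
// Local stand-in for the sum() helper that is available to view functions.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The broken reduce: ignores the rereduce flag.
function brokenCount(keys, values, rereduce) {
  return values.length;
}

// The corrected reduce: sums partial counts when rereducing.
function fixedCount(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Hypothetical driver: reduce each batch, then rereduce the partial results.
function runReduce(reduceFn, batches) {
  const partials = batches.map(function (batch) {
    const keys = batch.map(function (value, i) { return ["key", "id" + i]; });
    return reduceFn(keys, batch, false);
  });
  return reduceFn(null, partials, true);
}

// Three batches holding 6, 3 and 7 emitted values of 1.
const batches = [6, 3, 7].map(function (n) { return new Array(n).fill(1); });

console.log(runReduce(brokenCount, batches)); // 3
console.log(runReduce(fixedCount, batches));  // 16
```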

Built-in reduces

While you can define your own reduce functions, it's often the case that your reduce is going to be doing a simple count or sum operation. There are a handful of built-in reduce functions: _sum, _count and _stats. If you can use these functions, you should. They're faster than a JavaScript reduce (since they avoid serialisation between Erlang and JavaScript) and are very well tested.

_sum - Produces the sum of all values for a key; values must be numeric

_count - Produces the row count for a given key; values can be any valid JSON

_stats - Produces a JSON structure containing sum, count, min, max and sum of squares; values must be numeric

To use a built-in reduce, just put its name in place of the JavaScript reduce function inside your view.

This map emits the animal's diet as the key, and the animal's Latin name as the value.
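To make the three built-ins concrete, here is a rough JavaScript approximation of what each one computes for a list of values. It is only an illustration of the arithmetic; the real built-ins run inside the database and are not written this way. The function names are hypothetical, and sumsqr is the field that holds the sum of squares.

```javascript
// Approximation of _sum: adds up numeric values.
function sumOf(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Approximation of _count: counts rows, whatever their values are.
function countOf(values) {
  return values.length;
}

// Approximation of _stats: summary statistics over numeric values.
function statsOf(values) {
  return {
    sum: sumOf(values),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((a, b) => a + b * b, 0)
  };
}

console.log(sumOf([2, 4, 6]));          // 12
console.log(countOf(["a", null, {}]));  // 3
console.log(statsOf([2, 4, 6]));
// { sum: 12, count: 3, min: 2, max: 6, sumsqr: 56 }
```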

function(doc) {
  if(doc.diet && doc.latin_name){
    emit(doc.diet, doc.latin_name);
  }
}

This built-in reduce counts the number of rows emitted by the map function. The rows can have any value, unlike _sum which requires the value be a number.

_count

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/diet_count
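Put together, a view that uses the built-in _count might be stored as shown in the sketch below. The view name diet_count and the design document name views101 are taken from the query URL above; the real views101 design document may well contain other views, so treat this as an illustration of the shape only.

```javascript
// A design document whose view uses the built-in _count reduce.
// Note that the reduce is just the name of the built-in, as a string.
const designDoc = {
  _id: "_design/views101",
  views: {
    diet_count: {
      map: "function(doc) { if (doc.diet && doc.latin_name) { emit(doc.diet, doc.latin_name); } }",
      reduce: "_count"
    }
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```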

Review the API Options

Secondary indexes have the same API options as the primary index, so you can limit, skip, slice, include_docs, and query for a specific key.

limit & skip

This map function emits the Latin name as the key, and the length of that name as the value.

function(doc) {
  if(doc.latin_name){
    emit(doc.latin_name, doc.latin_name.length);
  }
}

This API call skips over the first 3 rows and limits the results to the next 2.

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/latin_name?limit=2&skip=3
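The effect of limit and skip can be pictured as slicing the sorted list of rows. The sketch below mimics that locally; the Latin names are hypothetical sample data, the page helper is made up for illustration, and plain JavaScript string sorting is used as a simplification of the database's own key ordering.

```javascript
// Hypothetical Latin names, turned into rows the way the map function would.
const names = [
  "Panthera leo",
  "Ailuropoda melanoleuca",
  "Vulpes vulpes",
  "Loxodonta africana",
  "Equus quagga",
  "Lemur catta"
];

const rows = names
  .map(function (name) { return { key: name, value: name.length }; })
  .sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });

// skip drops rows from the front; limit caps how many of the rest are returned.
function page(sortedRows, limit, skip) {
  return sortedRows.slice(skip, skip + limit);
}

console.log(page(rows, 2, 3));
// [ { key: 'Loxodonta africana', value: 18 }, { key: 'Panthera leo', value: 12 } ]
```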

stale=ok

This code emits the Latin name as the key, and the length of that name as the value.

function(doc) {
  if(doc.latin_name){
    emit(doc.latin_name, doc.latin_name.length);
  }
}

Pass the stale=ok parameter to indicate that you'd rather have low latency responses than a completely up-to-date index. Omitting this parameter from your queries means that there may be times where you or your users will have to wait for the indexing to be complete.

Because we regularly update your views for you, most developers building user-facing applications on Cloudant choose the stale=ok parameter for best, low-latency performance.

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/latin_name?stale=ok

reduce=false

If a reduce function is defined for a view, that function will have been applied to the view result. As already mentioned, you can query a view without the reduce step by passing ?reduce=false in the query.

This map emits the animal's diet as the key and the Latin name as the value.

function(doc) {
  if(doc.diet && doc.latin_name){
    emit(doc.diet, doc.latin_name);
  }
}

This built-in reduce counts the number of rows emitted by the map function but is disabled by querying the view with ?reduce=false.

_count

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/diet_count?reduce=false

group=true

In the reduce examples above, the group parameter was omitted, which generated results over all keys. If you want to return results per key, use group=true. group=true is invalid for a map-only or reduce=false view; you will get an error if you try to group a non-reduced view.

This map emits the animal's diet as the key and the Latin name as the value.

function(doc) {
  if(doc.diet && doc.latin_name){
    emit(doc.diet, doc.latin_name);
  }
}

This built-in reduce counts the number of rows emitted by the map function.

_count

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/diet_count?group=true
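The difference between the ungrouped and grouped results can be sketched locally. The rows below are hypothetical stand-ins for what the map function emits, and the two functions only imitate the shape of a _count result with and without group=true.

```javascript
// Hypothetical rows as the map function might emit them (key = diet).
const rows = [
  { key: "carnivore", value: "Panthera leo" },
  { key: "herbivore", value: "Loxodonta africana" },
  { key: "herbivore", value: "Ailuropoda melanoleuca" },
  { key: "omnivore", value: "Vulpes vulpes" }
];

// Without grouping: a single count over all rows, reported with a null key.
function countAll(allRows) {
  return [{ key: null, value: allRows.length }];
}

// With group=true: one count per distinct key.
function countByKey(allRows) {
  const counts = new Map();
  for (const row of allRows) {
    const k = JSON.stringify(row.key);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return Array.from(counts, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

console.log(countAll(rows));   // [ { key: null, value: 4 } ]
console.log(countByKey(rows));
// [ { key: 'carnivore', value: 1 }, { key: 'herbivore', value: 2 }, { key: 'omnivore', value: 1 } ]
```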

group_level

If you have a complex key, you can query that key at a different group_level. This means the reduced result can be returned at different granularities. This is very powerful for reporting data over time series; the same view can be used to answer queries about yearly activity or per-second activity. If you query with group_level equal to or higher than the length of your key (i.e., the number of values in your complex key), you will get the same response as querying with group=true. The keys in a view do not all need to be the same length.

function(doc){
  if(doc.latin_name){
    emit([doc.class, doc.diet, doc.latin_name], doc.latin_name.length)
  }
}

This built-in reduce counts the number of rows emitted by the map function.

_count

Query: https://[username].cloudant.com/animaldb/_design/views101/_view/complex_latin_name_count?group_level=3

Try changing the group level in the URL above. You should initially see results for all levels of the key (it's queried with group_level=3), but if you change that to group_level=2 or group_level=1, you should see the number of animals that match the key at that group level.

Views provide a powerful way to inspect your data, beyond basic key:value lookups and range queries over _all_docs. Building these secondary indexes incrementally allows for rapid analysis of your data as it streams into the database.

While views are ideal for routine queries, they are not well suited to ad hoc inspection of the data. For this, Cloudant has developed a search tool allowing for complex, ad hoc queries over your dataset.
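Grouping at a given level amounts to truncating each complex key to its first N elements and counting the rows that share the truncated key. The sketch below imitates that with hypothetical [class, diet, latin_name] rows; countAtLevel is a made-up helper, not part of the Cloudant API.

```javascript
// Hypothetical rows with [class, diet, latin_name] keys, in sorted order.
const rows = [
  ["bird", "omnivore", "Corvus corax"],
  ["mammal", "carnivore", "Panthera leo"],
  ["mammal", "herbivore", "Ailuropoda melanoleuca"],
  ["mammal", "herbivore", "Loxodonta africana"]
].map(function (key) {
  return { key: key, value: key[2].length };
});

// Count rows per key prefix of the given length, like _count with group_level.
function countAtLevel(allRows, level) {
  const counts = new Map();
  for (const row of allRows) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return Array.from(counts, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

console.log(countAtLevel(rows, 1)); // bird: 1, mammal: 3
console.log(countAtLevel(rows, 2)); // bird/omnivore: 1, mammal/carnivore: 1, mammal/herbivore: 2
console.log(countAtLevel(rows, 3)); // one row per full key, each with a count of 1
```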

Find more videos and tutorials in the Learning Center.