TitanDB Cassandra ElasticSearch Indexing

In this tutorial we show how to set up a Maven Java project using TitanDB. TitanDB is a distributed graph database that can run on various storage backends such as Apache Cassandra, Apache HBase, Oracle BerkeleyDB or DynamoDB on AWS.

We use Apache Cassandra as the storage backend and ElasticSearch to support full-text search. We work with the Graph of the Gods to query and write data using TinkerPop 3 Gremlin.

This tutorial is based on and inspired by the following repositories:

https://github.com/thinkaurelius/titan/

https://github.com/pluradj/titan-tp3-java-example

You can download the example code for this tutorial from GitHub.

1. Requirements

The provided code example is a Maven project. All the required dependencies are listed in the pom.xml file. Apache Cassandra must be downloaded and started separately. TitanDB, Cassandra and ElasticSearch are started locally (no cluster, sharding etc.). We use:

  • Titan 1.0.0
  • Cassandra 2.1.16
  • ElasticSearch 1.5.1
  • Apache TinkerPop 3

2. Installation

Download the repository and build it as a Maven project, either from the command line or in an IDE like Eclipse. Then download Apache Cassandra and unzip it to a folder of your choice. We refer to this folder as CASSANDRA_HOME.

3. Configure Cassandra

We have to activate the Cassandra Thrift interface. Thrift gives clients written in different programming languages portable access to the database over a binary protocol.

Open CASSANDRA_HOME/conf/cassandra.yaml. Search for start_rpc and set the value to true.

That's all the configuration we need. You can now start Cassandra from the bin folder, using cassandra.bat on Windows or the cassandra shell script on Linux/macOS:

CASSANDRA_HOME/bin/cassandra.bat
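
Once Cassandra is up, it can be handy to confirm that the Thrift interface is actually listening before you start Titan. The following is a minimal sketch and not part of the example repository. It assumes Cassandra runs on 127.0.0.1 and uses the default rpc_port of 9160. The class name ThriftPortCheck and the timeout value are our own choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ThriftPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout or unreachable host: treat as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 9160 is the rpc_port configured in cassandra.yaml (the default).
        boolean up = isListening("127.0.0.1", 9160, 2000);
        System.out.println(up
                ? "Thrift port 9160 is reachable"
                : "Thrift port 9160 is not reachable");
    }
}
```

If the check reports that the port is not reachable, verify that start_rpc is set to true and that Cassandra has finished starting. Note that this only tests the TCP port, not whether Thrift requests succeed.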

We also tested Cassandra 2.2.8 and 3.9. Version 2.2.8 can be made to work, but it needs further adaptations that will be covered in a separate tutorial. So far we have not been able to run our configuration on Cassandra 3.x.

4. Configure TitanDB

The Git repository contains several configuration files in the src/main/resources folder. The properties file titan-cassandra-es.properties is all you need to configure Titan with Cassandra and ElasticSearch.

If Cassandra runs on a remote machine, change storage.hostname to the IP address of that machine. If Cassandra listens on a non-default port, you also have to add a port parameter.

If you are not using the example code and run ElasticSearch remotely, you can change its hostname setting in the same way.
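
To make the settings discussed in this section more concrete, here is a minimal sketch that builds them as key/value pairs with java.util.Properties. The keys are standard Titan 1.0 options, but the actual contents of titan-cassandra-es.properties in the repository may differ. The class name TitanConfigSketch, the cassandrathrift backend choice and the index name "search" are assumptions made for illustration.

```java
import java.util.Properties;

final class TitanConfigSketch {

    // Builds a small set of Titan settings for a Cassandra + ElasticSearch setup.
    static Properties build(String cassandraHost, String elasticHost) {
        Properties p = new Properties();
        // Assumption: connect to Cassandra through the Thrift interface enabled earlier.
        p.setProperty("storage.backend", "cassandrathrift");
        p.setProperty("storage.hostname", cassandraHost);
        // Assumption: the index backend is registered under the name "search".
        p.setProperty("index.search.backend", "elasticsearch");
        p.setProperty("index.search.hostname", elasticHost);
        return p;
    }

    public static void main(String[] args) {
        // Local setup: both services run on this machine.
        Properties p = build("127.0.0.1", "127.0.0.1");
        p.stringPropertyNames().stream()
                .sorted()
                .forEach(k -> System.out.println(k + "=" + p.getProperty(k)));
    }
}
```

For a remote setup you would pass the respective IP addresses instead of 127.0.0.1. The printed lines have the same key=value form as a properties file.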

5. Start TitanDB

To start TitanDB and ElasticSearch, run the TitanStart.java class.

The class performs the following steps:

  1. Load the Titan configuration and start ElasticSearch
  2. Open a Titan graph and load the Graph of the Gods
  3. Create an index on the name property and create a multi index on the name and age properties
  4. Wait until the indexing is finished
  5. Create and insert a new vertex
  6. Reindex the graph (for demo purposes only, since the index is updated automatically). You should remove this statement in your own code.
  7. Use a graph traversal source and Gremlin pipeline to query data from the graph
    • Query the “theOneAndOnly” vertex we inserted in step 5
    • Query “hercules” from the Graph of the Gods and retrieve his name, age and outgoing edges
  8. Finally close the graph and the ElasticSearch node

The console output should look something like this (omitted previous logs)

The startup sequence for TitanDB, and especially for ElasticSearch, can take a while. This example runs locally and sets up TitanDB and ElasticSearch on every run. In general, TitanDB should be used as a distributed graph cluster and not in a local environment. This is just a training example to get used to working with TitanDB and Gremlin.

If you run into errors, exceptions or other problems, feel free to comment and ask.
