TitanDB Cassandra ElasticSearch Indexing

In this tutorial we show how to set up a Maven Java project using TitanDB. TitanDB is a distributed graph database that can run on various storage backends such as Apache Cassandra, Apache HBase, Oracle BerkeleyDB or DynamoDB within AWS.

We use Apache Cassandra as the storage backend and ElasticSearch as the index backend for full-text search. We work with the Graph of the Gods example graph to query and write data using TinkerPop 3 Gremlin.

This tutorial is based on and inspired by the following repositories:

https://github.com/thinkaurelius/titan/

https://github.com/pluradj/titan-tp3-java-example

You can download the example code for this tutorial on GitHub.

1. Requirements

The provided code example is a Maven project. All the required dependencies are listed in the pom.xml file. Apache Cassandra must be downloaded and started separately. TitanDB, Cassandra and ElasticSearch are started locally (no cluster, sharding etc.). We use:

  • Titan 1.0.0
  • Cassandra 2.1.16
  • ElasticSearch 1.5.1
  • Apache TinkerPop 3

2. Installation

You can download the repository here and import it as a Maven project, either from the command line or in an IDE like Eclipse. Get Apache Cassandra here and unzip it to a folder of your choice. We refer to that folder as CASSANDRA_HOME.

3. Configure Cassandra

We have to activate the Cassandra Thrift interface. The purpose of using Thrift in Cassandra is to allow portable access to the database using different programming languages with a binary format.

Open CASSANDRA_HOME/conf/cassandra.yaml. Search for start_rpc and set the value to true.

[...]
# Whether to start the thrift rpc server.
start_rpc: true

# The address or interface to bind the Thrift RPC service and native transport
# server to.
[...]

That's all the configuration we need. You can now start Cassandra using the start script in the bin folder (cassandra.bat on Windows, cassandra on Linux/macOS):

CASSANDRA_HOME/bin/cassandra.bat

We tested Cassandra 2.2.8 and 3.9 as well. You can make it work with 2.2.8, but it needs further adaptations which will be covered in a different tutorial. So far we have not been able to run our configuration on Cassandra 3.x.
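
Once Cassandra is running, a quick way to confirm that the Thrift interface is really listening is to open a plain TCP connection to it. The following is a minimal sketch, not part of the example repository: the class name ThriftPortCheck is our own, and it assumes Cassandra's default Thrift port 9160 (rpc_port in cassandra.yaml). It only checks that the port accepts connections, not that the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the example repository.
class ThriftPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // connection refused, timeout or unknown host
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 9160 is the default rpc_port; adjust if you changed it in cassandra.yaml
        boolean open = isPortOpen("localhost", 9160, 2000);
        System.out.println(open
                ? "Thrift port is reachable"
                : "Thrift port is not reachable, check start_rpc in cassandra.yaml");
    }
}
```

If the port is not reachable although Cassandra is running, start_rpc is most likely still set to false.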

4. Configure TitanDB

In the git repository you will find some configuration files in the src/main/resources folder. The property file titan-cassandra-es.properties is sufficient to configure Titan with Cassandra and ElasticSearch:

storage.backend=cassandrathrift
storage.hostname=localhost
cache.db-cache=false
#cache.db-cache-clean-wait = 20
#cache.db-cache-time = 1000
#cache.db-cache-size = 0.25

index.search.backend=elasticsearch
index.search.hostname=localhost
index.search.elasticsearch.client-only=true

If you have a remote instance of Cassandra, change storage.hostname to the IP address where Cassandra is running (you also have to add a port parameter if Cassandra runs with a non-default port configuration).

If you do not use the example code and run ElasticSearch remotely, you can do the same for ElasticSearch in its elasticsearch.yml:

[...]
# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1
network.host: 127.0.0.1

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300
[...]
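
On the Titan side, the remote hosts end up in the properties file shown above. As a rough sketch of what changes, the following snippet overrides the relevant keys on a copy of the local settings using only java.util.Properties. The class name RemoteHostsConfig and the IP addresses are made up for illustration; storage.port is the Titan option we assume for the port parameter mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of the example repository.
class RemoteHostsConfig {

    /** Returns a copy of base with the Cassandra and ElasticSearch hosts replaced. */
    static Properties withRemoteHosts(Properties base, String cassandraHost,
                                      int cassandraPort, String esHost) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("storage.hostname", cassandraHost);
        // Assumption: storage.port is the key for a non-default Cassandra port
        copy.setProperty("storage.port", Integer.toString(cassandraPort));
        copy.setProperty("index.search.hostname", esHost);
        return copy;
    }

    public static void main(String[] args) throws IOException {
        // Same keys as in titan-cassandra-es.properties, inlined to keep the sketch self-contained
        Properties local = new Properties();
        local.load(new StringReader(
                "storage.backend=cassandrathrift\n"
              + "storage.hostname=localhost\n"
              + "index.search.backend=elasticsearch\n"
              + "index.search.hostname=localhost\n"));

        // Placeholder addresses, replace with your own hosts
        Properties remote = withRemoteHosts(local, "10.0.0.5", 9160, "10.0.0.6");
        remote.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In the example project it is simpler to edit the properties file directly; the sketch only makes explicit which keys are affected.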

5. Start TitanDB

In order to start up TitanDB and ElasticSearch you can run the TitanStart.java class:

package com.tutorialacademy.ds.titan;

import java.util.Iterator;

import org.apache.commons.configuration.Configuration;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Edge;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

import com.thinkaurelius.titan.core.PropertyKey;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.schema.SchemaAction;
import com.thinkaurelius.titan.core.schema.TitanManagement;
import com.thinkaurelius.titan.core.util.TitanCleanup;
import com.thinkaurelius.titan.example.GraphOfTheGodsFactory;
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem;
import com.tinkerpop.gremlin.java.GremlinPipeline;

public class TitanStart {

	private static String getRelativeResourcePath( String resource ) {
		return TitanStart.class.getClassLoader().getResource(resource).getPath();
	}
	
	public static void main(String[] args) throws Exception {

		Configuration conf = new PropertiesConfiguration( getRelativeResourcePath( "titan-cassandra-es.properties" ) );
		// start elastic search on startup
		Node node = new NodeBuilder().node();
		
		TitanGraph graph = TitanFactory.open(conf);
		/* Comment if you do not want to reload the graph every time */
		graph.close();
		TitanCleanup.clear(graph);
		graph = TitanFactory.open(conf);
		GraphOfTheGodsFactory.load(graph);
		/* graph loaded  */
		
		// create own indexes
		TitanManagement mgmt = graph.openManagement();
		PropertyKey name = mgmt.getPropertyKey("name");
		PropertyKey age = mgmt.getPropertyKey("age");
		mgmt.buildIndex( "byNameComposite", Vertex.class ).addKey(name).buildCompositeIndex();
		// index consisting of multiple properties
		mgmt.buildIndex( "byNameAndAgeComposite", Vertex.class ).addKey(name).addKey(age).buildCompositeIndex();
		mgmt.commit();
		
		// wait for the index to become available
		ManagementSystem.awaitGraphIndexStatus(graph, "byNameComposite").call();
		ManagementSystem.awaitGraphIndexStatus(graph, "byNameAndAgeComposite").call();
		
		// create new vertex
		Vertex me = graph.addVertex("theOneAndOnly");
		me.property( "name", "me" );
		me.property( "age", 1 );
		graph.tx().commit();
		System.out.println("Created the one and only!");
		
		// re index the existing data (not required, just for demo purposes)
		mgmt = graph.openManagement();
		mgmt.updateIndex( mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX ).get();
		mgmt.updateIndex( mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX ).get();
		mgmt.commit();
		
		GraphTraversalSource g = graph.traversal();
		GremlinPipeline<GraphTraversal<?, ?>, ?> pipe = new GremlinPipeline();
		
		// read our new vertex
		pipe.start( g.V().has( "name", "me" ) );	
		Vertex v = (Vertex)pipe.next();
		System.out.println();
		System.out.println( "Label: " + v.label() );
		System.out.println( "Name: " + v.property("name").value() );
		System.out.println( "Age: " + v.property("age").value() );
		System.out.println();
		
		// read different vertex
		pipe.start( g.V().has( "name", "hercules" ) );	
		Vertex hercules = (Vertex)pipe.next();
		System.out.println( "Label: " + hercules.label() );
		System.out.println( "Name: " + hercules.property("name").value() );
		System.out.println( "Age: " + hercules.property("age").value() );
		
		// print some edges
		Iterator<Edge> it = hercules.edges( Direction.OUT );
		while( it.hasNext() ) {
			Edge e = it.next();
			System.out.println( "Out: " + e.label()  + " --> " + e.inVertex().property("name").value() );
		}
		System.out.println();
		
		// close graph
		graph.close();
		// close elastic search on shutdown
		node.close();
		
		System.exit(0);
	}
}

Some explanation:

  1. Load the Titan configuration and start ElasticSearch
  2. Open a Titan graph and load the Graph of the Gods
  3. Create an index on the name property and create a multi index on the name and age property
  4. Wait until the indexing is finished
  5. Create and insert a new vertex
  6. Reindex the graph (for demo purposes only; the index is updated automatically, so you can remove this step in your own code)
  7. Use a graph traversal source and Gremlin pipeline to query data from the graph
    • Query the “theOneAndOnly” vertex we inserted in step 5
    • Query “hercules” from the Graph of the Gods and retrieve his name, age and outgoing edges
  8. Finally close the graph and the ElasticSearch node
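
One detail of step 1: getRelativeResourcePath uses URL.getPath(), which throws a NullPointerException if the file is missing and leaves characters such as spaces percent-encoded (%20). The following sketch shows a slightly more defensive variant. The class name ResourcePaths is our own, and it assumes the resource is a plain file on disk (for example in target/classes), not an entry inside a jar.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the example repository.
class ResourcePaths {

    /** Resolves a classpath resource to a file system path, failing with a clear message. */
    static Path resolve(String resource) {
        URL url = ResourcePaths.class.getClassLoader().getResource(resource);
        if (url == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + resource);
        }
        try {
            // Paths.get(URI) decodes percent-escapes such as %20, which URL.getPath() leaves encoded
            return Paths.get(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid resource URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(resolve("titan-cassandra-es.properties"));
        } catch (IllegalArgumentException e) {
            // reached when the properties file is not on the classpath
            System.out.println(e.getMessage());
        }
    }
}
```

In TitanStart you could then pass resolve( "titan-cassandra-es.properties" ).toString() to the PropertiesConfiguration constructor.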

The console output should look something like this (previous log lines omitted):

14:31:04,577  INFO metadata:566 - [Sprite] [titan] create_mapping [vertices]
14:31:04,622  INFO metadata:566 - [Sprite] [titan] create_mapping [edges]
14:31:04,643  INFO metadata:558 - [Sprite] [titan] update_mapping [edges]
14:31:10,425  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:10,926  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:11,428  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:11,929  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:12,431  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:12,934  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:13,435  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:13,570  INFO ManagementLogger:138 - Received all acknowledgements for eviction [1]
14:31:13,952  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:14,455  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:14,957  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:15,458  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:15,959  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:16,462  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:16,962  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:17,464  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:17,966  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:18,467  INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:18,579  INFO ManagementSystem$UpdateStatusTrigger:834 - Set status REGISTERED on schema element byNameComposite with property keys []
14:31:18,696  INFO ManagementSystem$UpdateStatusTrigger:834 - Set status REGISTERED on schema element byNameAndAgeComposite with property keys []
14:31:18,814  INFO ManagementLogger:138 - Received all acknowledgements for eviction [2]
14:31:18,982  INFO GraphIndexStatusWatcher:69 - All 1 key(s) on index byNameComposite have status REGISTERED
14:31:19,002  INFO GraphIndexStatusWatcher:69 - All 2 key(s) on index byNameAndAgeComposite have status REGISTERED
Created the one and only!
14:31:19,315  INFO IndexRepairJob:101 - Found index byNameComposite
14:31:19,328  INFO IndexRepairJob:101 - Found index byNameComposite
14:31:19,662  INFO ManagementSystem:994 - Index update job successful for [byNameComposite]
14:31:19,665  INFO IndexRepairJob:101 - Found index byNameAndAgeComposite
14:31:19,671  INFO IndexRepairJob:101 - Found index byNameAndAgeComposite
14:31:19,935  INFO ManagementSystem:994 - Index update job successful for [byNameAndAgeComposite]

Label: theOneAndOnly
Name: me
Age: 1

Label: demigod
Name: hercules
Age: 30
Out: father --> jupiter
Out: mother --> alcmene
Out: battled --> hydra
Out: battled --> nemean
Out: battled --> cerberus

14:31:20,061  INFO CassandraThriftStoreManager:612 - Closed Thrift connection pooler.
14:31:20,071  INFO node:278 - [Sprite] stopping ...
14:31:20,413  INFO node:311 - [Sprite] stopped
14:31:20,413  INFO node:328 - [Sprite] closing ...
14:31:20,426  INFO node:412 - [Sprite] closed

The startup sequence for TitanDB, and especially for ElasticSearch, can take a while. This example runs locally and sets up TitanDB and ElasticSearch on every run. In general, TitanDB should be used as a distributed graph cluster and not in a local environment. This is just a training example to get used to working with TitanDB and Gremlin.

If you have errors, exceptions or other problems feel free to comment and ask.
