In this tutorial we show how to set up a Maven Java project using TitanDB. TitanDB is a distributed graph database that supports various storage backends such as Apache Cassandra, Apache HBase, Oracle Berkeley DB or Amazon DynamoDB on AWS.
We use Apache Cassandra as the storage backend and ElasticSearch for full-text search. We work with the Graph of the Gods to query and write data using TinkerPop 3 Gremlin.
This tutorial is based on and inspired by the following repositories:
https://github.com/thinkaurelius/titan/
https://github.com/pluradj/titan-tp3-java-example
You can download the example code for this tutorial on GitHub.
1. Requirements
The provided code example is a Maven project; all required dependencies are listed in the pom.xml file. Apache Cassandra must be downloaded and started separately, while ElasticSearch is started as an embedded node from the Java code. TitanDB, Cassandra and ElasticSearch all run locally (no cluster, no sharding, etc.). We use:
- Titan 1.0.0
- Cassandra 2.1.16
- ElasticSearch 1.5.1
- Apache TinkerPop 3
2. Installation
Download the repository and build it as a Maven project from the command line, or import it into an IDE such as Eclipse. Then download Apache Cassandra and unzip it to a folder of your choice. We refer to this installation path as CASSANDRA_HOME.
3. Configure Cassandra
First, activate Cassandra's Thrift interface, which Titan's cassandrathrift storage backend uses to talk to Cassandra. Thrift is a binary RPC protocol that provides portable access to the database from different programming languages.
Open CASSANDRA_HOME/conf/cassandra.yaml. Search for start_rpc and set the value to true.
```yaml
[...]
# Whether to start the thrift rpc server.
start_rpc: true

# The address or interface to bind the Thrift RPC service and native transport
# server to.
[...]
```
That's all the configuration we need. You can now start Cassandra with the start script in the bin folder (cassandra.bat on Windows, the cassandra shell script on Linux/macOS):
CASSANDRA_HOME/bin/cassandra.bat
We also tested Cassandra 2.2.8 and 3.9. Cassandra 2.2.8 works, but it needs additional adaptations that will be covered in a separate tutorial. So far we have not been able to run our configuration on Cassandra 3.x.
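Before continuing, you may want to check that Cassandra's Thrift interface actually accepts connections. The following is a minimal JDK-only sketch, assuming Cassandra runs on localhost and listens on the default Thrift rpc_port 9160. The class name ThriftPortCheck is hypothetical and not part of the example repository, and the check only tests TCP reachability, not whether Thrift requests succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Reachability check for Cassandra's Thrift port (hypothetical helper).
// Assumption: default Thrift rpc_port 9160 on localhost unless arguments are given.
class ThriftPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out or host unknown.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        if (isPortOpen(host, port, 2000)) {
            System.out.println("Thrift port " + port + " on " + host + " is reachable.");
        } else {
            System.out.println("Cannot reach " + host + ":" + port
                    + ". Is Cassandra running with start_rpc: true?");
        }
    }
}
```

Pass a host and port as arguments to check a remote instance instead of the local one.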
4. Configure TitanDB
In the Git repository you will find some configuration files in the src/main/resources folder. The properties file titan-cassandra-es.properties is all you need to configure Titan with Cassandra and ElasticSearch:
```properties
storage.backend=cassandrathrift
storage.hostname=localhost

cache.db-cache=false
#cache.db-cache-clean-wait = 20
#cache.db-cache-time = 1000
#cache.db-cache-size = 0.25

index.search.backend=elasticsearch
index.search.hostname=localhost
index.search.elasticsearch.client-only=true
```
If Cassandra runs on a remote host, set storage.hostname to the IP address of that host. If Cassandra does not listen on its default port, also add a storage.port entry.
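If you prefer to leave titan-cassandra-es.properties unchanged and override the host in code, a sketch along these lines would work. It loads the file from the classpath with java.util.Properties, checks that the keys used in this tutorial are present, and points storage.hostname (and optionally storage.port) at a remote Cassandra. The class and method names, the host 192.168.0.10 and the port 9161 are placeholders, not values from the example repository.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: load the Titan properties and redirect storage to a remote Cassandra.
class TitanConfigOverride {

    // Keys this tutorial's setup relies on (assumption based on the file shown above).
    static final String[] REQUIRED_KEYS = {
        "storage.backend", "storage.hostname", "index.search.backend", "index.search.hostname"
    };

    // Parses properties text ('#' lines are comments) and rejects files missing required keys.
    public static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        for (String key : REQUIRED_KEYS) {
            if (props.getProperty(key) == null) {
                throw new IllegalStateException("Missing required key: " + key);
            }
        }
        return props;
    }

    // Sets the remote host; the port is only written when Cassandra uses a non-default one.
    public static void useRemoteCassandra(Properties props, String host, Integer port) {
        props.setProperty("storage.hostname", host);
        if (port != null) {
            props.setProperty("storage.port", String.valueOf(port));
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = TitanConfigOverride.class.getClassLoader()
                .getResourceAsStream("titan-cassandra-es.properties");
        if (in == null) {
            System.err.println("titan-cassandra-es.properties not found on the classpath");
            return;
        }
        try (Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1)) {
            Properties props = load(reader);
            // Placeholder remote host and non-default Thrift port.
            useRemoteCassandra(props, "192.168.0.10", 9161);
            props.list(System.out);
        }
    }
}
```

In TitanStart, the equivalent override could be applied directly to the Commons Configuration object with conf.setProperty("storage.hostname", ...) before TitanFactory.open(conf) is called.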
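If you do not use the embedded ElasticSearch node started by the example code and run ElasticSearch on a remote host instead, change index.search.hostname accordingly and set the network address in ElasticSearch's elasticsearch.yml: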
```yaml
[...]
# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1
network.host: 127.0.0.1

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300
[...]
```
5. Start TitanDB
To start TitanDB and ElasticSearch, run the TitanStart class:
```java
package com.tutorialacademy.ds.titan;

import java.util.Iterator;

import org.apache.commons.configuration.Configuration;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Edge;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

import com.thinkaurelius.titan.core.PropertyKey;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.schema.SchemaAction;
import com.thinkaurelius.titan.core.schema.TitanManagement;
import com.thinkaurelius.titan.core.util.TitanCleanup;
import com.thinkaurelius.titan.example.GraphOfTheGodsFactory;
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem;
import com.tinkerpop.gremlin.java.GremlinPipeline;

public class TitanStart {

    private static String getRelativeResourcePath( String resource ) {
        return TitanStart.class.getClassLoader().getResource(resource).getPath();
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new PropertiesConfiguration( getRelativeResourcePath( "titan-cassandra-es.properties" ) );

        // start an embedded ElasticSearch node on startup
        Node node = new NodeBuilder().node();

        TitanGraph graph = TitanFactory.open(conf);

        /* Comment out these lines if you do not want to reload the graph every time */
        graph.close();
        TitanCleanup.clear(graph);
        graph = TitanFactory.open(conf);
        GraphOfTheGodsFactory.load(graph);
        /* graph loaded */

        // create own indexes
        TitanManagement mgmt = graph.openManagement();
        PropertyKey name = mgmt.getPropertyKey("name");
        PropertyKey age = mgmt.getPropertyKey("age");
        mgmt.buildIndex( "byNameComposite", Vertex.class ).addKey(name).buildCompositeIndex();
        // composite index consisting of multiple properties
        mgmt.buildIndex( "byNameAndAgeComposite", Vertex.class ).addKey(name).addKey(age).buildCompositeIndex();
        mgmt.commit();

        // wait until both indexes have reached the REGISTERED state
        ManagementSystem.awaitGraphIndexStatus(graph, "byNameComposite").call();
        ManagementSystem.awaitGraphIndexStatus(graph, "byNameAndAgeComposite").call();

        // create new vertex
        Vertex me = graph.addVertex("theOneAndOnly");
        me.property( "name", "me" );
        me.property( "age", 1 );
        graph.tx().commit();
        System.out.println("Created the one and only!");

        // reindex the existing data: "name" and "age" already existed when the indexes
        // were built, so the indexes stay REGISTERED until a REINDEX enables them
        mgmt = graph.openManagement();
        mgmt.updateIndex( mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX ).get();
        mgmt.updateIndex( mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX ).get();
        mgmt.commit();

        GraphTraversalSource g = graph.traversal();
        GremlinPipeline<GraphTraversal<?, ?>, ?> pipe = new GremlinPipeline();

        // read our new vertex
        pipe.start( g.V().has( "name", "me" ) );
        Vertex v = (Vertex)pipe.next();
        System.out.println();
        System.out.println( "Label: " + v.label() );
        System.out.println( "Name: " + v.property("name").value() );
        System.out.println( "Age: " + v.property("age").value() );
        System.out.println();

        // read a different vertex
        pipe.start( g.V().has( "name", "hercules" ) );
        Vertex hercules = (Vertex)pipe.next();
        System.out.println( "Label: " + hercules.label() );
        System.out.println( "Name: " + hercules.property("name").value() );
        System.out.println( "Age: " + hercules.property("age").value() );

        // print his outgoing edges
        Iterator<Edge> it = hercules.edges( Direction.OUT );
        while( it.hasNext() ) {
            Edge e = it.next();
            System.out.println( "Out: " + e.label() + " --> " + e.inVertex().property("name").value() );
        }
        System.out.println();

        // close graph
        graph.close();
        // close elastic search on shutdown
        node.close();
        System.exit(0);
    }
}
```
Some explanation:
- Load the titan configuration and start ElasticSearch
- Open a titan graph and load the Graph of the Gods
- Create a composite index on the name property and a composite index spanning the name and age properties
- Wait until both indexes have reached the REGISTERED state
- Create and insert a new vertex
- Reindex the existing data. Because the name and age keys already existed when the indexes were built, the indexes remain REGISTERED until this REINDEX step enables them; after that, new data is indexed automatically.
- Use a graph traversal source and Gremlin pipeline to query data from the graph
- Query the “theOneAndOnly” vertex we inserted earlier
- Query “hercules” from the Graph of the Gods and retrieve his name, age and outgoing edges
- Finally close the graph and the ElasticSearch node
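To illustrate what the two composite indexes do, here is a deliberately simplified in-memory model; it is not how Titan stores indexes in Cassandra. A composite index maps the exact combination of values of its keys to the matching vertices, so it only answers equality lookups that supply a value for every key. The class name CompositeIndexModel, the vertex ids and the rule that vertices lacking one of the keys are skipped are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only (not Titan's implementation): the tuple of indexed
// property values is the lookup key, the value is the list of matching vertex ids.
class CompositeIndexModel {

    private final List<String> keys;
    private final Map<List<Object>, List<Long>> entries = new HashMap<>();

    public CompositeIndexModel(String... keys) {
        this.keys = Arrays.asList(keys);
    }

    // Adds a vertex; in this model a vertex missing any indexed key is not indexed.
    public void add(long vertexId, Map<String, Object> properties) {
        List<Object> tuple = new ArrayList<>();
        for (String key : keys) {
            Object value = properties.get(key);
            if (value == null) {
                return;
            }
            tuple.add(value);
        }
        entries.computeIfAbsent(tuple, t -> new ArrayList<>()).add(vertexId);
    }

    // Equality lookup: one value per indexed key, in the order the keys were declared.
    public List<Long> lookup(Object... values) {
        if (values.length != keys.size()) {
            throw new IllegalArgumentException("A composite lookup needs a value for every key " + keys);
        }
        return entries.getOrDefault(Arrays.asList(values), Collections.emptyList());
    }

    public static void main(String[] args) {
        CompositeIndexModel byNameAndAge = new CompositeIndexModel("name", "age");

        Map<String, Object> hercules = new HashMap<>();
        hercules.put("name", "hercules");
        hercules.put("age", 30);
        Map<String, Object> hydra = new HashMap<>();
        hydra.put("name", "hydra"); // no age property

        byNameAndAge.add(1L, hercules); // hypothetical vertex ids
        byNameAndAge.add(2L, hydra);

        System.out.println(byNameAndAge.lookup("hercules", 30)); // prints [1]
        System.out.println(byNameAndAge.lookup("hydra", 30));    // prints []
    }
}
```

In the same way, a Titan query that filters only on name cannot be answered by byNameAndAgeComposite; that lookup needs an index on name alone, such as byNameComposite.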
The console output should look similar to this (earlier log lines omitted):
```
14:31:04,577 INFO metadata:566 - [Sprite] [titan] create_mapping [vertices]
14:31:04,622 INFO metadata:566 - [Sprite] [titan] create_mapping [edges]
14:31:04,643 INFO metadata:558 - [Sprite] [titan] update_mapping [edges]
14:31:10,425 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:10,926 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:11,428 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:11,929 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:12,431 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:12,934 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:13,435 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:13,570 INFO ManagementLogger:138 - Received all acknowledgements for eviction [1]
14:31:13,952 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:14,455 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:14,957 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:15,458 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:15,959 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:16,462 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:16,962 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:17,464 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:17,966 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:18,467 INFO GraphIndexStatusWatcher:67 - Some key(s) on index byNameComposite do not currently have status REGISTERED: name=INSTALLED
14:31:18,579 INFO ManagementSystem$UpdateStatusTrigger:834 - Set status REGISTERED on schema element byNameComposite with property keys []
14:31:18,696 INFO ManagementSystem$UpdateStatusTrigger:834 - Set status REGISTERED on schema element byNameAndAgeComposite with property keys []
14:31:18,814 INFO ManagementLogger:138 - Received all acknowledgements for eviction [2]
14:31:18,982 INFO GraphIndexStatusWatcher:69 - All 1 key(s) on index byNameComposite have status REGISTERED
14:31:19,002 INFO GraphIndexStatusWatcher:69 - All 2 key(s) on index byNameAndAgeComposite have status REGISTERED
Created the one and only!
14:31:19,315 INFO IndexRepairJob:101 - Found index byNameComposite
14:31:19,328 INFO IndexRepairJob:101 - Found index byNameComposite
14:31:19,662 INFO ManagementSystem:994 - Index update job successful for [byNameComposite]
14:31:19,665 INFO IndexRepairJob:101 - Found index byNameAndAgeComposite
14:31:19,671 INFO IndexRepairJob:101 - Found index byNameAndAgeComposite
14:31:19,935 INFO ManagementSystem:994 - Index update job successful for [byNameAndAgeComposite]

Label: theOneAndOnly
Name: me
Age: 1

Label: demigod
Name: hercules
Age: 30
Out: father --> jupiter
Out: mother --> alcmene
Out: battled --> hydra
Out: battled --> nemean
Out: battled --> cerberus

14:31:20,061 INFO CassandraThriftStoreManager:612 - Closed Thrift connection pooler.
14:31:20,071 INFO node:278 - [Sprite] stopping ...
14:31:20,413 INFO node:311 - [Sprite] stopped
14:31:20,413 INFO node:328 - [Sprite] closing ...
14:31:20,426 INFO node:412 - [Sprite] closed
```
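The repeated GraphIndexStatusWatcher lines, roughly 500 ms apart, come from ManagementSystem.awaitGraphIndexStatus(...) checking the index status until every key is REGISTERED. The pattern behind it is a poll-with-timeout loop. The JDK-only sketch below shows that pattern with a simulated status check; StatusPoller and awaitCondition are hypothetical names, and this is not Titan's implementation.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Poll-with-timeout pattern (hypothetical helper, not Titan code).
class StatusPoller {

    // Evaluates condition every pollInterval; returns true as soon as it holds,
    // or false once timeout has elapsed without the condition becoming true.
    public static boolean awaitCondition(BooleanSupplier condition, Duration pollInterval, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated index status: INSTALLED for the first three checks, then REGISTERED.
        AtomicInteger checks = new AtomicInteger();
        BooleanSupplier allRegistered = () -> {
            int n = checks.incrementAndGet();
            String status = n > 3 ? "REGISTERED" : "INSTALLED";
            System.out.println("check " + n + ": name=" + status);
            return status.equals("REGISTERED");
        };
        boolean ok = awaitCondition(allRegistered, Duration.ofMillis(500), Duration.ofSeconds(10));
        System.out.println(ok ? "All keys REGISTERED" : "Timed out waiting for REGISTERED");
    }
}
```

A timeout keeps the startup from hanging forever if an index never reaches the expected state, at the cost of having to handle the "timed out" case explicitly.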
The startup sequence for TitanDB, and especially for ElasticSearch, can take a while. This example runs locally and sets up TitanDB and ElasticSearch from scratch on every run. In practice, TitanDB is meant to run as a distributed graph cluster rather than on a single local machine; this is only a training example to get familiar with TitanDB and Gremlin.
If you run into errors, exceptions or other problems, feel free to leave a comment and ask.