Ontology traversal with Jena and SPARQL

In this tutorial we demonstrate how to traverse an ontology using Apache Jena. We show two approaches: Jena API methods and SPARQL queries on the model.

1. Requirements:

2. Background

An ontology describes the types, properties and relationships of the entities in a particular domain. The pizza.owl ontology, for example, describes different kinds of pizza, such as vegetarian or meaty pizzas. Toppings and spiciness are categorized as well and linked via axioms to describe the pizza domain; for example, a vegetarian pizza cannot have a meat topping.

When traversing an ontology, remember that it is represented as a graph. The most concise way (in terms of the amount of code) to handle graph- and tree-like structures is recursion. If you are not familiar with recursion, we suggest reading up on it before working through the following code snippets.
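To illustrate the pattern before Jena enters the picture, here is a minimal sketch of recursive tree traversal in plain Java. The class TreeRecursion, the map-based hierarchy and the class names in main are hypothetical toy data, not part of Jena or camera.owl. Each call records its node with one tab per level and then calls itself for every child, which is the same shape as the traverse methods below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy hierarchy stored as parent -> children; no Jena involved.
class TreeRecursion {

    // Adds the node with one tab per depth level, then recurses into each child.
    static void walk(Map<String, List<String>> children, String node, int depth, List<String> out) {
        out.add("\t".repeat(depth) + node);
        for (String child : children.getOrDefault(node, List.of())) {
            walk(children, child, depth + 1, out); // one level deeper
        }
    }

    // Collects the indented lines for the tree below root.
    static List<String> render(Map<String, List<String>> children, String root) {
        List<String> out = new ArrayList<>();
        walk(children, root, 0, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of(
                "PurchaseableItem", List.of("Camera", "Lens"),
                "Camera", List.of("Digital"));
        render(children, "PurchaseableItem").forEach(System.out::println);
    }
}
```

Running main prints PurchaseableItem, then Camera and Digital indented by one and two tabs, then Lens indented by one tab. This version has no protection against cycles, which is fine for a tree but not for a general graph.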

3. Traversal using Jena API methods

The following Java code reads an ontology and traverses every class in it.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;


public class OntologyTraverserAPI 
{
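	public static void readOntology( String file, OntModel model )
	{
		// try-with-resources closes the stream even if reading fails
		try ( InputStream in = new FileInputStream( file ) )
		{
			// arguments: input stream, base URI for relative URIs (none here), serialization language
			model.read( in, null, "RDF/XML" );
		} catch ( IOException e ) 
		{
			e.printStackTrace();
		} 
	}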
	
	/**
	 * Traverse the Ontology to find all given concepts
	 */
    public static void traverseStart( OntModel model, OntClass ontClass ) 
    {
    	// if ontClass is specified we only traverse down that branch
    	if( ontClass != null )
    	{
    		traverse(ontClass, new ArrayList<OntClass>(), 0);
    		return;
    	}
    	
        // create an iterator over the root classes
        Iterator<OntClass> i = model.listHierarchyRootClasses();
        
        // traverse through all roots
        while (i.hasNext()) 
        {
        	OntClass tmp = i.next();
            traverse( tmp, new ArrayList<OntClass>(), 0 );
        }
    }
    
    /**
     * Start from a class, then recurse down to the sub-classes.
     * Use an occurs check to avoid getting stuck in a loop.
     * @param oc OntClass to traverse from
     * @param occurs classes on the current path from the root (used to detect loops)
     * @param depth current depth in the hierarchy (used for indentation)
     */
    private static void traverse( OntClass oc, List<OntClass> occurs, int depth )
    {
    	if( oc == null ) return;

    	// skip anonymous class expressions (they have no local name) and owl:Nothing, the empty bottom class
    	if( oc.getLocalName() == null || oc.getLocalName().equals( "Nothing" ) ) return;
    	
		// print one "\t" per depth level to get an explorer-like tree output
		for( int i = 0; i < depth; i++ ) { System.out.print("\t"); }
		
		// print out the OntClass
		System.out.println( oc.toString() );
		
        // descend only if this class is not already on the current path (avoid loops in graphs)
        if ( oc.canAs( OntClass.class ) && !occurs.contains( oc ) ) 
        {
            // push this class on the occurs list before we recurse
            occurs.add( oc );
            // for every direct subclass, traverse down and increase depth (used for indentation)
            for ( Iterator<OntClass> i = oc.listSubClasses( true ); i.hasNext(); ) 
            {
                traverse( i.next(), occurs, depth + 1 );
            }
            // after traversing the path, remove from occurs list
            occurs.remove( oc );
        }
    	
    }
	
	public static void main(String[] args) 
	{
		// create OntModel
		OntModel model = ModelFactory.createOntologyModel();
		// read camera ontology
		readOntology( "./ontology/camera.owl", model );
		// start traverse
		traverseStart( model, null );
	}

}

The “readOntology” method reads an ontology into a Jena OntModel. This model can then be queried with Jena API methods such as “listSubClasses”.

“traverseStart” has an optional OntClass parameter that specifies a starting class for the traversal. If this parameter is null, all root classes returned by “listHierarchyRootClasses” are used as starting points.

The recursion happens in the private “traverse” method, which calls itself for each subclass. Remember to include an abort criterion: without one, a cycle in the graph makes the recursion loop until the call stack overflows (a StackOverflowError in Java). In our code, the occurs list of classes on the current path serves this purpose.
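The occurs check can be tried out without an ontology. The following sketch uses a hypothetical class OccursCheck and a toy graph in which A -> B -> A forms a cycle and D is reachable through both B and C. Like the API version of traverse above, it prints a node first and then stops if that node is already on the current path, so the cycle terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy graph: A -> B -> A is a cycle, and D is reachable via B and via C.
class OccursCheck {

    static void walk(Map<String, List<String>> sub, String node, List<String> occurs,
                     int depth, List<String> out) {
        out.add("\t".repeat(depth) + node);      // print first, like the API version of traverse
        if (occurs.contains(node)) {
            return;                              // already on the current path: stop here
        }
        occurs.add(node);                        // push before descending
        for (String child : sub.getOrDefault(node, List.of())) {
            walk(sub, child, occurs, depth + 1, out);
        }
        occurs.remove(occurs.size() - 1);        // pop once this branch is finished
    }

    static List<String> render(Map<String, List<String>> sub, String root) {
        List<String> out = new ArrayList<>();
        walk(sub, root, new ArrayList<>(), 0, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sub = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("A", "D"),
                "C", List.of("D"));
        render(sub, "A").forEach(System.out::println);
    }
}
```

Because occurs holds only the current path rather than every visited node, A is printed once more when the cycle is detected, and a node reachable through several parents (D here) is printed once per path. A global "visited" set would print every node only once, but would then hide the second parent in the tree view.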

The output for camera.owl looks like this:

http://www.xfront.com/owl/ontologies/camera/#Money
http://www.xfront.com/owl/ontologies/camera/#SLR
http://www.xfront.com/owl/ontologies/camera/#Window
http://www.xfront.com/owl/ontologies/camera/#Range
http://www.xfront.com/owl/ontologies/camera/#BodyWithNonAdjustableShutterSpeed
http://www.xfront.com/owl/ontologies/camera/#PurchaseableItem
	http://www.xfront.com/owl/ontologies/camera/#Camera
		http://www.xfront.com/owl/ontologies/camera/#Digital
		http://www.xfront.com/owl/ontologies/camera/#Large-Format
	http://www.xfront.com/owl/ontologies/camera/#Lens
	http://www.xfront.com/owl/ontologies/camera/#Body

4. Traversal using SPARQL queries

The following Java code replicates the functionality above, but instead of using Jena API methods, we query the OntModel ourselves with SPARQL. It reads an ontology and traverses down using SPARQL queries:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.ontology.OntModel;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;


public class OntologyTraverserSPARQL 
{
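	public static void readOntology( String file, OntModel model )
	{
		// try-with-resources closes the stream even if reading fails
		try ( InputStream in = new FileInputStream( file ) )
		{
			// arguments: input stream, base URI for relative URIs (none here), serialization language
			model.read( in, null, "RDF/XML" );
		} catch ( IOException e ) 
		{
			e.printStackTrace();
		} 
	}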
	
	private static List<String> getRoots( OntModel model )
	{
		List<String> roots = new ArrayList<String>();
		
		// find all subclasses of owl:Thing except Thing and Nothing; the OPTIONAL block binds ?super to another parent if one exists
		String getRootsQuery = 
				  "SELECT DISTINCT ?s WHERE " 
				+ "{"
				+ "  ?s <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://www.w3.org/2002/07/owl#Thing> . " 
				+ "  FILTER ( ?s != <http://www.w3.org/2002/07/owl#Thing> && ?s != <http://www.w3.org/2002/07/owl#Nothing> ) . " 
				+ "  OPTIONAL { ?s <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?super . " 
				+ "  FILTER ( ?super != <http://www.w3.org/2002/07/owl#Thing> && ?super != ?s ) } . " 
				+ "}";
		
		Query query = QueryFactory.create( getRootsQuery );
		
		try ( QueryExecution qexec = QueryExecutionFactory.create( query, model ) ) 
		{
			ResultSet results = qexec.execSelect();
			while( results.hasNext() )
			{
				QuerySolution soln = results.nextSolution();
				RDFNode sub = soln.get("s"); 
				
				if( !sub.isURIResource() ) continue;
				
				roots.add( sub.toString() );
			}
		}
		
		return roots;
	}
	
	public static void traverseStart( OntModel model, String entity )
	{
		// if starting class available
		if( entity != null ) 
		{
			traverse( model, entity,  new ArrayList<String>(), 0  );
		}
		// get roots and traverse each root
		else
		{
			List<String> roots = getRoots( model );
		
			for( int i = 0; i < roots.size(); i++ )
			{
				traverse( model, roots.get( i ), new ArrayList<String>(), 0 );
			}
		}
	}
	
	public static void traverse( OntModel model, String entity, List<String> occurs, int depth )
	{
		if( entity == null ) return;
		
		String queryString 	= "SELECT ?s WHERE { "
						   	+ "?s <http://www.w3.org/2000/01/rdf-schema#subClassOf> <" + entity + "> . }" ;
		
		Query query = QueryFactory.create( queryString  );
		
		if ( !occurs.contains( entity ) ) 
		{
			// print depth times "\t" to retrieve an explorer tree like output
			for( int i = 0; i < depth; i++ ) { System.out.print("\t"); }
			// print out the URI
			System.out.println( entity );
			
			// push this entity on the occurs list before we recurse to avoid loops
			occurs.add( entity );
			try ( QueryExecution qexec = QueryExecutionFactory.create( query, model ) ) 
			{
				ResultSet results = qexec.execSelect();
				while( results.hasNext() )
				{
					QuerySolution soln = results.nextSolution();
					RDFNode sub = soln.get("s"); 
					
					// skip anonymous class expressions (blank nodes)
					if( !sub.isURIResource() ) continue;
					
					// traverse down and increase depth (used for indentation)
					traverse( model, sub.toString(), occurs, depth + 1 );
				}
			}
			// after traversing the path, remove from occurs list
			occurs.remove( entity );
		}
		
	}
	
	public static void main(String[] args) 
	{
		// create OntModel
		OntModel model = ModelFactory.createOntologyModel();
		// read camera ontology
		readOntology( "./ontology/camera.owl", model );
		// start traverse
		traverseStart( model, null );
	}

}
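The traverse method above pastes the entity IRI directly into the query text. That works for well-formed IRIs, but a string containing characters such as > or a space would produce a broken or altered query. Below is a minimal sketch of a guard, based on the characters the SPARQL IRIREF grammar excludes inside <...>. The class SubClassQuery is hypothetical; in Jena itself, ParameterizedSparqlString (with setIri) is the library-provided way to inject IRIs into a query.

```java
// Hypothetical helper, not part of Jena: builds the subclass query used by traverse.
// It rejects empty strings and characters the SPARQL IRIREF rule does not allow inside <...>.
class SubClassQuery {

    static final String FORBIDDEN = "<>\"{}|^`\\";

    static String subClassesOf(String iri) {
        boolean bad = iri.isEmpty()
                || iri.chars().anyMatch(c -> c <= 0x20 || FORBIDDEN.indexOf(c) >= 0);
        if (bad) {
            throw new IllegalArgumentException("not a usable IRI: " + iri);
        }
        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
                + "SELECT ?s WHERE { ?s rdfs:subClassOf <" + iri + "> . }";
    }

    public static void main(String[] args) {
        System.out.println(subClassesOf("http://www.xfront.com/owl/ontologies/camera/#Camera"));
    }
}
```

The PREFIX declaration also makes the query easier to read than spelling out the full rdfs:subClassOf IRI.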

The “readOntology” method stays the same. Since we do not use the Jena API to query the model, we have to extract the roots ourselves. That is what the “getRoots” method does.
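What “getRoots” is meant to compute can be sketched in plain Java, assuming the rdfs:subClassOf pairs have already been fetched into a map from each class IRI to its parent IRIs (a hypothetical input format; RootFinder is not part of Jena). A class counts as a root if its only parents are owl:Thing or the class itself, since a model with inference may also contain reflexive subClassOf triples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Jena. Input: class IRI -> IRIs of all its rdfs:subClassOf objects.
class RootFinder {

    static final String THING = "http://www.w3.org/2002/07/owl#Thing";
    static final String NOTHING = "http://www.w3.org/2002/07/owl#Nothing";

    static List<String> roots(Map<String, Set<String>> parents) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
            String cls = e.getKey();
            if (cls.equals(THING) || cls.equals(NOTHING)) {
                continue;
            }
            // a root has no parent other than owl:Thing or itself (reflexive subClassOf)
            boolean hasOtherParent = e.getValue().stream()
                    .anyMatch(p -> !p.equals(THING) && !p.equals(cls));
            if (!hasOtherParent) {
                result.add(cls);
            }
        }
        result.sort(null); // map iteration order is unspecified, so sort for stable output
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> parents = Map.of(
                "ex:PurchaseableItem", Set.of(THING),
                "ex:Camera", Set.of(THING, "ex:PurchaseableItem", "ex:Camera"),
                "ex:Money", Set.of(THING));
        System.out.println(roots(parents)); // [ex:Money, ex:PurchaseableItem]
    }
}
```

The "ex:" names are placeholders; with real data the map would hold full IRIs.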

The “traverseStart” and “traverse” methods work the same way as the ones above. Running the code on camera.owl prints the following:

http://www.xfront.com/owl/ontologies/camera/#PurchaseableItem
	http://www.xfront.com/owl/ontologies/camera/#Camera
		http://www.xfront.com/owl/ontologies/camera/#Digital
		http://www.xfront.com/owl/ontologies/camera/#Large-Format
	http://www.xfront.com/owl/ontologies/camera/#Lens
	http://www.xfront.com/owl/ontologies/camera/#Body
	http://www.xfront.com/owl/ontologies/camera/#Large-Format
	http://www.xfront.com/owl/ontologies/camera/#Digital
http://www.xfront.com/owl/ontologies/camera/#Window
http://www.xfront.com/owl/ontologies/camera/#Range
http://www.xfront.com/owl/ontologies/camera/#Money
http://www.xfront.com/owl/ontologies/camera/#Camera
	http://www.xfront.com/owl/ontologies/camera/#Digital
	http://www.xfront.com/owl/ontologies/camera/#Large-Format
http://www.xfront.com/owl/ontologies/camera/#Large-Format
http://www.xfront.com/owl/ontologies/camera/#Digital
http://www.xfront.com/owl/ontologies/camera/#Body
http://www.xfront.com/owl/ontologies/camera/#Lens

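The output looks quite different. The SPARQL queries match triples literally, whereas the Jena API methods interpret the class hierarchy for us: “listSubClasses( true )”, for example, returns only the direct subclasses. In addition, the default model created by ModelFactory.createOntologyModel() already contains some inferred triples, such as the transitive closure of rdfs:subClassOf. That is why Digital and Large-Format appear both below Camera and directly below PurchaseableItem. The OWL file also contains constructs like: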

     <owl:Class rdf:ID="Camera">
          <rdfs:subClassOf rdf:resource="#PurchaseableItem"/>
     </owl:Class>

     <owl:Class rdf:ID="SLR">
          <owl:intersectionOf rdf:parseType="Collection">
               <owl:Class rdf:about="#Camera"/>
               <owl:Restriction>
                     <owl:onProperty rdf:resource="#viewFinder"/>
                     <owl:hasValue rdf:resource="#ThroughTheLens"/>
               </owl:Restriction>
          </owl:intersectionOf>
     </owl:Class>

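The first declaration states that Camera is a subclass of PurchaseableItem, so Camera should not count as a root. The second declaration mentions Camera again inside an owl:intersectionOf, which our queries do not interpret at all. Camera nevertheless ends up as a root in the result of “getRoots”, because the OPTIONAL block in that query never removes a row: it only binds ?super when another parent exists. To keep only classes without such a parent, the query would also need FILTER ( !bound( ?super ) ) after the OPTIONAL block.

These axioms and restrictions have to be processed and filtered, which we did not do in the presented code.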
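One way to do part of that filtering on the client side is to reduce a transitively closed subclass relation to direct subclasses, which is roughly what “listSubClasses( true )” does in the API version. A minimal sketch in plain Java, assuming the subclass results of each class were collected into a map (hypothetical class DirectSubClasses; the data in main is a toy excerpt modeled on the output above) and that there are no equivalent classes, i.e. no two classes that are subclasses of each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Jena. Input: class -> all of its subclasses as returned by
// a query over a model with transitive (and possibly reflexive) rdfs:subClassOf.
class DirectSubClasses {

    static List<String> direct(Map<String, Set<String>> subs, String cls) {
        Set<String> all = subs.getOrDefault(cls, Set.of());
        List<String> result = new ArrayList<>();
        for (String s : all) {
            if (s.equals(cls)) {
                continue; // drop the reflexive "cls subClassOf cls"
            }
            // s is indirect if another subclass m of cls also has s as a subclass
            boolean indirect = all.stream().anyMatch(m -> !m.equals(cls) && !m.equals(s)
                    && subs.getOrDefault(m, Set.of()).contains(s));
            if (!indirect) {
                result.add(s);
            }
        }
        result.sort(null); // set iteration order is unspecified, so sort for stable output
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subs = Map.of(
                "PurchaseableItem", Set.of("PurchaseableItem", "Camera", "Lens", "Body",
                        "Digital", "Large-Format"),
                "Camera", Set.of("Camera", "Digital", "Large-Format"));
        System.out.println(direct(subs, "PurchaseableItem")); // [Body, Camera, Lens]
    }
}
```

The check is quadratic in the number of subclasses per class, which is fine for small ontologies like camera.owl.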

5. Conclusion

You can extract the topology of an ontology with both the Jena API and SPARQL queries. The Jena API offers many convenient methods for retrieving data, so as long as you do not need functionality that Jena does not cover, I would stick to the Jena API.

If you need to query more complex data, you cannot avoid writing your own SPARQL queries. This is more cumbersome, but also more powerful. For the class hierarchy, we saw that the Jena API produced the expected result, while our handwritten SPARQL queries did not.

However, it is possible to reach the same result by improving the SPARQL queries.
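For example, the direct-subclass condition can be moved into the query itself with the SPARQL 1.1 FILTER NOT EXISTS construct: a subclass ?s of the given class is kept only if no third class ?mid lies between them. The sketch below only builds the query string (DirectSubClassQuery is a hypothetical helper) and should be treated as a starting point; it assumes a well-formed IRI and no equivalent classes, and uses isIRI to skip anonymous class expressions.

```java
// Hypothetical helper, not part of Jena: builds a SPARQL 1.1 query that returns only the
// direct named subclasses of a class, even if the model contains the transitive closure.
class DirectSubClassQuery {

    static String build(String iri) {
        String c = "<" + iri + ">"; // assumes a well-formed IRI
        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
                + "SELECT DISTINCT ?s WHERE {\n"
                + "  ?s rdfs:subClassOf " + c + " .\n"
                + "  FILTER ( isIRI(?s) && ?s != " + c + " )\n"
                // drop ?s if some other class ?mid sits between ?s and the given class
                + "  FILTER NOT EXISTS {\n"
                + "    ?s rdfs:subClassOf ?mid .\n"
                + "    ?mid rdfs:subClassOf " + c + " .\n"
                + "    FILTER ( ?mid != ?s && ?mid != " + c + " )\n"
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(build("http://www.xfront.com/owl/ontologies/camera/#PurchaseableItem"));
    }
}
```

The resulting string could replace the query built in the SPARQL version of traverse; the filters exclude the reflexive triples that a model with inference may contain.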

If you run into errors, exceptions or other problems, or have suggestions for improvement, feel free to comment and ask.
