Elasticsearch 6.0: create index, bulk insert and delete data via Java

In this tutorial we set up a local Elasticsearch 6.0 server and create indices, insert, delete and query data via the Java API on Windows. Elasticsearch is a distributed, full-text NoSQL search engine (documents are stored as JSON) based on Apache Lucene and written in Java. Apache Solr and Elasticsearch are the two most widely used search servers. The following example code is provided as a Maven project in the Git repository.

1. Elasticsearch prerequisites

  • Download and install a JDK/JRE 1.8; remember to set the JAVA_HOME environment variable (on Windows)
  • Download the zipped Elasticsearch server 6.0
  • Maven to run the example Java code (How to set up Maven?)

2. Installation

  1. If not already installed, install the JDK/JRE 1.8 from the link above and set your JAVA_HOME environment variable to the JDK or JRE folder of your Java installation. On startup, Elasticsearch automatically uses the Java installation that JAVA_HOME points to.
  2. Unzip the Elasticsearch archive. We use the standard development configuration (not suitable for production) with one exception: we changed the cluster name (cluster.name) to “tutorial-academy-cluster” in the config/elasticsearch.yml file.
  3. Run bin/elasticsearch.bat to start the server
  4. Go to http://localhost:9200 to check whether the server started correctly. You should see a JSON response containing the cluster name and version information.
  5. The server is running, so let's move on to some code: clone the Git repository.
  6. If not already done, install Maven and run mvn clean install to build the project
  7. Alternatively, you can import and run the code in an IDE such as Eclipse or IntelliJ IDEA
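The only change we made to the default configuration (step 2) looks like this in config/elasticsearch.yml:

```yaml
# config/elasticsearch.yml -- everything else stays at its default value
cluster.name: tutorial-academy-cluster
```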

3. Code step-by-step

We use a helper class called ElasticSearchConnector to abstract some of the functionality. Let us start with the initialization of an Elasticsearch client to communicate with the server. In our code this happens in the constructor, along with setting a few options. Note that we set the cluster name as an example, but effectively ignore it one line later to stay compatible with any cluster configuration. Finally, we connect to localhost on port 9300, as can be seen in the main method later on.
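The constructor could be sketched roughly as follows. Variable names here are illustrative; the ES 6.0 TransportClient lives in the org.elasticsearch.client:transport Maven dependency:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

// Set the cluster name, then ignore it one line later so the client
// connects regardless of how the cluster is actually named.
Settings settings = Settings.builder()
        .put( "cluster.name", "tutorial-academy-cluster" )
        .put( "client.transport.ignore_cluster_name", true )
        .build();

// Connect to the local server on the transport port 9300
TransportClient client = new PreBuiltTransportClient( settings )
        .addTransportAddress(
                new TransportAddress( InetAddress.getByName( "localhost" ), 9300 ) );
```

Note that the transport client talks to port 9300 (the internal transport protocol), not 9200 (the REST port you opened in the browser earlier).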

Now that we are connected to the server, the Java API offers functionality to, for example, check the status of the cluster. We wait for the GREEN status, which indicates that the cluster is healthy, i.e. synchronized and ready to work with.
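A health check that blocks until the cluster reaches GREEN might look like this (the timeout value is an arbitrary choice):

```java
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.common.unit.TimeValue;

// Block until the cluster reports GREEN, or give up after 30 seconds
ClusterHealthResponse health = client.admin().cluster()
        .prepareHealth()
        .setWaitForGreenStatus()
        .setTimeout( TimeValue.timeValueSeconds( 30 ) )
        .get();

if ( health.isTimedOut() ) {
    // the cluster did not become healthy in time -- abort or retry
}
```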

The cluster is ready and we can start creating an index. Before doing so, we check whether the same index was created previously.
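The existence check boils down to a single admin call (indexName is read from the properties file):

```java
// Ask the cluster whether an index with this name is already registered
boolean indexExists = client.admin().indices()
        .prepareExists( indexName )
        .get()
        .isExists();
```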

If the index does not exist already, we create the index.
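Creating the index can be sketched as follows; the shard and replica counts below are example values, in the project they come from the properties file:

```java
import org.elasticsearch.common.settings.Settings;

// Create the index with an explicit number of shards and replicas
client.admin().indices()
        .prepareCreate( indexName )
        .setSettings( Settings.builder()
                .put( "index.number_of_shards", 1 )      // example values -- in the
                .put( "index.number_of_replicas", 0 ) )  // project they are configurable
        .get();
```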

You can go to http://localhost:9200/_cat/indices? to check whether the index (tutorial-academy) was created and what its status is. The remaining columns show, for example, how many documents are indexed, how many were deleted, and so on.

After successfully creating the index, we start to load some data. The loaded data corresponds to this JSON file.
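We cannot reproduce the exact file here, but the data is a JSON array of objects with the properties name and age, roughly like this (the names and ages below are made up for illustration):

```json
[
  { "name": "John Doe",         "age": 27 },
  { "name": "Jane Doe",         "age": 31 },
  { "name": "Max Mustermann",   "age": 25 },
  { "name": "Erika Musterfrau", "age": 42 },
  { "name": "John Smith",       "age": 19 }
]
```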

We basically want to index a JSON array consisting of objects with the properties name and age. We use a bulk insert to index all the data at once. In our tests it happened that the cluster was not ready when we tried to run a search/delete query directly after the insert. Consequently, we added the setRefreshPolicy( RefreshPolicy.IMMEDIATE ) method to signal the server to refresh the index after the specified request. The data can then be queried directly afterwards.
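A bulk insert with the immediate refresh policy can be sketched like this; documents stands for the parsed JSON objects from the file, and typeName for the document type used in the project:

```java
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.support.WriteRequest.RefreshPolicy;
import org.elasticsearch.common.xcontent.XContentType;

// Collect all index requests into one bulk request and refresh immediately,
// so the documents are searchable right after the call returns
BulkRequestBuilder bulk = client.prepareBulk()
        .setRefreshPolicy( RefreshPolicy.IMMEDIATE );

for ( String json : documents ) {  // one JSON object per entry, e.g. {"name":"...","age":27}
    bulk.add( client.prepareIndex( indexName, typeName )
            .setSource( json, XContentType.JSON ) );
}

BulkResponse response = bulk.get();
if ( response.hasFailures() ) {
    System.err.println( response.buildFailureMessage() );
}
```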

To query the data, we use a SearchResponse in combination with a scroll. A scroll is essentially the Elasticsearch counterpart to a cursor in a traditional SQL database. Using this kind of query is overkill for our example and serves demonstration purposes only: scrolls are meant for retrieving large amounts of data (not five documents, as in our case) and are not intended for real-time user requests.
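A scroll query with an age filter could look like this; the field name, threshold and batch size are example values:

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// Open a scroll that stays alive for one minute per round trip
SearchResponse response = client.prepareSearch( indexName )
        .setQuery( QueryBuilders.rangeQuery( "age" ).gte( 30 ) )  // example filter
        .setScroll( TimeValue.timeValueMinutes( 1 ) )
        .setSize( 100 )                                           // hits per batch
        .get();

do {
    for ( SearchHit hit : response.getHits().getHits() ) {
        System.out.println( hit.getSourceAsString() );
    }
    // Fetch the next batch using the scroll id of the previous response
    response = client.prepareSearchScroll( response.getScrollId() )
            .setScroll( TimeValue.timeValueMinutes( 1 ) )
            .get();
} while ( response.getHits().getHits().length != 0 );
```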

After successfully querying data, we try to delete documents matching a key-value pair to dig deeper into Elasticsearch's behavior.
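Deleting by a key-value pair can be done with a delete-by-query request; the field and value below are illustrative:

```java
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.DeleteByQueryAction;

// Delete every document whose "name" field matches the given value
BulkByScrollResponse deleted = DeleteByQueryAction.INSTANCE
        .newRequestBuilder( client )
        .filter( QueryBuilders.matchQuery( "name", "John Doe" ) )  // example key-value pair
        .source( indexName )
        .get();

long removedDocuments = deleted.getDeleted();
```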

Now let us put it all together in a small main function. Here we read some properties like the number of shards, index name etc. from a properties file and start to call the methods described above.

Imports and missing methods can be found in the Git repository. The main method performs the following steps:

  1. Read configuration parameters from the properties file
  2. Connect to the Elasticsearch server
  3. Check the status of the server
  4. Check if our defined index is already created
  5. If not, create and insert data
  6. Query data using an age filter
  7. Remove one document
  8. Query data again
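The steps above can be sketched as the following main method. The property keys and helper method names are illustrative; the actual names can be found in the Git repository:

```java
import java.io.FileInputStream;
import java.util.Properties;

public static void main( String[] args ) throws Exception {
    // 1. Read configuration parameters (file name and keys are illustrative)
    Properties config = new Properties();
    config.load( new FileInputStream( "config.properties" ) );

    String clusterName = config.getProperty( "cluster.name" );
    String indexName   = config.getProperty( "index.name" );
    int    shards      = Integer.parseInt( config.getProperty( "index.shards", "1" ) );

    // 2. Connect to the Elasticsearch server
    ElasticSearchConnector connector =
            new ElasticSearchConnector( clusterName, "localhost", 9300 );

    // 3. Check the status of the server
    connector.isClusterHealthy();

    // 4./5. If our index does not exist yet, create it and bulk insert the data
    if ( !connector.isIndexRegistered( indexName ) ) {
        connector.createIndex( indexName, shards );
        connector.bulkInsert( indexName );
    }

    // 6.-8. Query with an age filter, remove one document, query again
    connector.queryResultsWithAgeFilter( indexName, 30 );
    connector.delete( indexName );
    connector.queryResultsWithAgeFilter( indexName, 30 );
}
```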

4. Conclusion

That is it. You successfully deployed your Elasticsearch application. We demonstrated only some basic concepts and interactions with the server. The Java API, and especially the REST API, offer far more functionality. If you are interested, we recommend checking them out and diving deeper. Fast search access to data is becoming more and more relevant and is a valuable skill for every developer out there.

Get familiar with the way Elasticsearch works: how to build requests, how to deploy in production, and especially whether you require synchronous calls (wait for the operation to finish) or asynchronous calls (a callback function is invoked when done, without blocking). The code above uses synchronous calls, i.e. it continues directly after the server responds.

If you have problems or errors, feel free to comment and ask.
