Elasticsearch 6.0: index JSON file via Java API throws MapperParsingException

When writing our first Elasticsearch tutorial we stumbled over a problem with bulk inserts. Whenever we tried to index data coming from a JSON file (or a string; the behavior is identical) that contains an array of objects, we ran into the following MapperParsingException.

1. MapperParsingException

After some investigation and testing, we discovered that the problem is the JSON array itself. Single objects passed as JSON files or strings are no problem. You can have a look here at the different ways the Elasticsearch team describes for indexing/inserting data. Have a look at the following JSON file we tried to upload. Keep in mind that it is just an array of simple objects, with no nested objects.

2. JSON object array
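The original file embedded at this point is no longer available; the following is a minimal illustrative stand-in with the same shape (the field names are made up): a plain top-level array of flat objects.

```json
[
  { "name": "Alice", "age": 30 },
  { "name": "Bob", "age": 25 }
]
```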

The problem with that file is the surrounding array brackets ("[", "]"). Removing the outer brackets leaves an invalid JSON file, which the parser will not accept either. Consequently, in order to upload the JSON file, we created a small workaround.

3. Workaround

We use the json-simple library (version 1.1.1) to parse the array and add each individual array object to the Elasticsearch bulk insert. The cast to the JSON array object is rather unsafe; we can do it only because we know the input format exactly. If you want to index different JSON files, you have to adapt this step accordingly.

After parsing the JSON array, we iterate over its objects and add an index request for each one to the bulk request. After the iteration finishes, the bulk request is sent to and processed by the Elasticsearch server. This way no MapperParsingException is thrown.
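A sketch of the workaround described above, assuming the json-simple parser and the Elasticsearch 6.0 transport client API (the class name, method signature, and client setup are our assumptions, not the exact code from the repository):

```java
import java.io.FileReader;
import java.io.Reader;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

public class BulkJsonInserter {

    // Reads a top-level JSON array from dataPath and indexes each element
    // as its own document, so the parser never sees the surrounding brackets.
    public static void bulkInsert(TransportClient client, String indexName,
                                  String indexType, String dataPath) throws Exception {
        JSONParser parser = new JSONParser();
        try (Reader reader = new FileReader(dataPath)) {
            // Unsafe cast: we rely on the file containing a top-level array.
            JSONArray objects = (JSONArray) parser.parse(reader);

            BulkRequestBuilder bulk = client.prepareBulk();
            for (Object entry : objects) {
                JSONObject obj = (JSONObject) entry;
                // One index request per array element.
                bulk.add(client.prepareIndex(indexName, indexType)
                               .setSource(obj.toJSONString(), XContentType.JSON));
            }

            // Send the whole batch in a single round trip.
            BulkResponse response = bulk.get();
            if (response.hasFailures()) {
                System.err.println(response.buildFailureMessage());
            }
        }
    }
}
```

Note that the bulk request is only sent once, after the loop; adding requests to the builder is a purely local operation.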

4. Conclusion

The presented workaround to avoid the MapperParsingException in the Elasticsearch Java API is simple but effective. We did not test the performance with very large JSON files. The Elasticsearch parser is optimized and works with a byte representation of the JSON internally. Parsing the JSON file ourselves and then passing the resulting Strings back to the Elasticsearch parser therefore means the data is effectively parsed twice, which costs performance.

We recommend using the REST API for large bulk operations, as explained here. You can clone a running code example from Git. Make sure to (un-)comment the method bulkInsert( String indexName, String indexType ) and use the overloaded method bulkInsert( String indexName, String indexType, String dataPath ) to read the JSON file.
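For reference, the REST `_bulk` endpoint does not accept a JSON array either; it expects newline-delimited JSON, with one action line followed by one source line per document. A minimal sketch (index and type names are placeholders, field names illustrative):

```
POST /myindex/mytype/_bulk
{ "index": {} }
{ "name": "Alice", "age": 30 }
{ "index": {} }
{ "name": "Bob", "age": 25 }
```

Each line must be a complete JSON object terminated by a newline, including the last one, which is why the array wrapper from our original file has no place in this format.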

If you have problems or errors, feel free to comment and ask.
