While writing our first Elasticsearch tutorial we stumbled over a problem with the bulk insert. Whenever we tried to index data coming from a JSON file (or a string, the behavior is the same) that contains an array of objects, we ran into the following MapperParsingException.
1. MapperParsingException
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [...] Caused by: org.elasticsearch.common.compress.NotXContentException: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
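The exact call that produced this stack trace is not shown here, but the pattern that triggers it is handing the complete array file over as the source of a single index request. The snippet below is only a sketch of that pattern; the client field, the index name "authors", the type "person" and the file path are placeholders, not the tutorial's actual code.

// Sketch only: passing the whole JSON array as one document source is rejected,
// because a top-level array is not a valid single document.
String wholeFile = new String( Files.readAllBytes( Paths.get( "data/authors.json" ) ), StandardCharsets.UTF_8 );
client.prepareIndex( "authors", "person" )
      .setSource( wholeFile, XContentType.JSON )
      .get();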
After some investigating and testing we discovered that the problem is the JSON array itself. Single objects, whether coming from a JSON file or a string, are no problem; a minimal sketch of such a call is shown below. You can have a look here at the different ways the Elasticsearch team describes for indexing/inserting data. The JSON file we tried to upload is shown in the next section; keep in mind that it is just an array of simple objects, no nested objects included.
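For contrast, indexing a single JSON object works without any workaround. A minimal sketch, again with placeholder index and type names and the tutorial's client and logger fields assumed:

// A single JSON object as the document source is indexed without problems.
String singleDoc = "{ \"name\" : \"Mark Twain\", \"age\" : 75 }";
IndexResponse response = client.prepareIndex( "authors", "person" )
                               .setSource( singleDoc, XContentType.JSON )
                               .get();
logger.info( "Indexed document with id " + response.getId() );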
2. JSON object array
[ { "name" : "Mark Twain", "age" : 75 }, { "name" : "Tom Saywer", "age" : 12 }, { "name" : "John Doe", "age" : 20 }, { "name" : "Peter Pan", "age" : 15 }, { "name" : "Johnnie Walker", "age" : 37 } ]
The problem with that file is the surrounding array ("[", "]"). Removing the outer brackets leads to an invalid JSON file, which the parser will not accept either. Consequently, in order to upload the JSON file, we created a small workaround.
3. Workaround
We use the json-simple library (version 1.1.1) to parse the array and add each array element to the Elasticsearch bulk request as its own index operation. The cast to JSONArray is rather unsafe; we can get away with it because we know the input exactly. If you want to index different JSON files, you have to adapt this part accordingly.
public boolean bulkInsert( String indexName, String indexType, String dataPath ) throws IOException, ParseException {
    BulkRequestBuilder bulkRequest = client.prepareBulk();

    JSONParser parser = new JSONParser();
    // we know we get an array from the example data
    JSONArray jsonArray = (JSONArray) parser.parse( new FileReader( dataPath ) );

    @SuppressWarnings("unchecked")
    Iterator<JSONObject> it = jsonArray.iterator();
    while ( it.hasNext() ) {
        JSONObject json = it.next();
        logger.info( "Insert document: " + json.toJSONString() );
        bulkRequest.setRefreshPolicy( RefreshPolicy.IMMEDIATE )
                   .add( client.prepareIndex( indexName, indexType )
                               .setSource( json.toJSONString(), XContentType.JSON ) );
    }

    BulkResponse bulkResponse = bulkRequest.get();
    if ( bulkResponse.hasFailures() ) {
        logger.info( "Bulk insert failed: " + bulkResponse.buildFailureMessage() );
        return false;
    }
    return true;
}
After parsing the JSON array, we iterate over its objects and add an index request for each of them to the bulk request. Once the iteration has finished, the bulk request is sent to and processed by the Elasticsearch server. This way no MapperParsingException is thrown.
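A call to the method could look like the following sketch. The index name, type and file path are only placeholders; the client and logger fields are assumed to be set up as in the tutorial's example code.

// Illustrative call; "authors", "person" and the file path are placeholders.
boolean ok = bulkInsert( "authors", "person", "data/authors.json" );
if ( !ok ) {
    logger.info( "Bulk insert reported failures, see the log output above." );
}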
4. Conclusion
The presented workaround to avoid the MapperParsingException in the Elasticsearch Java API is simple but effective. We did not test the performance with a very large JSON file. The Elasticsearch parser is optimized and works on the raw JSON bytes internally, so parsing the JSON file ourselves and then passing each object back to Elasticsearch as a string is redundant work and therefore costs performance.
For large bulk operations we recommend using the REST bulk API, as explained here; a sketch of such a request follows below. You can clone a running code example from Git. Make sure to comment out the method bulkInsert( String indexName, String indexType ) and use the overloaded method bulkInsert( String indexName, String indexType, String dataPath ) to read the JSON file.
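For reference, the REST bulk API expects newline-delimited JSON rather than an array: an action line followed by the document, one pair per document, terminated by a newline. The following sketch shows what such a request body could look like for our example data; the index and type names are again only placeholders matching the Java example.

POST /_bulk
{ "index" : { "_index" : "authors", "_type" : "person" } }
{ "name" : "Mark Twain", "age" : 75 }
{ "index" : { "_index" : "authors", "_type" : "person" } }
{ "name" : "Tom Sawyer", "age" : 12 }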
If you have problems or errors, feel free to comment and ask.





