Large document search

Hi,

I have indexed some pretty large documents, around 3 GB in total. They were indexed without any errors, but when I use the dashboard to search them, I get the following error:

Error: Request to Elasticsearch failed: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The length [1145037] of field [content] in doc[148530]/index[test-index] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test-index","node":"6YlZChhzS5iKnv_8bURH8A","reason":{"type":"illegal_argument_exception","reason":"The length [1145037] of field [content] in doc[148530]/index[test-index] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."}}],"caused_by":{"type":"illegal_argument_exception","reason":"The length [1145037] of field [content] in doc[148530]/index[test-index] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them.","caused_by":{"type":"illegal_argument_exception","reason":"The length [1145037] of field [content] in doc[148530]/index[test-index] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."}}},"status":400}

Any idea what's going on and how I can fix it?

Hi James,

The error message points to the fix: set the max_analyzed_offset query parameter to limit highlighting of text fields to a value less than or equal to index.highlight.max_analyzed_offset, which is 1000000 here. That avoids the exception when a text field is longer than the limit. Highlighting still takes place, but it stops at the offset defined by the parameter.
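If you send the search request yourself rather than through the dashboard, the parameter goes inside the highlight section of the request body. Here is a rough sketch in Python that only builds the body and prints it. It assumes a cluster version that supports max_analyzed_offset as a highlight option; the function name build_search_body and the search text are made up for illustration, and the field name content is taken from your error message.

```python
import json

# Default index.highlight.max_analyzed_offset, as reported in the error message.
INDEX_LIMIT = 1_000_000


def build_search_body(text, field="content", max_analyzed_offset=INDEX_LIMIT - 1):
    """Build a search body whose highlighter stops analyzing at max_analyzed_offset.

    build_search_body is a hypothetical helper; it does not contact a cluster.
    """
    if not 0 < max_analyzed_offset <= INDEX_LIMIT:
        raise ValueError("max_analyzed_offset must be positive and within the index limit")
    return {
        "query": {"match": {field: text}},
        "highlight": {
            # Longer field values are truncated for highlighting instead of failing.
            "max_analyzed_offset": max_analyzed_offset,
            "fields": {field: {}},
        },
    }


if __name__ == "__main__":
    # Print the JSON you would send to POST /test-index/_search.
    print(json.dumps(build_search_body("some search terms"), indent=2))
```

The default here stays one below the index limit so that it satisfies both the "less than" wording in the error message and the "less than or equal to" wording above.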

Note: Plain highlighting of large texts may require a substantial amount of time and memory. To protect against this, the maximum number of text characters that will be analyzed is limited to 1000000 by default. You can change this default for a particular index with the index setting index.highlight.max_analyzed_offset.
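If you would rather raise the limit for the index, that is a settings update on the index. The sketch below only assembles the request (method, path, body) so you can send it with whatever client you use. The helper name build_settings_request and the value 2000000 are illustrative assumptions; any value above your longest field (1145037 in your error) should clear this particular document, at the cost of more time and memory during highlighting.

```python
import json


def build_settings_request(index, new_limit):
    """Return (method, path, body) for changing index.highlight.max_analyzed_offset.

    build_settings_request is a hypothetical helper; it does not contact a cluster.
    """
    if new_limit <= 0:
        raise ValueError("new_limit must be positive")
    body = {"index": {"highlight.max_analyzed_offset": new_limit}}
    return "PUT", f"/{index}/_settings", body


if __name__ == "__main__":
    # 2_000_000 is an example value, not a recommendation.
    method, path, body = build_settings_request("test-index", 2_000_000)
    print(method, path)
    print(json.dumps(body))
```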

Alternatively, if you prefer, you can get rid of this error by disabling highlighting completely: turn off the advanced setting doc_table:highlight in the dashboard.

Regards
Manu Agarwal

Manu,

Understood, thank you!