Bug in 7.17.3-27.2 -- Unable to acquire a search lock on indices

Hi there,

Ever since upgrading our deployment to Federate 7.17.3-27.2, we have been seeing the following response from time to time from the siren/_search endpoint:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "i",
        "reason" : "Unable to acquire a search lock on indices MY_INDEX_NAME",
        "suppressed" : [
          {
            "type" : "index_not_found_exception",
            "reason" : "no such index [MY_INDEX_NAME]",
            "index_uuid" : "XXXXXXXX",
            "index" : "MY_INDEX_NAME"
          }
        ]
      }
    ],
    "type" : "i",
    "reason" : "Unable to acquire a search lock on indices MY_INDEX_NAME",
    "suppressed" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [MY_INDEX_NAME]",
        "index_uuid" : "XXXXXXXXXXXX",
        "index" : "MY_INDEX_NAME"
      }
    ]
  },
  "status" : 500
}

It does not occur every single time. Retrying the request can sometimes make the issue go away. I’m not sure why it thinks the index isn’t found because it’s definitely there and I have no issues querying it from the normal Elasticsearch /_search endpoints.
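Because retrying sometimes succeeds, a client-side retry is one possible stopgap while the cause is unknown. The following is a minimal Python sketch, assuming the response body has already been parsed into a dict. `send`, `is_search_lock_error`, and `search_with_retry` are hypothetical names, not part of any Siren or Elasticsearch client, and the check only matches the `status` and the `reason` prefix shown in the error above.

```python
import time
from typing import Any, Callable, Dict

# Prefix of the "reason" field in the error response quoted above.
SEARCH_LOCK_REASON = "Unable to acquire a search lock on indices"


def is_search_lock_error(body: Dict[str, Any]) -> bool:
    """Return True if a parsed siren/_search response body is the search-lock error."""
    error = body.get("error")
    if not isinstance(error, dict):
        return False
    reason = error.get("reason", "")
    return body.get("status") == 500 and reason.startswith(SEARCH_LOCK_REASON)


def search_with_retry(
    send: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Dict[str, Any]:
    """Call send() until it returns something other than the search-lock error.

    send is a hypothetical callable that performs the siren/_search request
    and returns the parsed JSON body; the HTTP client is left open.
    """
    body: Dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        body = send()
        if not is_search_lock_error(body):
            return body
        if attempt < max_attempts:
            time.sleep(delay_seconds * attempt)  # simple linear backoff
    return body  # still failing after max_attempts; the caller decides what to do
```

Retrying only hides the symptom; any other error or a successful body is returned unchanged on the first attempt.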

My current workaround is to clone the index, which seems to solve the problem, but within hours or days it starts complaining about the same issue with a different index.

Is this a known issue with this version? Any ideas how to fix this?

Thank you!

Hi Hayden,

Can you please confirm how you performed the Siren Federate upgrade? Was it a rolling upgrade? Which Federate version were you running before?

Also, please check the disk space: when disk usage on a server reaches 90%-95%, Elasticsearch locks indices because of the shortage of disk space.
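A quick way to see where each node stands is to compare its disk usage with Elasticsearch's disk-based allocation watermarks. The sketch below assumes the 7.x defaults (low 85%, high 90%, flood stage 95%); clusters can override these settings or express them as absolute byte values, which this sketch ignores. The used and total byte counts could be taken from the `disk.used` and `disk.total` columns of `GET _cat/allocation`; `watermark_state` is a hypothetical helper name.

```python
from typing import Dict

# Elasticsearch 7.x default disk watermarks, as percent of disk used (assumed unchanged).
DEFAULT_WATERMARKS: Dict[str, float] = {
    "low": 85.0,          # above this, ES stops allocating most new shards to the node
    "high": 90.0,         # above this, ES tries to relocate shards away from the node
    "flood_stage": 95.0,  # above this, indices with a shard on the node get a read-only-allow-delete block
}


def watermark_state(used_bytes: int, total_bytes: int) -> str:
    """Classify a node's disk usage against the default watermarks."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct > DEFAULT_WATERMARKS["flood_stage"]:
        return "flood_stage"
    if used_pct > DEFAULT_WATERMARKS["high"]:
        return "high"
    if used_pct > DEFAULT_WATERMARKS["low"]:
        return "low"
    return "ok"
```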

Regards
Manu Agarwal

Hi Hayden,

To investigate this error, we would need some more details:

  • from which Federate version did you upgrade to Federate 7.17.3-27.2?
  • what is the type of this index? Is it an alias? A time series?
  • can you describe your deployment? How many nodes, and how many shards/replicas are involved?
  • did you notice shard relocations between errors?

Regards
Martin Anseaume

Sure, happy to provide some more details:

  • We upgraded from 7.16.3-26.5
  • These indexes are all part of an alias managed by ILM
  • There are 6+ nodes and each index has 1 primary and 1 replica shard
  • All nodes have 1+ TB of free disk space
  • I have not noticed any shard relocations beyond the usual ILM ones, but the specific indexes in question had been rolled over already

I’ll stress that normal Elasticsearch search requests have no issues on the same indexes. It’s only when I switch to the /siren/INDEX/_search endpoint that I see these errors, so it feels plugin-specific rather than a misconfiguration of the underlying index. I cannot think of any configuration change we have made beyond updating Elasticsearch and the Federate plugin.

Happy to provide any other info you think would be helpful.

Thanks,
Hayden

Hi Hayden,

We are investigating this internally and will get back to you soon.

Regards
Manu Agarwal

Thank you. Happy to provide any other details that might help your investigation.

I wanted to add that my team just ran into this exact issue. We were upgrading from ES 6.8.2 to 7.17.3-27.2 on a 21-node cluster.

The issue only occurred when running siren queries on indices that had also been written to after the upgrade.

We tried restarting the Elasticsearch service on each node, and that seemed to resolve the problem, though we are eager to hear if there is any more information about this issue.

@microsen Just tried restarting the problem cluster and you’re right, restarting seems to have resolved the problem. I’ll report back here if the issue crops back up, but restarting seems to be a valid workaround in the meantime.

I spoke too soon… It seemed to help initially but the errors eventually showed back up. We’re seeing a pattern where it happens after ILM moves and merges an index from a hot to a warm node.

@hayden - that is interesting. We have not seen a recurrence of the issue yet, but we also are not currently using ILM.

I’ve been running more tests on my end and I believe it has to do with shard location. One of our clusters uses ILM but does not relocate shards to warm nodes and is not giving us these errors.

So, I ran a test where I confirmed that siren searches were responding without errors. I then manually rolled over the index. As soon as the shards moved to their warm nodes siren started throwing the index not found errors. I manually moved the shards back to their original nodes and the siren requests started succeeding again.
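The manual test above can be written down as a sequence of REST calls, which may help when replicating it. This sketch only builds (method, path, body) tuples for the Elasticsearch rollover API, the cluster reroute API's `move` command, and the /siren/INDEX/_search endpoint; it does not send anything. The alias, index, and node names are hypothetical, the `match_all` query is an assumption, and the wait for ILM to relocate shards to warm nodes is not modeled because it depends on the ILM policy. With 1 primary and 1 replica, each copy on a warm node would need its own `move` command.

```python
from typing import Any, Dict, List, Optional, Tuple

Request = Tuple[str, str, Optional[Dict[str, Any]]]  # (HTTP method, path, JSON body)


def rollover_request(alias: str) -> Request:
    # Elasticsearch rollover API: creates a new write index behind the alias.
    return ("POST", f"/{alias}/_rollover", None)


def move_shard_request(index: str, shard: int, from_node: str, to_node: str) -> Request:
    # Elasticsearch cluster reroute API with an explicit "move" command.
    body = {
        "commands": [
            {"move": {"index": index, "shard": shard, "from_node": from_node, "to_node": to_node}}
        ]
    }
    return ("POST", "/_cluster/reroute", body)


def siren_search_request(index: str) -> Request:
    # Federate search endpoint mentioned in this thread; the query body is an assumption.
    return ("POST", f"/siren/{index}/_search", {"query": {"match_all": {}}})


def reproduction_plan(alias: str, old_index: str, hot_node: str, warm_node: str) -> List[Request]:
    """Ordered requests mirroring the manual test (ILM relocation happens between steps 2 and 3)."""
    return [
        siren_search_request(old_index),                        # 1. baseline: reported to succeed
        rollover_request(alias),                                # 2. manual rollover
        siren_search_request(old_index),                        # 3. after the move to warm: reported to fail
        move_shard_request(old_index, 0, warm_node, hot_node),  # 4. move shard 0 back to its original node
        siren_search_request(old_index),                        # 5. reported to succeed again
    ]
```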

I guess the workaround would be to ensure shards don’t move, but that only works as a temporary solution because a single node will then start to be overloaded.

@Manu_Agarwal Could you please try replicating the above scenario? We are deployed in Elastic Cloud and are unable to roll back to prior versions of ES + plugin so we’re a bit stuck here.

Hi Hayden,

Thank you for these useful details, which allowed us to identify a malfunction related to index rerouting. We are still investigating it and will provide a patch release as soon as possible.

Regards,
Martin Anseaume

Hi microsen,

Thank you for your feedback. We are tracking an internal issue related to rolling upgrades, and work on it is in progress.

Regards,
Martin Anseaume

Hi Hayden,

We were able to replicate the scenario you described; the bug has been fully identified and is now fixed.
Thanks again for taking the time to report this issue.
The fix will be available in the next patch release coming shortly.

Regards
Martin Anseaume

Thanks, Martin, that’s great news. I’ll keep an eye out for the update and will report back here if our issue is resolved.

Hi Hayden

We have released Federate 7.17.4-27.3, which includes the bug fix:

Kind regards

I’m trying to download the updated plugin now but the link is just timing out for the zip file on the download page. Any ideas? Thanks!

Hi Hayden,

Are you sure you are using the current download link? It is working fine.

Regards
Manu Agarwal

Looks like the link is working now. We deployed this and ran a few tests with relocating shards and all seems to be resolved! Thank you for your open communication about this issue and quick resolution.
