Join, add fields to results and scroll

Hello everyone,

I’d like to get results from a join between two indices: a main index (index1) with all its fields, augmented with one field from a second index (index2). The relationship is N to 1, so each index1 document should get exactly one value from index2. I’d also like to scroll through the results. So I did something like this:

POST /siren/my_index1/_search?scroll=1m
{
    "size": 10000,
    "query": {
        "join": {
            "indices": [
                "my_index2"
            ],
            "on": [
                "joinfield_I1.keyword",
                "joinfield_I2.keyword"
            ],
            "request": {
                "project": [
                    {
                        "field": {
                            "name": "another_field_from_index2.keyword",
                            "alias": "another_field_from_index2"
                        }
                    }
                ],
                "query": {
                    "match_all": {}
                }
            }
        }
    },
    "script_fields": {
        "another_field_from_index2": {
            "script": "doc.another_field_from_index2"
        }
    },
    "_source": true
}

This returns what I wanted: the first 10000 documents with all their fields, plus the field another_field_from_index2.

Then I call the scroll API with the scroll_id I received in the previous response, and I get an exception mentioning the scripted field:

 'doc.another_field_from_index2',
    '   ^---- HERE' ],
...
 Data entry [...] does not exist.

Am I to understand that it’s not possible to get this added field in the following scroll results? Is there a better way?

Thanks for reading, any insights welcomed.

Hi Aadrien,

Can you get all (or most) of the documents without a scroll query, for example with size: 100000 or some other value?

Regards
Manu

Hello Aadrien,

Scroll is not supported with the project clause; we will add an error/warning message to reflect that.
In the meantime, you could use the search_after parameter. However, bear in mind that the join won’t be cached in that case, so it will be recomputed on every request.
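
As a rough illustration of the search_after approach, here is a minimal Python sketch of the paging loop. It is not an official client: `paginate_with_search_after` and the `search` callable are hypothetical names, and `search` is assumed to send the request body to `/siren/my_index1/_search` (without the `scroll` parameter) and return the parsed JSON response. The sort is assumed to be on a field that is unique per document, since search_after needs a stable, unambiguous order.

```python
import copy
from typing import Callable, Iterator


def paginate_with_search_after(
    search: Callable[[dict], dict],  # hypothetical: sends the body, returns parsed JSON
    body: dict,                      # the join request body, without "size"/"sort"
    sort: list,                      # assumed to end on a field unique per document
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield hits page by page using search_after instead of scroll."""
    request = copy.deepcopy(body)    # leave the caller's body untouched
    request["size"] = page_size
    request["sort"] = sort
    while True:
        response = search(request)
        hits = response["hits"]["hits"]
        if not hits:
            return                   # an empty page means we are done
        yield from hits
        # The "sort" values of the last hit mark where the next page starts.
        request["search_after"] = hits[-1]["sort"]
```

Each iteration sends a full search request, so the join is recomputed for every page, as noted above. A larger `page_size` means fewer recomputations at the cost of bigger responses.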

Cheers,

Hello Stephane,
search_after actually works fine for my use case.
Thanks a lot!