Left join issue

Hi
I am working on a challenging task where I need to construct a query using Siren. I’m currently writing an SQL query to better outline and address this challenge

select p.* from person p
left join contact c on p.id = c.id
where c.labels != ‘A’ or c.label is null

Something like this should do

GET siren/person/_search
{
  "query" : {
    "join" : {
      "indices" : ["contact"],
      "on" : ["id", "id"], 
      "request" : { 
        "query" : {
          "bool": {
             "should": [
                {"bool":{"must_not": [{"term": {"labels": "A"}}]}},
                {"bool":{"must_not": [{"exists": {"field": "labels"}}]}}
             ]
          }
        }
      }
    }
  }
}

assuming you have person and contact ES indices. More on this topic here: Query domain-specific language (DSL) :: SIREN DOCS

You can probably optimize the query by only checking that labels is not A.

2 Likes

I appreciate your assistance. However, the query you provided performs an inner join, whereas I require a left join. Let me clarify my requirements with an example:

Scenario:

I have 5 person in the person index.
Only 3 contacts exist in the contact index.
One of these contacts has the label ‘A’.
Objective:

I want to display all person who do not have the label ‘A’.
The expected output should include 4 person (since one contact has label ‘A’ and the remaining person either have no contact or have contact without label ‘A’).
Issue:

The query you provided returns only 2 person, which does not align with the expected result.
Could you please help me adjust the Siren query to perform a left join and achieve the desired output?

POST /_bulk
{ “index”: { “_index”: “person”, “_id”: “1” } }
{ “id”: 1, “name”: “Person 1” }
{ “index”: { “_index”: “person”, “_id”: “2” } }
{ “id”: 2, “name”: “Person 2” }
{ “index”: { “_index”: “person”, “_id”: “3” } }
{ “id”: 3, “name”: “Person 3” }
{ “index”: { “_index”: “person”, “_id”: “4” } }
{ “id”: 4, “name”: “Person 4” }
{ “index”: { “_index”: “person”, “_id”: “5” } }
{ “id”: 5, “name”: “Person 5” }
{ “index”: { “_index”: “contact”, “_id”: “1” } }
{ “id”: 1, “label”: “B” }
{ “index”: { “_index”: “contact”, “_id”: “2” } }
{ “id”: 2, “label”: “A” }
{ “index”: { “_index”: “contact”, “_id”: “3” } }
{ “id”: 3, “label”: “C” }

and this is your query

{
“query”: {
“join”: {
“indices”: [
“contact”
],
“on”: [
“id”,
“id”
],
“request”: {
“query”: {
“bool”: {
“should”: [
{
“bool”: {
“must_not”: [
{
“term”: {
“label.keyword”: “A”
}
}
]
}
},
{
“bool”: {
“must_not”: [
{
“exists”: {
“field”: “label.keyword”
}
}
]
}
}
]
}
}
}
}
}
}

Thank you!

Thanks for your explanation and example.

In my understanding, you want to retrieve all the documents from person with the exception of those that join with documents from contact whose label is A.

You can nest a join query inside a must_not clause to exclude its results from the final ones.

Can you check if the last query of the following code block does what you need?

PUT /person
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      }
    }
  }
}

PUT /contact
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "label": {
        "type": "keyword"
      }
    }
  }
}

POST /_bulk
{"index":{"_index":"person","_id":"1"}}
{"id":1,"name":"Person 1"}
{"index":{"_index":"person","_id":"2"}}
{"id":2,"name":"Person 2"}
{"index":{"_index":"person","_id":"3"}}
{"id":3,"name":"Person 3"}
{"index":{"_index":"person","_id":"4"}}
{"id":4,"name":"Person 4"}
{"index":{"_index":"person","_id":"5"}}
{"id":5,"name":"Person 5"}
{"index":{"_index":"contact","_id":"1"}}
{"id":1,"label":"B"}
{"index":{"_index":"contact","_id":"2"}}
{"id":2,"label":"A"}
{"index":{"_index":"contact","_id":"3"}}
{"id":3,"label":"C"}

// previous query
GET siren/person/_search
{
  "query" : {
    "join" : {
      "indices" : ["contact"],
      "on" : ["id", "id"], 
      "request" : { 
        "query" : {
          "bool": {
             "should": [
                {"bool":{"must_not": [{"term": {"label": "A"}}]}},
                {"bool":{"must_not": [{"exists": {"field": "label"}}]}}
             ]
          }
        }
      }
    }
  }
}

// edited query
GET siren/person/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "join": {
            "indices": [
              "contact"
            ],
            "on": [
              "id",
              "id"
            ],
            "request": {
              "query": {
                "term": {
                  "label": "A"
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 Like

Thank you for your guidance and assistance; this is exactly what I wanted

1 Like