Jupyter + Siren

(Edit: this post might make more sense in the Siren Federate forum than here in the Investigate/UI section, if a moderator wants to move it.)

I have been exploring programmatic interaction with Siren endpoints from Python/Jupyter and have seen some unexpected behavior when querying and projecting fields through the /siren/<index>/_search endpoints. These may be bugs, or I may be misunderstanding the documentation. In case you want to reproduce the queries, I put together a small repo (kafonek/siren on GitHub: "Demo and bug reporting for Jupyter Notebook interaction to Siren-Elasticsearch") that starts Siren's demo container and a single-user Jupyter environment via docker-compose.

I've shown the raw RESTful queries for the bugs (?) in the /notebooks/bugs directory. The first issue is that projected fields are not coming back in my result set. The second issue is DSL parse errors when I try to include query terms for one index while also joining to another index. ("Filtering by Both Indices" explains how to do the boolean must query.)

Any advice on those bug notebooks is appreciated, thanks!

Hello Matt,

You must use script_fields in order to see the projected fields in the response. See the example in "Query domain-specific language (DSL)" in the Siren docs, which shows the following:

{
    ...,
    "script_fields" : {
      "employees_age" : {
        "script" : "doc.employee_age"                                    
      }
    }
}

In general, you just need to refer in the script to the field you want to see. For your demo, it would be:

{
    ...,
    "script_fields" : {
      "article" : {
        "script" : "doc.article_source"                                    
      }
    }
}
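
Since the original question was about driving this from Python/Jupyter, here is a minimal sketch of building that script_fields block programmatically. The helper name script_fields_for is hypothetical, and it assumes each projected alias is exposed to the script as doc.<alias>, as in the snippets above.

```python
def script_fields_for(aliases):
    """Build a script_fields block with one entry per projected alias.

    Assumption: each alias is readable in the script as doc.<alias>,
    mirroring the JSON snippets in this thread.
    """
    return {alias: {"script": f"doc.{alias}"} for alias in aliases}


# Fragment to merge into the rest of the search body.
body_fragment = {"script_fields": script_fields_for(["article_source"])}
print(body_fragment)
```

The resulting dict can be merged into the search body before it is serialized to JSON.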

Cheers,

Hi Matt,

As Stephane mentioned, here is the query for your use case. You should expect to see {'fields': {'article_source': [<value>]}} in the results:

GET /siren/company/_search
{
  "query": {
    "join": {
      "indices": ["article"],
      "on": ["id", "companies"],
      "request": {
        "project": [
          { "field": { "name": "article.source", "alias": "article_source" } }
        ],
        "query": {
          "bool": {
            "filter": [
              {
                "term": {
                  "article.source": "New York Times"
                }
              }
            ]
          }
        }
      }
    }
  },
  "script_fields": {
    "article_source": {
      "script": "doc.article_source"
    }
  }
}
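
For running this from a notebook, the following is a rough sketch of the same request in Python using only the standard library. The function names and the base_url argument are hypothetical, and it assumes the /siren/<index>/_search endpoint accepts a POST with a JSON body the way Elasticsearch's own _search does; adjust to whatever your docker-compose setup exposes.

```python
import json
import urllib.request


def build_join_search(alias="article_source"):
    """Return the join query above as a Python dict."""
    return {
        "query": {
            "join": {
                "indices": ["article"],
                "on": ["id", "companies"],
                "request": {
                    "project": [
                        {"field": {"name": "article.source", "alias": alias}}
                    ],
                    "query": {
                        "bool": {
                            "filter": [
                                {"term": {"article.source": "New York Times"}}
                            ]
                        }
                    },
                },
            }
        },
        # Without this block the projected alias is not returned in the hits.
        "script_fields": {alias: {"script": f"doc.{alias}"}},
    }


def siren_search(base_url, index, body):
    """Send the body to <base_url>/siren/<index>/_search.

    Assumption: base_url is wherever your Siren/Elasticsearch node
    listens, and the endpoint accepts POST with a JSON body.
    """
    request = urllib.request.Request(
        f"{base_url}/siren/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def projected_values(response, alias):
    """Collect the 'fields' values for one alias from each hit."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("fields", {}).get(alias, []) for hit in hits]
```

Usage would look like projected_values(siren_search(base_url, "company", build_join_search()), "article_source"), where base_url is your own node address.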

Regards
Manu