Federate Object Runtime Field

When investigating interconnected data, it is often important to know why a particular document is a search hit. If the investigation involves join operations, such an explanation can be obtained by retaining information from the joined documents. With complex searches containing several joins, possibly nested, the explanation takes the shape of a tree. The federate_object is a new type of runtime field in Siren Federate (available, at the time of writing, in version 35 for Elasticsearch 8.13.2) that creates concise trees from joined documents.

The federate_object runtime field makes it possible to retain the structure of joins in projected data. This new feature enables the creation of nested data structures at runtime during join computations by merging the projection results of joins. For example, it can help us determine whether values of a specific field, projected from a child index, co-occur within the same document or are merely related to the same parent document.

Projecting Fields Separately

Consider a typical permission-access use case built on four indices: user, group, role, and policy. The user index stores the users of a company. A user can have one or more access groups, stored in the group index. The group index defines the roles associated with each group; for example, the admin group has two roles: read_internal and read_client_data. A role can have one or more policies, which define which indices can be read.
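The sample documents themselves are not reproduced in this section. As a hedged sketch, the Python below rebuilds a minimal dataset that is consistent with the join keys of the request that follows (user.group to group.name, group.role to role.name, role.policy to policy.name) and with the values visible in the search responses, then serializes it as an NDJSON body for the Elasticsearch _bulk API. The exact documents and field shapes (for example, role and policy stored as arrays of strings) are assumptions, not the original data.

```python
import json

# Hypothetical sample documents, reconstructed from the join keys and the
# values visible in the search responses; the original dataset may differ.
DOCS = {
    "user": [{"name": "alice", "group": "admin"}],
    "group": [{"name": "admin", "role": ["read_internal", "read_client_data"]}],
    "role": [
        {"name": "read_internal", "policy": ["read_employees"]},
        {"name": "read_client_data", "policy": ["read_sales", "read_clients"]},
    ],
    "policy": [
        {"name": "read_employees", "description": "access to employees data"},
        {"name": "read_sales", "description": "access to sales data"},
        {"name": "read_clients", "description": "access to client data"},
    ],
}


def bulk_body(docs):
    """Serialize documents as an Elasticsearch _bulk NDJSON body."""
    lines = []
    for index, sources in docs.items():
        for source in sources:
            # One action line, followed by the document source on the next line.
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body(DOCS), end="")
```

The body could be sent with POST _bulk and the Content-Type: application/x-ndjson header. With Elasticsearch's default dynamic mapping, string fields get the .keyword subfields that the joins rely on.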

We can ingest these documents into an Elasticsearch cluster (version 8.13.2) and issue the following search request, which returns the policies granted to the user alice, along with their descriptions. The search request projects the policy's name and description (lines 40-53), and these fields are then propagated up to the search response through the intermediate project clauses (lines 9 and 25). In those intermediate clauses, the name of the group and the name of the role are also projected, respectively. The projected data is returned in the response thanks to the fields clause (line 3).

Search Request

 1 GET siren/user/_search
 2 {
 3    "fields": [ "group_name", "role_name", "policy_name", "policy_description" ],
 4    "query": {
 5       "join": {
 6          "indices": [ "group" ],
 7          "on": [ "group.keyword", "name.keyword" ],
 8          "request": {
 9             "project": [
10                {
11                   "field": {
12                      "name": "name.keyword",
13                      "alias": "group_name"
14                   }
15                },
16                { "field": { "name": "role_name" } },
17                { "field": { "name": "policy_name" } },
18                { "field": { "name": "policy_description" } }
19             ],
20             "query": {
21                "join": {
22                   "indices": [ "role" ],
23                   "on": [ "role.keyword", "name.keyword" ],
24                   "request": {
25                      "project": [
26                         {
27                            "field": {
28                               "name": "name.keyword",
29                               "alias": "role_name"
30                            }
31                         },
32                         { "field": { "name": "policy_name" } },
33                         { "field": { "name": "policy_description" } }
34                      ],
35                      "query": {
36                         "join": {
37                            "indices": [ "policy" ],
38                            "on": [ "policy.keyword", "name.keyword" ],
39                            "request": {
40                               "project": [
41                                  {
42                                     "field": {
43                                        "name": "name.keyword",
44                                        "alias": "policy_name"
45                                     }
46                                  },
47                                  {
48                                     "field": {
49                                        "name": "description.keyword",
50                                        "alias": "policy_description"
51                                     }
52                                  }
53                               ],
54                               "query": { "match_all": {} }
55                            }
56                         }
57                      }
58                   }
59                }
60             }
61          }
62       }
63    }
64 }

The search response presents several issues that we will resolve using the federate_object runtime field:

  1. It is impossible to know which description a particular policy relates to.
  2. It is impossible to know which role granted a particular policy.
  3. The projected data is duplicated (indicated in the response below with a comment // duplicated X times).
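Issue 1 cannot be worked around on the client side. In the minimal Python sketch below (hypothetical data and a simplified model of per-field projection, not Federate's code), two datasets that pair policy names and descriptions differently flatten into exactly the same per-field value lists, so nothing in the flat response tells them apart.

```python
def flatten(docs, fields):
    """Simplified per-field projection: gather each field's values from all
    documents, sorted as in the response, discarding which document each
    value came from."""
    return {f: sorted(doc[f] for doc in docs) for f in fields}


# Two hypothetical datasets that pair names and descriptions differently.
correct = [
    {"policy_name": "read_sales", "policy_description": "access to sales data"},
    {"policy_name": "read_clients", "policy_description": "access to client data"},
]
swapped = [
    {"policy_name": "read_sales", "policy_description": "access to client data"},
    {"policy_name": "read_clients", "policy_description": "access to sales data"},
]

fields = ["policy_name", "policy_description"]
# Both datasets produce identical flattened projections.
print(flatten(correct, fields) == flatten(swapped, fields))  # True
```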

Search Response

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "YqDx6o0BqmW7-koOO6m-",
                "_index": "user",
                "_score": 1.0,
                "_source": {
                    "group": "admin",
                    "name": "alice"
                },
                "fields": {
                    "group_name": [
                        "admin"                       // duplicated 125 times
                    ],
                    "policy_description": [
                        "access to client data",      // duplicated 50 times
                        "access to employees data",   // duplicated 25 times
                        "access to sales data"        // duplicated 50 times
                    ],
                    "policy_name": [
                        "read_clients",               // duplicated 50 times
                        "read_employees",             // duplicated 25 times
                        "read_sales"                  // duplicated 50 times
                    ],
                    "role_name": [
                        "read_client_data",           // duplicated 100 times
                        "read_internal"               // duplicated 25 times
                    ]
                }
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "planner": {
        "is_truncated": false,
        "node": "h-tp6O8-Rtmp6YHQlzEUsg",
        "timestamp": {
            "execution": {
                "start_in_millis": 1709047495219,
                "stop_in_millis": 1709047495263,
                "took_in_millis": 44
            },
            "job_waiting": {
                "start_in_millis": 1709047495201,
                "stop_in_millis": 1709047495201,
                "took_in_millis": 0
            },
            "planning": {
                "start_in_millis": 1709047495202,
                "stop_in_millis": 1709047495216,
                "took_in_millis": 14
            },
            "start_in_millis": 1709047495201,
            "stop_in_millis": 1709047495263,
            "took_in_millis": 62
        },
        "took_in_millis": 62,
        "type": "static"
    },
    "timed_out": false,
    "took": 4
}

Projecting Objects: Federate Object Runtime Field in Action

The order of values in one projected field, e.g., policy_name, may not coincide with the order of values in another projected field, e.g., role_name, so values cannot be paired by position. Values are duplicated because of the Cartesian Product computed when projecting several fields, which is needed because the internal representation of the data has no array type. By using the federate_object runtime field, it is possible to get a more compact response that also retains the structure formed by the joins. The federate_object runtime field is expressed as follows:
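The duplication counts annotated in the response above can be reproduced with a simple model. The Python below is a toy sketch under an assumption, not Federate's actual implementation: at each join level, the values of every projected field are gathered into one flat list (duplicates kept), and the parent document's rows are the Cartesian Product of those lists. It runs on a hypothetical dataset inferred from the responses.

```python
from collections import Counter
from itertools import product

# Hypothetical documents inferred from the search responses (assumption).
GROUP_ROLES = {"admin": ["read_internal", "read_client_data"]}
ROLE_POLICIES = {
    "read_internal": ["read_employees"],
    "read_client_data": ["read_sales", "read_clients"],
}
POLICY_DESCRIPTIONS = {
    "read_employees": "access to employees data",
    "read_sales": "access to sales data",
    "read_clients": "access to client data",
}


def project(own_value, child_rows):
    """Toy model of flat projection at one join level (an assumption)."""
    # Flatten child rows into one value list per field: duplicates are kept,
    # but the association between values of the same row is lost.
    columns = [list(column) for column in zip(*child_rows)]
    # Rebuild the parent's rows as the Cartesian Product of its own value
    # and every flattened column.
    return list(product([own_value], *columns))


role_rows = []
for role in GROUP_ROLES["admin"]:
    policy_rows = [(p, POLICY_DESCRIPTIONS[p]) for p in ROLE_POLICIES[role]]
    role_rows += project(role, policy_rows)

group_rows = project("admin", role_rows)
print(len(group_rows))  # 125

names = ["group_name", "role_name", "policy_name", "policy_description"]
for i, name in enumerate(names):
    print(name, dict(Counter(row[i] for row in group_rows)))
```

The printed counts (admin 125; read_client_data 100 and read_internal 25; read_sales and read_clients 50 each, read_employees 25) match the annotations in the response. If this model holds, the number of rows grows multiplicatively with every join level and every multi-valued field.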

  • The runtime field is given the name my_object (line 2).
  • The type of the runtime field is federate_object (line 3).
  • The fields that make up the object are listed in the fields option (lines 4-8). This list may include another federate_object runtime field, which is how the object reflects the structure of nested joins.

 1 {
 2    "my_object": {
 3       "type": "federate_object",
 4       "fields": [
 5          "field1",
 6          "field2",
 7          "another_object"
 8       ]
 9    }
10 }
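Since each federate_object can list the object projected by the next join among its fields, a chain of joins maps mechanically onto nested runtime_mappings. The Python below is a sketch built around a hypothetical helper, build_object_join (not part of Siren Federate); its level tuples (index, join keys, object name, own fields) are illustrative. Under these assumptions it generates the same skeleton as the structured search request in the next section.

```python
import json


def build_object_join(levels, leaf_query=None):
    """Hypothetical helper: build nested join clauses, each one defining and
    projecting a federate_object runtime field.

    levels: list of (index, on_pair, object_name, own_fields), outermost first.
    Each object's fields are its own fields plus the next level's object name.
    """
    query = leaf_query or {"match_all": {}}
    child_object = None
    # Build from the innermost join outwards, wrapping the previous query.
    for index, on_pair, object_name, own_fields in reversed(levels):
        fields = list(own_fields) + ([child_object] if child_object else [])
        query = {
            "join": {
                "indices": [index],
                "on": list(on_pair),
                "request": {
                    "runtime_mappings": {
                        object_name: {"type": "federate_object", "fields": fields}
                    },
                    "project": [{"field": {"name": object_name}}],
                    "query": query,
                },
            }
        }
        child_object = object_name
    # The outermost object is what the top-level "fields" clause returns.
    return {"fields": [child_object], "query": query}


levels = [
    ("group", ("group.keyword", "name.keyword"), "group_obj", ["name.keyword"]),
    ("role", ("role.keyword", "name.keyword"), "role_obj", ["name.keyword"]),
    ("policy", ("policy.keyword", "name.keyword"), "policy_obj",
     ["name.keyword", "description.keyword"]),
]
print(json.dumps(build_object_join(levels), indent=2))
```

Building from the innermost level outwards means each object can simply name the object produced one level deeper, which keeps the helper free of recursion.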

Returning a structured response

The search request is updated so that each join now projects a federate_object runtime field (lines 16, 30, and 44). The object runtime fields are defined in lines 10-13, 24-27, and 38-41. The group_obj object runtime field is returned as a structured JSON object (lines 19-45 of the search response). It is now clear which role granted a particular policy, and which policy a particular description relates to. The duplication problem is also resolved, thanks to the internal representation of the federate_object runtime field, which effectively bypasses the Cartesian Product.

Search Request

 1 GET siren/user/_search
 2 {
 3    "fields": [ "group_obj" ],
 4    "query": {
 5       "join": {
 6          "indices": [ "group" ],
 7          "on": [ "group.keyword", "name.keyword" ],
 8          "request": {
 9             "runtime_mappings": {
10                "group_obj": {
11                   "type": "federate_object",
12                   "fields": [ "name.keyword", "role_obj" ]
13                }
14             },
15             "project": [
16                { "field": { "name": "group_obj" } }
17             ],
18             "query": {
19                "join": {
20                   "indices": [ "role" ],
21                   "on": [ "role.keyword", "name.keyword" ],
22                   "request": {
23                      "runtime_mappings": {
24                         "role_obj": {
25                            "type": "federate_object",
26                            "fields": [ "name.keyword", "policy_obj" ]
27                         }
28                      },
29                      "project": [
30                         { "field": { "name": "role_obj" } }
31                      ],
32                      "query": {
33                         "join": {
34                            "indices": [ "policy" ],
35                            "on": [ "policy.keyword", "name.keyword" ],
36                            "request": {
37                               "runtime_mappings": {
38                                  "policy_obj": {
39                                     "type": "federate_object",
40                                     "fields": [ "name.keyword", "description.keyword" ]
41                                  }
42                               },
43                               "project": [
44                                  { "field": { "name": "policy_obj" } }
45                               ],
46                               "query": { "match_all": {} }
47                            }
48                         }
49                      }
50                   }
51                }
52             }
53          }
54       }
55    }
56 }

Search Response

 1 {
 2     "_shards": {
 3         "failed": 0,
 4         "skipped": 0,
 5         "successful": 1,
 6         "total": 1
 7     },
 8     "hits": {
 9         "hits": [
10             {
11                 "_id": "e5Je640BETtuYZUFbRgX",
12                 "_index": "user",
13                 "_score": 1.0,
14                 "_source": {
15                     "group": "admin",
16                     "name": "alice"
17                 },
18                 "fields": {
19                     "group_obj": [
20                         {
21                             "name.keyword": "admin",
22                             "role_obj": [
23                                 {
24                                     "name.keyword": "read_internal",
25                                     "policy_obj": {
26                                         "description.keyword": "access to employees data",
27                                         "name.keyword": "read_employees"
28                                     }
29                                 },
30                                 {
31                                     "name.keyword": "read_client_data",
32                                     "policy_obj": [
33                                         {
34                                             "description.keyword": "access to sales data",
35                                             "name.keyword": "read_sales"
36                                         },
37                                         {
38                                             "description.keyword": "access to client data",
39                                             "name.keyword": "read_clients"
40                                         }
41                                     ]
42                                 }
43                             ]
44                         }
45                     ]
46                 }
47             }
48         ],
49         "max_score": 1.0,
50         "total": {
51             "relation": "eq",
52             "value": 1
53         }
54     },
55     "planner": {
56         "is_truncated": false,
57         "node": "YNv9LsxuRwKLH1deIU6Qyw",
58         "timestamp": {
59             "execution": {
60                 "start_in_millis": 1709050978959,
61                 "stop_in_millis": 1709050979215,
62                 "took_in_millis": 256
63             },
64             "job_waiting": {
65                 "start_in_millis": 1709050978868,
66                 "stop_in_millis": 1709050978868,
67                 "took_in_millis": 0
68             },
69             "planning": {
70                 "start_in_millis": 1709050978869,
71                 "stop_in_millis": 1709050978949,
72                 "took_in_millis": 80
73             },
74             "start_in_millis": 1709050978868,
75             "stop_in_millis": 1709050979215,
76             "took_in_millis": 347
77         },
78         "took_in_millis": 347,
79         "type": "static"
80     },
81     "timed_out": false,
82     "took": 41
83 }
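On the client side, the nested group_obj can be walked to answer the questions the flat response could not. In the response above, policy_obj is a single object under read_internal but a list under read_client_data, so a consumer should accept both shapes. The Python below is a minimal sketch over a trimmed copy of the fields section of that response; the helper names are illustrative.

```python
import json

# Trimmed copy of hits.hits[0].fields from the response above.
FIELDS = json.loads("""
{"group_obj": [{"name.keyword": "admin", "role_obj": [
  {"name.keyword": "read_internal", "policy_obj":
     {"description.keyword": "access to employees data", "name.keyword": "read_employees"}},
  {"name.keyword": "read_client_data", "policy_obj": [
     {"description.keyword": "access to sales data", "name.keyword": "read_sales"},
     {"description.keyword": "access to client data", "name.keyword": "read_clients"}]}]}]}
""")


def as_list(value):
    """Normalize a value that may be a single object or a list of objects."""
    return value if isinstance(value, list) else [value]


def grants(fields):
    """Yield (group, role, policy, description) tuples from the object tree."""
    for group in as_list(fields["group_obj"]):
        for role in as_list(group.get("role_obj", [])):
            for policy in as_list(role.get("policy_obj", [])):
                yield (group["name.keyword"], role["name.keyword"],
                       policy["name.keyword"], policy["description.keyword"])


for row in grants(FIELDS):
    print(row)
```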

Performance considerations

When projecting a single field that has only one value in every document, using the federate_object runtime field increases the network load compared to projecting that field directly. The reason is that the federate_object runtime field needs to serialize the internal binary representation of the object, which takes some extra bytes.

Although the federate_object runtime field has this drawback in that simple scenario, it really shines when projecting several multi-valued fields: because the Cartesian Product is not computed, both memory usage and the amount of data transferred across the network are reduced. For the same reason, it brings these benefits when handling one-to-many or many-to-many relationships, as the running example shows.
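A rough way to see this on the running example is to count the values carried at the group level. The sketch below is an illustrative proxy only: it counts values rather than bytes of Federate's binary encoding, it assumes the Cartesian model (flattened fields combined by Cartesian Product) and a hypothetical dataset inferred from the responses, and it does not capture the per-object serialization overhead described above for single-valued fields.

```python
# Hypothetical data inferred from the responses: role -> [(policy, description)].
ROLES = {
    "read_internal": [("read_employees", "access to employees data")],
    "read_client_data": [
        ("read_sales", "access to sales data"),
        ("read_clients", "access to client data"),
    ],
}


def flat_value_count(roles):
    """Values carried at the group level under the assumed Cartesian model
    (group_name, role_name, policy_name, policy_description)."""
    # Role level: 1 role name x n policy names x n descriptions rows per role.
    role_rows = sum(len(policies) ** 2 for policies in roles.values())
    # Group level: 1 group name x role_rows for each of the three child columns.
    group_rows = role_rows ** 3
    return group_rows * 4


def tree_value_count(roles):
    """Leaf values in the nested object: each name or description appears once."""
    return 1 + len(roles) + sum(2 * len(policies) for policies in roles.values())


print(flat_value_count(ROLES), tree_value_count(ROLES))  # 500 9
```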

Therefore, whenever a join query that projects fields is slow because of the Cartesian Product (duplicated values are a tell-tale sign), the federate_object runtime field can help improve performance.

Conclusion

The federate_object is a powerful new type of runtime field that retains the structure of joins in projected data. It also bypasses the internal Cartesian Product, avoiding duplicated data.
