Siren NLP plugin configuration

Hi,

I’m trying to add two Siren NLP ingest processors (NER & Taxonomy) to a pipeline but am getting the error shown below. Any help will be much appreciated!

{
    "error": {
        "root_cause": [
            {
                "type": "exception",
                "reason": "java.lang.IllegalArgumentException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: uk.gov.nca.annot8.components.opennlp.processors.NER$Settings[\"nerType\"])",
                "processor_type": "siren-nlp"
            }
        ],
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: uk.gov.nca.annot8.components.opennlp.processors.NER$Settings[\"nerType\"])",
        "processor_type": "siren-nlp",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: uk.gov.nca.annot8.components.opennlp.processors.NER$Settings[\"nerType\"])",
            "caused_by": {
                "type": "mismatched_input_exception",
                "reason": "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: uk.gov.nca.annot8.components.opennlp.processors.NER$Settings[\"nerType\"])"
            }
        }
    },
    "status": 500
}

Siren NLP configuration:
{
    "processors": [
        {
            "siren-nlp": {
                "fields": [
                    "title",
                    "snippet"
                ],
                "processors": [
                    {
                        "class": "Taxonomy",
                        "settings": {
                            "index": "cars",
                            "idField": "id",
                            "preferredTermField": "preferred_term",
                            "synonymFields": [
                                "synonyms"
                            ],
                            "parentsField": "parents",
                            "caseSensitive": false,
                            "exactWhitespace": false,
                            "plurals": true,
                            "type": "taxonomy-cars",
                            "additionalData": true
                        }
                    },
                    {
                        "class": "NER",
                        "settings": {
                            "nerType" : [
                                "Location", "Organization", "Person"
                            ]
                        }
                    }
                ]
            }
        }
    ]
}

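For the NER processor, nerType must be a single string: Location, Organization, or Person. The error occurs because your configuration passes an array, which cannot be deserialized into a string.
To extract multiple NER types in one pipeline, add one NER processor per type, each with a different nerType.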
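
As a sketch of that advice, the configuration from the question could be rewritten along these lines. It assumes every other setting stays as posted and that three separate NER entries are accepted in the same processors array. It has not been run against the plugin.

```json
{
    "processors": [
        {
            "siren-nlp": {
                "fields": [
                    "title",
                    "snippet"
                ],
                "processors": [
                    {
                        "class": "Taxonomy",
                        "settings": {
                            "index": "cars",
                            "idField": "id",
                            "preferredTermField": "preferred_term",
                            "synonymFields": [
                                "synonyms"
                            ],
                            "parentsField": "parents",
                            "caseSensitive": false,
                            "exactWhitespace": false,
                            "plurals": true,
                            "type": "taxonomy-cars",
                            "additionalData": true
                        }
                    },
                    {
                        "class": "NER",
                        "settings": {
                            "nerType": "Location"
                        }
                    },
                    {
                        "class": "NER",
                        "settings": {
                            "nerType": "Organization"
                        }
                    },
                    {
                        "class": "NER",
                        "settings": {
                            "nerType": "Person"
                        }
                    }
                ]
            }
        }
    ]
}
```

The only change from the original is that the single NER entry with an array-valued nerType becomes three entries, each with a string value.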


It works like a charm now, thanks!