Motivating Example
[This is preliminary documentation and is subject to change.]
This example shows the typical lifecycle of an Entity that is added to the Semantic Pipeline and then enriched by the Zet Universe built-in and third-party plugins.
Although the following example highlights entity extraction from text documents, the system as it is envisaged is capable of extracting arbitrary features from any binary stream.
Below is a somewhat artificial example of a set of information processors that may run as a new document is added to the system:
The File System Watcher in the Local Folders ZApp plugin writes an entry containing a file reference to the "/Incoming/Files/Generic/" topic.
The File Kind Extraction processor is notified about the new file, maps the document to a known Kind by its file extension, and publishes the entity to the "/Classifications/Kinds/" topic. In parallel, the same notification starts the full-text path described in the next two steps.
The Local Folders ZApp plugin checks the contents of the newly added file and publishes the corresponding entity to the "/Incoming/Files/Documents/" topic for full-text extraction.
The Full-Text Extraction processor is notified about the newly added entity with an associated Document file, extracts its full text, and publishes the entity to the "/Properties/FullText/" topic.
The Keyphrase Extraction processor is notified about the newly extracted full text posted to the "/Properties/FullText/" topic in the Semantic Pipeline, extracts the high-frequency keyphrases from the document, and publishes the Entity to the "/Properties/Keyphrases/" topic so that other processors can analyze the updated keyphrases.
The Entity Extraction processor is also notified about the newly extracted full text posted to the "/Properties/FullText/" topic, extracts the Entities mentioned in the document, and publishes the Entity to the "/Relationships/" topic so that other processors can analyze the updated relationships.
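
The steps above can be pictured as a small topic-based publish/subscribe loop. The following is a minimal Python sketch of that idea, not the Zet Universe API: the `SemanticPipeline` class, the handler functions, the `KNOWN_KINDS` mapping, and the dictionary-shaped entity with `path` and `content` fields are all hypothetical names introduced for illustration. The keyphrase and entity heuristics are deliberately naive stand-ins for the real processors.

```python
from collections import Counter, defaultdict, deque

KNOWN_KINDS = {"txt": "Document", "docx": "Document"}  # hypothetical mapping


class SemanticPipeline:
    """Toy bus: handlers subscribe to a topic and may republish an entity."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # (topic, processor name) pairs in delivery order

    def subscribe(self, topic, name, handler):
        self._subscribers[topic].append((name, handler))

    def publish(self, topic, entity):
        queue = deque([(topic, entity)])
        while queue:
            current_topic, current_entity = queue.popleft()
            for name, handler in self._subscribers.get(current_topic, []):
                self.log.append((current_topic, name))
                result = handler(current_entity)  # (out_topic, entity) or None
                if result is not None:
                    queue.append(result)


def file_kind_extractor(entity):
    # Map the file extension to a known Kind.
    extension = entity["path"].rsplit(".", 1)[-1].lower()
    entity["kind"] = KNOWN_KINDS.get(extension, "Unknown")
    return "/Classifications/Kinds/", entity


def local_folders_content_check(entity):
    # Assumption: only files with readable content are forwarded.
    if entity.get("content"):
        return "/Incoming/Files/Documents/", entity
    return None


def full_text_extractor(entity):
    entity["full_text"] = entity["content"]
    return "/Properties/FullText/", entity


def keyphrase_extractor(entity):
    # Naive stand-in: the two most frequent words longer than three letters.
    words = [w.strip(".,").lower() for w in entity["full_text"].split()]
    counts = Counter(w for w in words if len(w) > 3)
    entity["keyphrases"] = [w for w, _ in counts.most_common(2)]
    return "/Properties/Keyphrases/", entity


def entity_extractor(entity):
    # Naive stand-in: treat capitalized words as mentioned Entities.
    words = (w.strip(".,") for w in entity["full_text"].split())
    entity["mentions"] = sorted({w for w in words if w[:1].isupper()})
    return "/Relationships/", entity


pipeline = SemanticPipeline()
pipeline.subscribe("/Incoming/Files/Generic/", "File Kind Extractor", file_kind_extractor)
pipeline.subscribe("/Incoming/Files/Generic/", "Local Folders ZApp", local_folders_content_check)
pipeline.subscribe("/Incoming/Files/Documents/", "Full-Text Extractor", full_text_extractor)
pipeline.subscribe("/Properties/FullText/", "Keyphrase Extractor", keyphrase_extractor)
pipeline.subscribe("/Properties/FullText/", "Entity Extractor", entity_extractor)

doc = {
    "path": "notes.txt",
    "content": "Alice reviewed the pipeline design, and the pipeline design impressed Bob.",
}
pipeline.publish("/Incoming/Files/Generic/", doc)

for topic, processor in pipeline.log:
    print(f"{topic} -> {processor}")
```

In this sketch a single publication to "/Incoming/Files/Generic/" fans out to every subscribed processor, and each processor only needs to know its own incoming and outgoing topics. That loose coupling is what would let a third-party plugin attach to, say, "/Properties/Keyphrases/" without changes to the processors upstream.
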
Incoming Topic | Description | Plugin | Outgoing Topic |
---|---|---|---|
(none) | New file found, but its kind is unknown | Local Folders ZApp | /Incoming/Files/Generic/ |
/Incoming/Files/Generic/ | File extension should be mapped to a known Kind | File Kind Extractor | /Classifications/Kinds/ |
/Incoming/Files/Generic/ | Entity links to a file, its contents should be checked for full-text | Local Folders ZApp | /Incoming/Files/Documents/ |
/Incoming/Files/Documents/ | Entity links to a document, text should be extracted | Full-Text Extractor | /Properties/FullText/ |
/Properties/FullText/ | Keyphrases should be extracted from the full text | Keyphrase Extractor | /Properties/Keyphrases/ |
/Properties/FullText/ | Entities should be extracted from the full text | Entity Extractor | /Relationships/ |
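
The routing table can also be treated as plain data and inspected. The sketch below, again in Python and under stated assumptions, encodes each row as a hypothetical `(incoming topic, plugin, outgoing topic)` tuple, with `None` standing for the first row that has no incoming topic. The helper names `subscribers_by_topic` and `terminal_topics` are illustrative, and the topic strings follow the spelling used in the prose steps.

```python
from collections import defaultdict

# (incoming topic, plugin, outgoing topic); None marks a source with no incoming topic.
ROUTES = [
    (None, "Local Folders ZApp", "/Incoming/Files/Generic/"),
    ("/Incoming/Files/Generic/", "File Kind Extractor", "/Classifications/Kinds/"),
    ("/Incoming/Files/Generic/", "Local Folders ZApp", "/Incoming/Files/Documents/"),
    ("/Incoming/Files/Documents/", "Full-Text Extractor", "/Properties/FullText/"),
    ("/Properties/FullText/", "Keyphrase Extractor", "/Properties/Keyphrases/"),
    ("/Properties/FullText/", "Entity Extractor", "/Relationships/"),
]


def subscribers_by_topic(routes):
    """Group plugin names by the topic they listen to, in table order."""
    table = defaultdict(list)
    for incoming, plugin, _outgoing in routes:
        if incoming is not None:
            table[incoming].append(plugin)
    return dict(table)


def terminal_topics(routes):
    """Topics that some plugin publishes to but none in the table consumes."""
    consumed = {incoming for incoming, _plugin, _outgoing in routes}
    produced = {outgoing for _incoming, _plugin, outgoing in routes}
    return sorted(produced - consumed)


print(subscribers_by_topic(ROUTES))
print(terminal_topics(ROUTES))
```

The terminal topics are the ones the example leaves open for further processors, which matches the remark in the steps that keyphrases and relationships are published so that other processors can analyze them.
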