Meaningful portraits of documents and the Knowledge Base.
Meaningful portraits of documents are formal representations of document
texts. They are produced by a semantics-oriented linguistic processor. The set
of meaningful portraits (together with index files) makes up the Knowledge
Base (KB), which supports various types of semantic search and
logical-analytical functions through comparison and transformation of
knowledge structures. We are developing technology for processing a KB that is
distributed across a network of computers.
Example text:
12:16 27.12.2002
In the Chechen Republic one of leaders of bands the
Arabian mercenary Abu-Tarik is destroyed. As have
informed the Ministry of Foreign Affairs of the Chechen Republic, Chechen
special militia destroy the insurgent in settlement Starye Atagi of Groznensky region. In one of the houses there were found
the hiding place with three sub-machine guns.
On some data,
Abu-Tarik was involved in murder of Salikhov's family in Starye Atagi in this year.
Meaningful portrait of the text:
DOC_( 22, “1-02-98.TXT”,“SUMMARY; ” /0+) 0-(ENG)
DATE_(DEC.,~27,12,HOUR,16,MINUTE/1+)
CRIM_GROUP(1,LEADER,OF,BAND,ARABIAN,MERCENARY/2+)
FIO("ABU - TARIK"," "," "," "/3+)
DESTROY(2-,3-/4+) 4-(22,ACT_)
PLACE_(CHECHEN,REPUBLIC/5+)
WHERE(4-,5-)
ORGANIZATION_(MINISTRY,OF,FOREIGN,AFFAIRS,OF,CHECHEN,REPUBLIC/6+)
INFORM(6-/7+)
7-(22,ACT_)
FORCE_(SPECIAL,MILITIA/8+)
DESTROY(CHECHEN,8-,INSURGENT/9+) 9-(22,ACT_)
PLACE_(SETTLEMENT,STARYE,ATAGI,OF,GROZNENSKY,REGION/10+)
WHERE(9-,10-)
WEAPON_("SUB ",MACHINE,GUN/11+)
FIND(1,HOUSE,HIDE,PLACE,3,11-/12+) 12-(22,ACT_)
PLACE_(STARYE,ATAGI/13+)
INVOLVE(3-,MURDER,SALIKHOV,FAMILY,13-,YEAR/14+) 14-(22,ACT_)
SENTENCE_(22,1-/15+) 15-(1,1,19)
SENTENCE_(22,4-/16+) 16-(1,20,114)
SENTENCE_(22,7-,9-/17+) 17-(2,115,288)
SENTENCE_(22,12-/18+) 18-(5,289,376)
SENTENCE_(22,ON,SOME,DATA,14-/19+) 19-(6,377,476)
A meaningful portrait consists of elementary fragments whose arguments are
words in normal form (this is necessary for search and processing). Each
elementary fragment has a unique code, written as a number followed by the
sign + and separated from the arguments by a slash. For example, in the
fragment FIO("ABU - TARIK"," "," "," "/3+) the sign 3+ is its code, and 3- is
a reference to it. The fragments DOC_( 22, “1-02-98.TXT”,“SUMMARY; ” /0+)
0-(ENG) indicate that the meaningful portrait was built from the
English-language text of document number 22 in the file “1-02-98.TXT”, which
was processed as a summary of incidents (the linguistic knowledge applied
depends on this). The following fragments present the date DATE_(…/1+), the
criminal group CRIM_GROUP(…/2+), the person’s surname, name and patronymic
FIO(…/3+), and so forth. The signs 0+, 0-, 1+, 1-, 2+, 2-, 3+, 3-, … are the
fragment codes and the references to them, by means of which connections and
relations between fragments are specified. Actions are represented by
fragments of the type DESTROY(2-,3-/4+) 4-(22,ACT_), which states that “the
criminal group (CRIM_GROUP with code 2+) and the person (FIO with code 3+) are
destroyed”. Here the fragment 4-(22,ACT_) indicates that the fragment
DESTROY(…/4+) represents an action and belongs to document number 22. The
fragments PLACE_(CHECHEN,REPUBLIC/5+) WHERE(4-,5-) indicate the place of this
action (WHERE). The fragments ORGANIZATION_(…/6+) INFORM(6-/7+) 7-(22,ACT_)
represent that “the organization … informed”.
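The fragment notation described above can be read mechanically. The following is a minimal sketch in Python, not the linguistic processor itself: the function names parse_fragment, split_args and references are hypothetical, and the grammar is an assumption inferred only from the example portrait (a defining fragment NAME(args/N+) and an attribute fragment N-(args)).

```python
import re

# Assumed grammar, inferred from the example portrait only:
#   defining fragment:  NAME(arg,arg,.../N+)
#   attribute fragment: N-(arg,arg,...)
DEFINITION = re.compile(r'^(?P<name>[A-Z_]+)\((?P<args>.*)/(?P<code>\d+)\+\)$')
ATTRIBUTE = re.compile(r'^(?P<code>\d+)-\((?P<args>.*)\)$')


def split_args(text):
    """Split an argument list on commas that are outside double quotes."""
    args, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
            current.append(ch)
        elif ch == ',' and not quoted:
            args.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append(''.join(current).strip())
    return args


def parse_fragment(text):
    """Return a dict describing one elementary fragment."""
    text = text.strip()
    m = DEFINITION.match(text)
    if m:
        return {'kind': 'definition', 'name': m.group('name'),
                'code': int(m.group('code')),
                'args': split_args(m.group('args'))}
    m = ATTRIBUTE.match(text)
    if m:
        return {'kind': 'attribute', 'code': int(m.group('code')),
                'args': split_args(m.group('args'))}
    raise ValueError('unrecognized fragment: ' + text)


def references(fragment):
    """Codes of other fragments referenced by arguments of the form N-."""
    return [int(a[:-1]) for a in fragment['args'] if re.fullmatch(r'\d+-', a)]


action = parse_fragment('DESTROY(2-,3-/4+)')
print(action['name'], action['code'], references(action))
print(parse_fragment('4-(22,ACT_)'))
```

Relation fragments without a code of their own, such as WHERE(4-,5-), are not covered by this sketch and would need a third pattern.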
A special role is played by the SENTENCE_(...) fragments, which correspond to
sentences. They are filled with the words that did not enter any information
object (for example, ON, SOME, DATA in the fragment with code 19+) and with
the codes of the objects themselves. Indicators of their position in the text
are added to these fragments. For example, the fragment SENTENCE_(22,7-,9-/17+)
17-(2,115,288) represents the fact that the objects with codes 7-
(corresponding to the action “inform”) and 9- (corresponding to the action
“destroy”) are located in the sentence that begins on the 2nd line of the
document text and occupies bytes 115 through 288. These positioning data are
needed by the reverse linguistic processor.
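As an illustration of how such position indicators might be used, the sketch below parses an indicator like 17-(2,115,288) and cuts the corresponding bytes out of a document. The names Position, parse_position and extract are hypothetical, and the offset convention (1-based, inclusive, counted from the start of the document) is an assumption that the text above does not state.

```python
from typing import NamedTuple, Tuple


class Position(NamedTuple):
    line: int        # line of the document on which the sentence begins
    first_byte: int  # first byte of the sentence
    last_byte: int   # last byte of the sentence


def parse_position(fragment: str) -> Tuple[int, Position]:
    """Parse an indicator such as '17-(2,115,288)' into (code, Position)."""
    code, _, rest = fragment.strip().partition('-(')
    line, first, last = (int(x) for x in rest.rstrip(')').split(','))
    return int(code), Position(line, first, last)


def extract(document: bytes, pos: Position) -> bytes:
    """Return the bytes of the sentence.

    Assumption: offsets are 1-based and inclusive, counted from the
    start of the document.
    """
    return document[pos.first_byte - 1:pos.last_byte]


code, pos = parse_position('17-(2,115,288)')
print(code, pos.line, pos.first_byte, pos.last_byte)
```

If the real offsets are counted differently (for instance, from the start of a line), only the slice in extract would need to change.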
The set of meaningful portraits of documents is organized into the Knowledge
Base. Logical inference is provided by IF… THEN rules (productions) of the
language DECL, which are the basis for solving logical-analytical tasks.
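The syntax of DECL is not shown here, so the following Python sketch only illustrates the general idea of IF… THEN productions applied to portrait fragments; it is not DECL. The fact encoding (ARG, WHERE), the derived relation LOCATED and the function names are all hypothetical. The facts are taken from the fragments DESTROY(2-,3-/4+) WHERE(4-,5-) and DESTROY(CHECHEN,8-,INSURGENT/9+) WHERE(9-,10-) of the example.

```python
# Facts as (kind, first, second) tuples; the encoding is hypothetical.
# ('ARG', 4, 3) means "action 4 involves object 3";
# ('WHERE', 4, 5) means "action 4 took place at place 5".
FACTS = {
    ('ARG', 4, 2), ('ARG', 4, 3), ('WHERE', 4, 5),
    ('ARG', 9, 8), ('WHERE', 9, 10),
}


def rule_located(facts):
    """IF ARG(action, obj) AND WHERE(action, place) THEN LOCATED(obj, place)."""
    derived = set()
    for kind, action, obj in facts:
        if kind != 'ARG':
            continue
        for kind2, action2, place in facts:
            if kind2 == 'WHERE' and action2 == action:
                derived.add(('LOCATED', obj, place))
    return derived


def forward_chain(facts, rules):
    """Apply all rules repeatedly until no new facts are derived."""
    facts = set(facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        if derived <= facts:
            return facts
        facts |= derived


result = forward_chain(FACTS, [rule_located])
print(sorted(f for f in result if f[0] == 'LOCATED'))
```

Here the rule derives that the person with code 3 (Abu-Tarik) is associated with place 5 (Chechen Republic), a fact that is not stated as a separate fragment in the portrait.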
Graph of the meaningful portrait:
In this graph the upper node corresponds to the document. The central node
represents the person involved, Abu-Tarik. The left node corresponds to the
organized criminal group, and so on. Nodes marked with the letter A correspond
to actions. The arcs represent connections and relations between named
entities (NE). Arcs that connect named-entity nodes with A nodes indicate that
the actions involve those named entities.
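A graph of this kind can be derived from the fragment codes and references. The sketch below is one possible way to do it in Python, under the assumption that every reference N- inside a fragment becomes an arc from that fragment to fragment N; the names build_graph and fragments_referencing are hypothetical, and only a subset of the example fragments is used.

```python
import re
from collections import defaultdict

# A subset of the fragments from the example portrait.
FRAGMENTS = [
    'CRIM_GROUP(1,LEADER,OF,BAND,ARABIAN,MERCENARY/2+)',
    'FIO("ABU - TARIK"," "," "," "/3+)',
    'DESTROY(2-,3-/4+)',
    'FORCE_(SPECIAL,MILITIA/8+)',
    'DESTROY(CHECHEN,8-,INSURGENT/9+)',
    'PLACE_(STARYE,ATAGI/13+)',
    'INVOLVE(3-,MURDER,SALIKHOV,FAMILY,13-,YEAR/14+)',
]


def build_graph(fragments):
    """Return (labels, arcs): node code -> name, node code -> referenced codes."""
    labels, arcs = {}, defaultdict(set)
    for frag in fragments:
        m = re.fullmatch(r'([A-Z_]+)\((.*)/(\d+)\+\)', frag)
        if not m:
            continue  # fragments without a code of their own are skipped
        name, args, code = m.group(1), m.group(2), int(m.group(3))
        labels[code] = name
        # Simplification: a plain comma split, which is enough here because
        # no quoted argument in the example contains a comma.
        for arg in args.split(','):
            if re.fullmatch(r'\d+-', arg):
                arcs[code].add(int(arg[:-1]))
    return labels, dict(arcs)


def fragments_referencing(entity, arcs):
    """Codes of the fragments (for example, actions) that involve an entity."""
    return sorted(code for code, targets in arcs.items() if entity in targets)


labels, arcs = build_graph(FRAGMENTS)
print(fragments_referencing(3, arcs))  # fragments that involve FIO with code 3+
```

With these fragments, the node for Abu-Tarik (code 3) is connected to the action nodes DESTROY (code 4) and INVOLVE (code 14), which matches the central position of that node in the graph.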