Knowledge Extraction from Texts in English and Russian
Knowledge extraction is an increasingly relevant task in the fields of linguistics and
informatics. An example of extraction is shown in the following figure.
A tremendous increase in the
flow of documents that users receive through various information channels
(including the Internet) requires new solutions. A large part of these
documents exists in the form of natural language (NL) texts. In many cases one
cannot read and comprehend even a small portion of the factual information
available. Existing information tools can help, but only after the texts have
undergone preliminary formalization.
preliminary formalization is required. At the same time the majority of end
users are people interested in specific subject things. For example, a criminal
inspector seeks to extract information on important figurants, their places of
residence, telephones, criminal events, dates and other such facts; a personnel
manager is interested in the organizations, when and where a person worked and
in what position. Other people try to fish out from the media the information
about the countries, important persons, catastrophes, etc. We call this
concrete information interesting for a user the named entities (same is information objects).
Hence the need to construct a new class of
information systems that take into account the interests of the end user and are
oriented toward extracting named entities (information objects) from texts. This
problem is currently the focus of attention of many researchers and
developers.
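To make the notion of an information object concrete, here is a minimal Python sketch that represents each object as a typed record and pulls two of the object types mentioned above (dates and telephone numbers) out of a sentence. The class InformationObject, the function extract_objects and both regular expressions are hypothetical illustrations, not the authors' linguistic processor. Surface patterns like these cannot find persons, organizations or events, which is why the deeper linguistic processing described next is needed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationObject:
    """A named entity (information object): its type, surface text and character span."""
    kind: str
    text: str
    start: int
    end: int


# Hypothetical toy patterns standing in for real linguistic knowledge.
# They only recognize formats such as 12.03.2004 and +7-495-123-4567.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),
    "PHONE": re.compile(r"\+\d[\d-]{6,}\d"),
}


def extract_objects(text: str) -> list[InformationObject]:
    """Return every pattern match as an InformationObject, ordered by position in the text."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(InformationObject(kind, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda obj: obj.start)


# Digits and punctuation look the same in English and Russian text.
for sentence in ("On 12.03.2004 Ivanov called +7-495-123-4567.",
                 "12.03.2004 Иванов позвонил по номеру +7-495-123-4567."):
    for obj in extract_objects(sentence):
        print(obj.kind, obj.text)
```

Both sentences yield the same two objects, a DATE and a PHONE; anything richer than fixed formats needs lexical and semantic knowledge of each language.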
This article presents a class of such systems
based on special linguistic processors (LP) and knowledge base (KB)
technology. Linguistic processors are needed for deep
processing of texts: extracting named entities, the connections between them
and their participation in actions.
As a result, knowledge structures are formed in the KB. We call such
processors semantics-oriented. Their distinctive feature is the use
of linguistic knowledge (LK), organized so as to take into account the
lexical and semantic features of natural language when forming
the knowledge structures. At the KB level, the user's tasks can be
solved more fully.
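The flow from LP output to knowledge structures in the KB, and then to answering a user's task at the KB level, can be sketched as follows. This is a minimal illustration under our own assumptions: the Event and KnowledgeBase classes, the role names (agent, organization, position, time) and the sample fact are hypothetical, and the event is filled in by hand where a real semantics-oriented LP would produce it from a sentence such as "From 1998 to 2003 Petrov worked as an engineer at Acme Ltd." The query answers the personnel manager's question from the introduction.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """An action extracted from text, with named entities filling its roles."""
    action: str
    roles: dict[str, str]


@dataclass
class KnowledgeBase:
    """Hypothetical minimal KB: typed named entities plus the events they take part in."""
    entities: dict[str, str] = field(default_factory=dict)  # entity name -> entity type
    events: list[Event] = field(default_factory=list)

    def add_event(self, event: Event, entity_types: dict[str, str]) -> None:
        """Store one LP result: the event and the types of the entities it mentions."""
        self.entities.update(entity_types)
        self.events.append(event)

    def career(self, person: str) -> list[tuple[str, str, str]]:
        """Where, when and in what position a person worked: (organization, time, position)."""
        return [
            (e.roles.get("organization", "?"), e.roles.get("time", "?"), e.roles.get("position", "?"))
            for e in self.events
            if e.action == "work" and e.roles.get("agent") == person
        ]


kb = KnowledgeBase()
# Hand-written stand-in for LP output on:
# "From 1998 to 2003 Petrov worked as an engineer at Acme Ltd."
kb.add_event(
    Event("work", {"agent": "Petrov", "organization": "Acme Ltd",
                   "position": "engineer", "time": "1998-2003"}),
    {"Petrov": "PERSON", "Acme Ltd": "ORGANIZATION"},
)
print(kb.career("Petrov"))  # [('Acme Ltd', '1998-2003', 'engineer')]
```

Keeping events separate from entity types in this sketch lets the same store serve other users, for example a criminal inspector, simply by filtering on a different action.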