Knowledge Extraction from texts in English and Russian


Knowledge Extraction is more actual task in fields Linguistics and Informatics. Example of extraction is shown on following picture:



A tremendous increase of the documents flow, obtained by the users through different information channels (including the Internet), requires new solutions. The large part of such documents exists in the form of natural language texts (NL). In many cases one cannot read and comprehend even the smallest portion of the factual information available. The existing information means can render assistance, but for this a preliminary formalization is required. At the same time the majority of end users are people interested in specific subject things. For example, a criminal inspector seeks to extract information on important figurants, their places of residence, telephones, criminal events, dates and other such facts; a personnel manager is interested in the organizations, when and where a person worked and in what position. Other people try to fish out from the media the information about the countries, important persons, catastrophes, etc. We call this concrete information interesting for a user the named entities (same is information objects).

Hence follows the need for constructing a new class of information systems, which would consider the interests of the end user and be oriented at extracting named entities (information objects) from texts. At present this problem is in the focus of attention of many researchers and developers.

In this article a class of such systems is presented, based on the use of special linguistic processors (LP) and technology of knowledge bases (KB). Linguistic processors are necessary for the deep processing of texts with the extracting the named entities, their connections and participation in action. As result the structures of the knowledge in KB are formed. We call such processors semantics-oriented. Their special feature is the employment of the linguistic knowledge (LK), organized in such a way as to consider lexical and semantic special features of natural language with the formation of the knowledge structures. At the level of KB it is possible more fully to decide the users tasks.