Logical-Analytical System "Criminal"

      

    The System uses new technologies:

    - Semantic-oriented Linguistic Processor for knowledge extraction from texts;

    - Knowledge Base on Extended Semantic Networks for problem solving.

 

The document flows in the criminal police comprise summaries of incidents, information on criminal cases, accusatory conclusions, etc. These documents contain a great deal of specific information about figurants (the persons involved), their acts, the instruments of crime and other facts. The basic tasks are different forms of search. New information of this type accumulates at a rate of tens to hundreds of megabytes per month. No one can read all of it and keep it in mind. Full-text databases do not solve this problem: working with natural language (NL) texts, they produce much noise (excessive documents) and lose a significant amount of information. The reason is a special feature of the Russian language, its free word order. The words relevant to a query can be scattered across the text of a document and relate to different entities. To reduce these deficiencies, such systems introduce word proximity criteria, cut the endings of word forms (normalization) and index the normalized words; however, this does not radically solve the problem.
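The limits of the proximity criterion can be seen in a minimal sketch. It is an illustration only, not the method of any particular full-text database: the names normalize, tokens and matches_within_window are hypothetical, the ending-stripping is a crude stand-in for real normalization, and English text stands in for Russian.

```python
import re
from typing import List

def normalize(word: str) -> str:
    """Crude stand-in for normalization: lowercase and cut a few endings."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokens(text: str) -> List[str]:
    """Split text into normalized word tokens."""
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]

def matches_within_window(text: str, query: str, window: int) -> bool:
    """True if all query words occur inside some span of `window` tokens."""
    doc = tokens(text)
    wanted = set(tokens(query))
    return any(wanted <= set(doc[start:start + window])
               for start in range(len(doc)))

# Hypothetical incident text: the car belongs to Petrov, not to Ivanov.
TEXT = "Ivanov was robbed near the station. The red car of Petrov was found later."

print(matches_within_window(TEXT, "Ivanov car", 5))   # False
print(matches_within_window(TEXT, "Ivanov car", 10))  # True
```

With a wide window the document is returned although the car belongs to another person (noise). Narrowing the window removes this match, but in the same way it would drop documents in which related words happen to stand far apart (loss).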

Another approach is to use relational databases. But this requires labor-intensive work by specially trained people to formalize the NL texts: extracting persons, addresses, dates and so on from the documents (incident descriptions) and filling in the corresponding database tables. This is extremely difficult to do with large flows of documents.

The "Criminal" system was developed for this task at the end of the 1990s. Its special feature is automatic text analysis that extracts the required set of information objects. The system was tested on 500 thousand incidents from the summaries of the Moscow Criminal Police Office (GUVD) and showed unique results on the basic objects: the coefficient of noise (excessive words in the objects) was no more than 1-2%, and losses were no more than 3%.

The following basic objects must be extracted (with minimum loss):

persons (by family name, given name and patronymic, FNP) with their role features (criminal, victim);

verbal descriptions of persons and their distinctive signs;

addresses and postal attributes;

date(s) mentioned;

weapon with its special features;

telephone numbers, faxes, e-mails with their subsequent standardization;

means of transport, with the vehicle type, state registration number, color and other attributes;

passport data and other documents with their attributes;

explosives and narcotic substances;

police departments;

police officers.

Secondary objects (their loss is less critical):

organizations;

positions;

quantitative characteristics (how many persons or other objects participated in an event);

account numbers and sums of money, with the currency type indicated.

Connections:

events (criminal, terrorist, broken down by articles, and so on), with an indication of the information objects participating in them;

the time and place of events;

connections between different types of information objects (with whom a person works in an organization or lives at the same address, in which events the person participated together with other objects, etc.).

Extracting the objects from texts involves several difficulties. First, there are difficulties connected with the special features of the Russian language: free word order, homonymy and polysemy, and the variety of language forms that express one and the same meaning (synonymy). For example, any event can be expressed by verb forms, verbal nouns, participial constructions, etc.; these must all be reduced to one form.

Second, there is a large number of abbreviations (especially in the summaries of incidents), which must be deciphered by analyzing the context. For example, "g." can stand for year, city, state and other words.
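Deciphering an abbreviation from its context can be sketched as follows. This is only an assumed illustration with two hand-written rules; the source does not describe the actual rules of the linguistic processor, and the function name expand_g and the labels it returns are hypothetical.

```python
import re
from typing import List

def expand_g(tokens: List[str], i: int) -> str:
    """Guess the meaning of the abbreviation at tokens[i] from its neighbours."""
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    # Assumed rule 1: a four-digit number right before "g." suggests a year.
    if re.fullmatch(r"\d{4}", prev_tok):
        return "YEAR"
    # Assumed rule 2: a capitalized word right after "g." suggests a city name.
    if next_tok[:1].isupper():
        return "CITY"
    return "UNKNOWN"

# Hypothetical transliterated fragment with two different uses of "g."
words = "born in 1975 g. in g. Moscow".split()
print(expand_g(words, 3))  # YEAR
print(expand_g(words, 5))  # CITY
```

A real processor would need many more rules and would have to fall back on wider context when the immediate neighbours give no hint.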

Third, there are many omissions. For example, a figurant's name is followed by an address, a year of birth and other data with no explicit link; these must be connected with the figurant.
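Connecting such data with the preceding figurant can be illustrated by a simplified sketch. The input layout (figurants separated by semicolons, fields by commas), the function name attach_attributes and the field names are assumptions made for the example, not the format of real summaries.

```python
import re
from typing import Dict, List

def attach_attributes(summary: str) -> List[Dict[str, object]]:
    """Attach the fields that follow each name to that figurant."""
    figurants = []
    for chunk in summary.split(";"):
        fields = [f.strip() for f in chunk.split(",") if f.strip()]
        if not fields:
            continue
        # Assumption: the first field of each chunk is the figurant's name.
        person = {"name": fields[0], "birth_year": None, "address": None}
        for field in fields[1:]:
            if re.fullmatch(r"(19|20)\d{2}", field):
                person["birth_year"] = int(field)   # bare year: year of birth
            elif person["address"] is None:
                person["address"] = field           # anything else: address
        figurants.append(person)
    return figurants

# Hypothetical summary fragment; the order of fields after the name varies.
RECORDS = attach_attributes(
    "Petrov I.I., 1970, Lenina st. 5; Sidorov A.A., Mira st. 10, 1982"
)
for record in RECORDS:
    print(record)
```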

An important task is the identification of objects (figurants) throughout the entire text, which requires resolving demonstrative pronouns, short names and anaphoric references. This is especially necessary for accusatory conclusions (verdicts), where one and the same person is mentioned repeatedly (named in different ways) throughout the document. Taking these difficulties into account, and in accordance with the tasks, the linguistic processor of the "Criminal" system was developed. It normalizes words, groups them into units, identifies objects and establishes connections. As a result, a semantic network called the meaningful document portrait is constructed automatically for each NL document. These portraits are the knowledge structures of the knowledge base and serve as the basis for different forms of semantic search: search by features and connections, search for objects connected at different levels, search for similar figurants and incidents, and search by distinctive signs (using an ontology).
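The search for objects connected at different levels can be pictured with a small sketch. It assumes that a portrait can be reduced to typed nodes joined by named links and uses an ordinary breadth-first traversal; the class Portrait, its methods and the sample data are hypothetical and do not reproduce the system's extended semantic networks.

```python
from collections import deque
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (object type, value), e.g. ("person", "Petrov")

class Portrait:
    """Toy semantic network: objects as nodes, connections as named links."""

    def __init__(self) -> None:
        self.edges: Dict[Node, Set[Tuple[str, Node]]] = {}

    def connect(self, a: Node, relation: str, b: Node) -> None:
        """Store an undirected, named connection between two objects."""
        self.edges.setdefault(a, set()).add((relation, b))
        self.edges.setdefault(b, set()).add((relation, a))

    def connected_within(self, start: Node, max_level: int) -> Dict[Node, int]:
        """Return {object: level} for objects at most max_level links away."""
        levels = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if levels[node] == max_level:
                continue  # do not expand beyond the requested level
            for _relation, other in self.edges.get(node, ()):
                if other not in levels:
                    levels[other] = levels[node] + 1
                    queue.append(other)
        del levels[start]
        return levels

net = Portrait()
net.connect(("person", "Petrov"), "lives_at", ("address", "Lenina st. 5"))
net.connect(("person", "Sidorov"), "lives_at", ("address", "Lenina st. 5"))
net.connect(("person", "Sidorov"), "uses", ("vehicle", "red sedan"))

print(net.connected_within(("person", "Petrov"), 2))
```

At level 1 only the address is found; at level 2 the search also reaches Sidorov, who shares that address, and at level 3 his vehicle.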

The expert component supports the classification of incidents according to the catalogs of the criminal police: the "form of crime", the "method of committing the crime" and others. The result is recorded in the meaningful portrait. A complete set of facilities for tuning the system to the subject area is provided.

    The "Criminal" system solves, by methods of structural processing, the following logical-analytical problems:

- search for similar incidents and figurants using the information in the knowledge base (KB);

- search for figurants by verbal portrait;

- answering questions in NL (Russian);

- explanation of the search results;

- analysis and mapping of the connections between figurants;

- estimation of the degree of participation of figurants in an incident;

- ordering of figurants according to the degree of their criminal activity;

- discovery of organized criminal groups;

- statistical processing of information to estimate the dynamics of criminal processes over time.
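Two of the problems listed above, ordering figurants by activity and discovering groups, can be illustrated with a deliberately simple sketch. The source does not say how the system solves them; here the number of incidents stands in for the degree of criminal activity, and connected components of the joint-participation graph stand in for groups. The names rank_by_activity and find_groups and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def rank_by_activity(incidents: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Order figurants by the number of incidents they appear in."""
    counts = Counter(name for people in incidents.values()
                     for name in set(people))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def find_groups(incidents: Dict[str, List[str]],
                min_size: int = 2) -> List[List[str]]:
    """Group figurants linked, directly or through others, by shared incidents."""
    links = defaultdict(set)
    for people in incidents.values():
        for a in people:
            links[a].update(p for p in people if p != a)
    seen, groups = set(), []
    for person in sorted(links):
        if person in seen:
            continue
        stack, group = [person], set()
        while stack:  # depth-first walk over one connected component
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(links[current] - group)
        seen |= group
        if len(group) >= min_size:
            groups.append(sorted(group))
    return groups

# Hypothetical incidents with the figurants extracted from each of them.
INCIDENTS = {
    "inc-1": ["Petrov", "Sidorov"],
    "inc-2": ["Sidorov", "Ivanov"],
    "inc-3": ["Kuznetsov"],
}
print(rank_by_activity(INCIDENTS))
print(find_groups(INCIDENTS))
```

Petrov and Ivanov never appear in the same incident, yet both fall into one group through Sidorov, which is the kind of indirect connection that is hard to notice when reading summaries one by one.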