EEDIS Laboratory

Evolutionary Engineering and Distributed Information Systems

Experimenting N-Grams in Text Categorization

Authors: Rahmoun Abdellatif, Elberrichi Zakaria
Type: International journal
Journal: Int. Arab J. Inf. Technol. ISSN:
Volume: 4 Issue: 4 Pages: 377-385
Published: 1 October 2007

This paper addresses the automatic supervised classification of documents. The proposed approach represents each document as a vector built not from words but from character n-grams, for varying n. Its effects are examined in several experiments that use the multivariate chi-square statistic to reduce dimensionality, the cosine and Kullback-Leibler distances, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. Performance is measured with the macro-averaged F1 score. The results show the effectiveness of this approach compared with the bag-of-words and stem representations.
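
To make the representation concrete, here is a minimal Python sketch of character n-gram counting, cosine similarity between two n-gram vectors, and macro-averaged F1. It is an illustration under stated assumptions, not the authors' implementation: the function names, the lowercasing and whitespace normalisation, the use of raw counts, and the choice of n are all hypothetical, and the chi-square selection step and the Kullback-Leibler distance are left out.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n_values=(2, 3)):
    """Count overlapping character n-grams for every n in n_values."""
    # Assumption: lowercase and collapse whitespace before counting.
    text = " ".join(text.lower().split())
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

A nearest-profile classifier in this style would assign a document to the category whose n-gram vector gives the highest `cosine_similarity`. Macro-averaging gives every category equal weight regardless of its size, so rare categories count as much as frequent ones.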
