This paper addresses the automatic classification of documents. The classification is supervised, since it operates on a predefined set of classes. The originality of the proposed approach lies in its vector representation of documents, which is based not on words but on character n-grams, with n varying from 2 to 5.
Given the large number of n-grams generated for each class, we use the χ2 statistic to reduce the number of characteristic n-grams per class. The vectors are weighted using TF-IDF, and the distance between two vectors is computed with the cosine measure. The experiments were carried out on two corpora that are well known in the text categorization community, Reuters-21578 and 20 Newsgroups. The approach was evaluated using a function that combines precision and recall.
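
As an illustration of the steps named above, the following is a minimal sketch in Python using only the standard library. It is not the implementation used in this work. The function names, the logarithmic IDF variant, the 2x2 contingency form of the χ2 score, and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=5):
    """Return every character n-gram of `text` for n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def chi2_score(a, b, c, d):
    """Chi-square score of one n-gram for one class (2x2 table, assumed form).

    a: documents of the class containing the n-gram
    b: documents of other classes containing it
    c: documents of the class not containing it
    d: documents of other classes not containing it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def tfidf_vectors(docs):
    """Map each document to a sparse {n-gram: tf * idf} dictionary.

    Assumption: idf = log(N / df), one common variant among several.
    """
    counts = [Counter(char_ngrams(doc)) for doc in docs]
    # Iterating a Counter yields each distinct n-gram once per document.
    df = Counter(gram for count in counts for gram in count)
    n_docs = len(docs)
    return [
        {gram: tf * math.log(n_docs / df[gram]) for gram, tf in count.items()}
        for count in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents, not taken from either corpus.
docs = ["wheat prices rise", "wheat exports fall", "new graphics card"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

In this sketch the χ2 score would be computed per n-gram and per class, and only the highest-scoring n-grams would be kept as dimensions of the vectors before weighting. The selection threshold is left open here because the text above does not state one.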