Authors: | BENNABI Sakina Rim, ELBERRICHI Zakaria |
Type: | Book Chapter |
Published in: | Proceedings of the 10th Intern |
ISBN: | |
Link: | https://doi.org/10.1145/3447568.3448531 |
Published: | 04-06-2020 |
Feature selection is a data pre-processing method widely used when mining large datasets, for example in text classification. Several studies have compared feature selection methods applied to English-language corpora, but only a few works address the Arabic language. This article presents a comparative study of several feature selection techniques, including Chi2, ANOVA, and mutual information, applied to an Arabic-language corpus, while also varying the machine learning algorithms (Naive Bayes, SVM, and KNN). The experiments show that, in general, reducing dimensionality with these feature selection techniques slightly affected text classification performance, while reducing the size of the corpus by up to 1%.
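The following is a minimal Python sketch of how two of the criteria named above, Chi2 and mutual information, can rank terms. It uses their classic document-presence form. Each term gets a 2x2 contingency table (term present or absent, document in or out of a class). The term's score is the maximum over classes, and the k highest-scoring terms are kept. This is an illustrative assumption, not the authors' implementation. The function names (`contingency`, `chi2_score`, `mi_score`, `select_k_best`) and the toy tokenized corpus are hypothetical. Tokenization and Arabic-specific preprocessing (such as stemming) are left out, and ANOVA is not shown.

```python
import math

# Hypothetical sketch: document-presence Chi2 and mutual information
# term scoring for feature selection (not the paper's implementation).


def contingency(docs, labels, term, cls):
    """Return the 2x2 counts (A, B, C, D) for one term/class pair.

    A: in class, term present    B: other class, term present
    C: in class, term absent     D: other class, term absent
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == cls
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi2_score(a, b, c, d):
    """Chi-square statistic of the 2x2 table (0 if any margin is empty)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def mi_score(a, b, c, d):
    """Mutual information (in bits) between term presence and class membership."""
    n = a + b + c + d
    # (joint count, term-presence marginal, class-membership marginal) per cell
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    total = 0.0
    for joint, n_term, n_class in cells:
        if joint:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            total += joint / n * math.log2(n * joint / (n_term * n_class))
    return total


def select_k_best(docs, labels, k, score_fn):
    """Keep the k terms with the highest score, maximized over classes."""
    docs = [set(doc) for doc in docs]
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scores = {
        term: max(score_fn(*contingency(docs, labels, term, cls)) for cls in classes)
        for term in vocab
    }
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: already-tokenized documents with class labels.
docs = [
    ["match", "goal", "team"],
    ["goal", "team", "coach"],
    ["market", "price", "team"],
    ["market", "bank", "price"],
]
labels = ["sport", "sport", "economy", "economy"]

print(select_k_best(docs, labels, 3, chi2_score))  # ['goal', 'market', 'price']
print(select_k_best(docs, labels, 3, mi_score))    # ['goal', 'market', 'price']
```

The selected terms would then form the reduced vocabulary on which classifiers such as Naive Bayes, SVM, or KNN are trained. Taking the maximum score over classes is one common way to turn per-class scores into a single ranking. Averaging over classes is another.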