Authors: | Benamar HAMZAOUI, BOUCHIHA Djelloul, BOUZIANE Abdelghani |
Type: | International journal |
Journal name: | ISSN: |
Volume: | Issue: | Pages: |
Link: | https://ojs.brazilianjournals.com.br/ojs/index.php/BJT/article/view/77611 |
Published on: | 17-02-2025 |
The exponential growth of textual data has heightened the importance of efficient text classification, a fundamental natural language processing task that assigns predefined categories to documents. The task can be flat, with all categories at a single level, or hierarchical, with categories organized in a multi-level structure, which adds complexity. While extensive research has advanced text classification for English, studies on Arabic text classification remain limited, particularly in hierarchical settings. The features of Arabic, such as its rich morphology, diverse dialects, and syntactic complexity, pose significant challenges. This survey reviews Arabic text classification by examining data sources, preprocessing steps, and feature extraction techniques, ranging from traditional methods such as Bag of Words and TF-IDF to neural embeddings (e.g., Word2Vec) and transformer-based models such as BERT. It also covers classification techniques, from machine learning algorithms (e.g., SVM, Random Forest) to deep learning models (e.g., CNN, RNN, LSTM, GPT), and the metrics used to evaluate them, such as precision, recall, and F1-score. By addressing current advances and open challenges, the survey aims to guide future research in Arabic text classification.
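
As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of per-class precision, recall, and F1-score for single-label classification, with a macro average over classes. It is not taken from the survey: the function names (`per_class_scores`, `macro_f1`) and the category labels are hypothetical and chosen only for the example.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a wrong document
            fn[truth] += 1   # true class missed one of its documents

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical gold labels and predictions for four documents.
y_true = ["sports", "sports", "politics", "economy"]
y_pred = ["sports", "politics", "politics", "economy"]
scores = per_class_scores(y_true, y_pred)
```

Macro averaging weights every category equally, which is one common choice when categories are imbalanced. Other averaging schemes (micro, weighted) would give different numbers on the same predictions.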