EEDIS Laboratory: Evolutionary Engineering and Distributed Information Systems

A comprehensive survey on Arabic text classification: progress, challenges, and techniques


Authors:	Benamar HAMZAOUI, BOUCHIHA Djelloul, BOUZIANE Abdelghani
Type:	International journal
Journal name:	ISSN:
Volume:	Issue:	Pages:
Link:	https://ojs.brazilianjournals.com.br/ojs/index.php/BJT/article/view/77611
Published on:	17-02-2025

The exponential growth of textual data has heightened the importance of efficient text classification, a fundamental natural language processing task that assigns predefined categories to documents. The task can be flat, with all categories at a single level, or hierarchical, with categories organized in a multi-level structure, which adds complexity. While extensive research has advanced text classification for English, studies on Arabic text classification remain limited, particularly in hierarchical settings. The distinctive features of Arabic, such as its rich morphology, diverse dialects, and syntactic complexity, pose significant challenges. This survey reviews Arabic text classification by examining data sources, preprocessing steps, and feature extraction techniques, ranging from traditional methods like Bag of Words and TF-IDF to modern approaches such as neural embeddings (e.g., Word2Vec) and transformer-based models like BERT. It also covers classification techniques, from machine learning algorithms (e.g., SVM, Random Forest) to deep learning models (e.g., CNN, RNN, LSTM, GPT), and the metrics used to evaluate them, such as precision, recall, and F1-score. By covering current advances and open challenges, the survey aims to guide future research and innovation in Arabic text classification.
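
As a rough illustration of two of the building blocks named in the abstract, the sketch below computes TF-IDF weights and per-class precision, recall, and F1-score in plain Python. It is not taken from the survey. The function names, the toy documents, and the labels are hypothetical, the TF-IDF variant (tf = count / document length, idf = log(N / df)) is only one of several in common use, and whitespace splitting stands in for the Arabic-specific preprocessing the survey discusses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf = count / document length, idf = log(N / df).
    Other TF-IDF variants (smoothing, normalization) exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy, made-up documents for illustration only (whitespace-tokenized).
docs = [
    "الرياضة كرة القدم".split(),
    "الاقتصاد سوق المال".split(),
    "كرة السلة الرياضة".split(),
]
weights = tf_idf(docs)

# Hypothetical gold labels and classifier predictions.
y_true = ["sport", "economy", "sport", "economy"]
y_pred = ["sport", "sport", "sport", "economy"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="sport")

print(weights[0])
print(p, r, f1)
```

Terms that occur in fewer documents receive a larger weight than terms shared across documents, which is the property that makes TF-IDF more discriminative than raw Bag of Words counts.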