A systematic scoping review examining how AI systems operationalize Islamic knowledge across 160+ papers (2016–2026)
AI systems are increasingly mediating how Islamic communities access, study, and apply Islamic sources; still, research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech, multimodal learning, educational technology, and recent LLM alignment work.
This survey presents a critical systematic review of 160+ papers from the past decade that incorporate Islamic knowledge into machine learning and AI systems. We propose a layered taxonomy that separates an epistemic view of Islamic knowledge (authority-bearing foundations and established disciplines) from an instrumental AI task layer (data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods), while treating normative concerns as cross-cutting constraints.
Using this framework, we synthesize trends in datasets, benchmarks, and system architectures, highlighting the shift toward retrieval-grounded LLM pipelines, verification and deferral mechanisms, and emerging multimodal recitation and manuscript-processing systems.
We also consolidate evaluation practices for trustworthiness, including provenance and faithfulness, disagreement-aware and school-of-thought-sensitive framing, calibrated abstention under underspecified queries, and safety and bias assessment for Islamic contexts. Finally, we identify deployment-critical gaps and engineering priorities for building auditable, pluralism-aware, and risk-sensitive Islamic-knowledge AI.
AI systems are increasingly mediating how Islamic communities access, study, and apply Islamic sources. With over 2 billion Muslims worldwide, the demand for reliable, culturally grounded AI tools for Islamic knowledge has never been greater. Yet research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech processing, multimodal learning, educational technology, and LLM alignment.
This systematic survey reviews over 160 papers from the past decade (2016–2026), presenting a critical analysis of how AI systems operationalize Islamic knowledge. We follow the PRISMA-ScR framework to ensure transparency and reproducibility, screening 1,743 initial records down to 160 included studies.
We introduce a layered taxonomy that separates an epistemic view of Islamic knowledge—authority-bearing foundations (Qur'an, Hadith) and established disciplines (Qur'anic Sciences, Hadith Sciences, Usul al-Fiqh, Fiqh, Theology, Sufism, and History)—from an instrumental AI task layer covering data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods.
Three cross-cutting normative dimensions thread through the entire analysis: (i) doctrinal integrity and authenticity (correct attribution, protection against fabrication), (ii) normative correctness and disagreement handling (school-aware framing, calibrated abstention), and (iii) objectives, harms, and governance (maqasid-informed alignment, bias checking, deployment safety).
Our goal is to provide a comprehensive view of how current AI systems capture, structure, and evaluate Islamic knowledge—identifying where they resonate with scholarly practice and where they risk flattening diverse traditions and worldviews.
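Prior surveys on Islamic AI systems have each focused on a specific domain, such as Hadith, the Qur'an, Islamic QA, education, Arabic LLMs, or bias. The table below compares their scope with this survey, which spans all of these areas: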
| Prior Survey | Year | Primary Scope | Main Research Areas | Prov./Disagr./Abst. | MM | Norm./Plur. | Systematic Method |
|---|---|---|---|---|---|---|---|
| Azmi (2019) | 2019 | Hadith-focused | IR/ML/DL (pre-LLM) | ~ | ✗ | ~ | Narrative (no PRISMA) |
| Bashir et al. (2023) | 2023 | Qur'an-focused | IR/ML/DL; limited Transformers; pre-LLM/RAG | ~ | ✓ | ✗ | Inclusion/exclusion + flow |
| Alnefaie et al. (2023) | 2023 | Islamic QA (Qur'an/Hadith/Fatwa) | Retrieval + PLMs; task framing | ~ | ✗ | ~ | Survey + evaluation criteria |
| Ahmad et al. (2025) | 2025 | Qur'anic education | AI-in-education (systematic review) | ✗ | ✗ | ~ | Systematic review |
| Hakim et al. (2023) | 2023 | Teaching Islamic studies | AI-in-education (systematic review) | ✗ | ✗ | ~ | Review (PRISMA referenced) |
| Alhammad et al. (2025) | 2025 | Islamic education | Pedagogy + learning outcomes (review) | ✗ | ✗ | ~ | Review |
| Mashaabi et al. (2024) | 2024 | Arabic LLMs (general) | Transformers/LLMs; resources | ✗ | ✗ | ~ | Method + openness/resources |
| Rhel et al. (2025) | 2025 | Arabic LLMs (general) | PLMs/LLMs; prompting; evaluation trends | ✗ | ✗ | ✗ | Review-style |
| Abouzied et al. (2025) | 2025 | Arabic LLMs landscape | LLMs + benchmarks; harm/bias themes | ~ | ✗ | ~ | Landscape (no PRISMA) |
| Alzubaidi et al. (2025) | 2025 | Arabic LLM benchmarks | LLM evaluation; benchmark taxonomy | ~ | ✗ | ~ | Systematic review |
| Asseri et al. (2025) | 2025 | Bias (Arabs/Muslims) | Prompting/pipelines; bias measurement | ✗ | ✗ | ✓ | PRISMA |
| This Survey (2026) | 2026 | Whole Islamic stack | IR, Transformers, LLM/RAG/agents; end-to-end evaluation | ✓ | ✓ | ✓ | PRISMA-ScR + reproducibility artifacts |
Legend: ✓ = explicit focus; ~ = partial/implicit; ✗ = not covered. MM = multimodal; Norm./Plur. = normative/pluralism considerations; Prov./Disagr./Abst. = provenance/disagreement/abstention.
To ensure a systematic, transparent, and reproducible review, we followed the PRISMA-ScR framework. This approach allowed us to rigorously map the rapidly evolving research landscape while minimizing selection bias.
RQ1 (Domains & Tasks): What Islamic knowledge domains and application tasks have been operationalized in ML/AI systems, and how is this work distributed across subfields?
RQ2: What datasets, benchmarks, and knowledge resources are available, and what assumptions do they encode about evidence, provenance, and interpretive diversity?
RQ3: How do studies evaluate trustworthiness, especially source faithfulness, doctrinal correctness, pluralism-aware answering, and safety/bias?
We ran a systematic search across Semantic Scholar, IEEE Xplore, ACM Digital Library, ACL Anthology, and arXiv. The initial corpus of 1,743 papers was reduced to 160 included studies after screening.
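Searching several databases usually returns the same paper more than once, so records are typically merged and deduplicated before screening. The sketch below is a minimal illustration of that step and of the counts a PRISMA-style flow reports. The survey does not describe its deduplication procedure, so the title-based matching, the record layout, and the function names are all assumptions.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip combining marks and punctuation so near-identical titles collide."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # [\W_]+ matches runs of non-word characters in any script, plus underscores.
    return re.sub(r"[\W_]+", " ", no_marks.lower()).strip()


def deduplicate(records: list) -> list:
    """Keep the first record seen for each normalized title (assumed matching rule)."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def screening_summary(identified: int, included: int) -> dict:
    """Summarize the funnel; 'not_included' lumps duplicates and screened-out records."""
    return {
        "identified": identified,
        "included": included,
        "not_included": identified - included,
        "inclusion_rate": round(included / identified, 4),
    }


if __name__ == "__main__":
    # Toy records; the titles and sources are invented for illustration.
    hits = [
        {"title": "Hadith Retrieval: A Study", "source": "arXiv"},
        {"title": "hadith retrieval - a study", "source": "ACL Anthology"},
        {"title": "Recitation Error Detection", "source": "IEEE Xplore"},
    ]
    print(len(deduplicate(hits)))
    print(screening_summary(identified=1743, included=160))
```

With the counts reported above, the summary gives 1,583 records not included and an inclusion rate of about 9.2%. Title matching alone will miss duplicates whose titles differ between a preprint and its published version, so a DOI-based check would normally complement it.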
We developed a comprehensive two-layer taxonomy that bridges traditional Islamic sciences with modern AI evaluation. The epistemic layer organizes Islamic knowledge to prevent a common failure mode—treating all "Islamic text" as homogeneous content—by distinguishing sources that carry direct normative authority from disciplines that regulate how those sources are read, reconciled, and applied. The AI task layer is explicitly instrumental, enumerating reusable computational capabilities (retrieval, grounding, extraction, reasoning, evaluation) that apply across domains.
Data and corpora: Digitisation (OCR/HTR for manuscripts), corpus building, annotation schema design, versioning for provenance and reproducibility.
Retrieval and grounding: Search/IR with cross-lingual support, RAG and evidence linking, provenance tracking, citation generation, quote-bounded answering.
Understanding: NER, relation/event extraction, geo-temporal linking, faithful summarisation, and multi-source synthesis that surfaces disagreements.
Reasoning support: Comparative (madhhab-aware, ikhtilāf surfacing), uncertainty handling (abstention/deferral, clarification prompts).
Evaluation and governance: Benchmarks, error taxonomies, reproducibility, hallucination control, fabrication detection, bias checks, and red-teaming assessment.
Multimodal methods: Speech/audio (ASR, tajwīd, recitation coaching), document image processing (manuscript OCR/HTR, layout analysis, segmentation).
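One way to make the two layers concrete is to tag each paper with labels from both, plus any cross-cutting normative dimensions it addresses. The sketch below does this with plain sets and a small dataclass. The label strings are taken from the taxonomy as described on this page, but the `PaperTag` structure and its validation are hypothetical and are not the authors' annotation schema.

```python
from dataclasses import dataclass

# Epistemic layer: authority-bearing foundations vs. established disciplines.
FOUNDATIONS = frozenset({"Qur'an", "Hadith"})
DISCIPLINES = frozenset({
    "Qur'anic Sciences", "Hadith Sciences", "Usul al-Fiqh",
    "Fiqh", "Theology", "Sufism", "History",
})

# Instrumental AI task layer.
AI_TASKS = frozenset({
    "data and corpora", "retrieval and grounding", "understanding",
    "reasoning support", "evaluation and governance", "multimodal methods",
})

# Cross-cutting normative dimensions.
NORMATIVE = frozenset({
    "doctrinal integrity and authenticity",
    "normative correctness and disagreement handling",
    "objectives, harms, and governance",
})


@dataclass(frozen=True)
class PaperTag:
    """Hypothetical per-paper annotation: one set of labels per layer."""
    title: str
    epistemic: frozenset
    tasks: frozenset
    normative: frozenset = frozenset()

    def __post_init__(self):
        # Reject labels that are not part of the taxonomy.
        unknown = (
            (self.epistemic - (FOUNDATIONS | DISCIPLINES))
            | (self.tasks - AI_TASKS)
            | (self.normative - NORMATIVE)
        )
        if unknown:
            raise ValueError(f"labels outside the taxonomy: {sorted(unknown)}")

    @property
    def touches_foundation(self) -> bool:
        """True when the paper works directly on an authority-bearing source."""
        return bool(self.epistemic & FOUNDATIONS)


if __name__ == "__main__":
    tag = PaperTag(
        title="Example paper (invented)",
        epistemic=frozenset({"Hadith", "Hadith Sciences"}),
        tasks=frozenset({"retrieval and grounding"}),
        normative=frozenset({"doctrinal integrity and authenticity"}),
    )
    print(tag.touches_foundation)
```

Keeping foundations and disciplines in separate sets mirrors the taxonomy's point that not all "Islamic text" carries the same kind of authority.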
Islamic-knowledge AI is deployed in recurring settings that stress different reliability constraints:
Focus: Qur'an/Hadith search, reference assistants
Key risks: Provenance breakage, source fabrication, citation hallucination
Requirements: Source faithfulness, verifiable attribution
Focus: Jurisprudential reasoning, school-aware rulings
Key risks: Collapsing legitimate disagreement (ikhtilāf), underspecified answers
Requirements: Multi-school awareness, qualified responses, abstention policies
Focus: Hajj/Umrah guidance, prayer times, rituals
Key risks: Operational errors in high-stakes contexts
Requirements: Conservative abstention, traceable sourcing, clear disclaimers
Focus: Recitation coaching, tajwīd feedback, OCR
Key risks: Pronunciation errors, incorrect tajwīd rules
Requirements: Robust speech pipelines, expert validation
These use cases highlight the diversity of deployment contexts and the varying trustworthiness requirements across Islamic-knowledge AI applications.
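The requirements listed for these settings (qualified responses, abstention policies, conservative behavior in high-stakes contexts) can be read as a response policy that sits between retrieval and generation. The following is a minimal sketch of such a policy, not a method drawn from any surveyed system. The `Evidence` fields, the action names, the thresholds, and the placeholder ruling labels are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    ANSWER = "answer"
    PRESENT_DISAGREEMENT = "present_disagreement"
    ASK_CLARIFICATION = "ask_clarification"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class Evidence:
    source_id: str          # hypothetical identifier of the cited source
    school: Optional[str]   # school the position is attributed to, if any
    ruling: str             # normalized label for the position (placeholder values below)


def choose_action(
    evidence: List[Evidence],
    requested_school: Optional[str] = None,
    high_stakes: bool = False,
) -> Action:
    """Pick a response mode from the retrieved evidence (illustrative thresholds)."""
    if requested_school is not None:
        evidence = [e for e in evidence if e.school == requested_school]

    # Assumed rule: high-stakes settings need at least two supporting sources.
    min_sources = 2 if high_stakes else 1
    if len(evidence) < min_sources:
        return Action.ABSTAIN

    rulings = {e.ruling for e in evidence}
    if len(rulings) == 1:
        return Action.ANSWER
    if requested_school is None:
        # Sources disagree and the user named no school: surface the disagreement.
        return Action.PRESENT_DISAGREEMENT
    # Sources disagree even within the requested school: the query is underspecified.
    return Action.ASK_CLARIFICATION


if __name__ == "__main__":
    retrieved = [
        Evidence("src-1", "School A", "position-1"),
        Evidence("src-2", "School B", "position-2"),
    ]
    print(choose_action(retrieved))
    print(choose_action(retrieved, requested_school="School A"))
    print(choose_action(retrieved, requested_school="School A", high_stakes=True))
```

The design choice worth noting is that disagreement is treated as an outcome in its own right rather than as an error to be resolved by picking one source.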
Islamic alignment requires structured corpora spanning primary texts, classical scholarship, contemporary legal resolutions, linguistic resources, and benchmark datasets. We organize these into five major categories:
Primary texts
Qur'an: Tanzil, Quranic Arabic Corpus
Hadith: Kutub al-Sittah (Six Books), authenticated collections
Key features: Canonical texts, linguistic annotation, chain of narration
Classical scholarship
Content: Uṣūl al-Fiqh, legal maxims, Maqāṣid texts
Examples: al-Ghazālī's works, al-Shāṭibī's al-Muwāfaqāt
Resources: OpenITI, digitized libraries
Contemporary legal resolutions
Sources: IIFA, ECFR, MWL Fiqh Council, AMJA
Topics: Finance, bioethics, minority jurisprudence
Purpose: Connect classical principles to modern issues
Linguistic resources
Tools: Arabic morphological analyzers, embeddings
Data: Quranic Arabic Corpus, narrator databases
Models: AraBERT, CAMeLBERT, specialized embeddings
Benchmark datasets
Datasets: QIAS, IslamicEval, PalmX, FiqhQA
Multimodal: Iqra'Eval (recitation), AraHalluEval
Focus: Correctness, faithfulness, safety
Key issues in Islamic knowledge data resources:
Overview of resources related to Islamic AI, categorized by Classical Pre-training Corpora, Knowledge Bases (KB) for Retrieval, and Benchmarks
| Dataset | Description | Type | Size | License | Lang | Ref. |
|---|---|---|---|---|---|---|
| Classical Pre-training Corpora (Turath) | ||||||
| OpenITI | The largest machine-readable corpus of pre-modern Islamicate texts. | Corpus | ~1B Tokens | CC-BY-4.0 | Ar/Per | [1] |
| Shamela (Cleaned) | Text version of Shamela library; covers Fiqh, Tafsir, and History. | Corpus | ~1B Tokens | Public | Ar | [2] |
| KSUCCA | King Saud Univ. Corpus of Classical Arabic (7th–11th Century CE). | Corpus | 50M Tokens | Research | Ar (Cls) | [3] |
| Tashkeela | Fully vocalized classical texts for training diacritic-aware models. | Corpus | 75M Tokens | Public | Ar | [4] |
| Noor Corpus | Massive diverse library of Islamic PDFs and OCR'd texts. | Corpus | >100k Books | Mixed | Ar | [5] |
| Knowledge Bases & Ontologies | ||||||
| QuranMorph Corpus | Morphologically annotated Quranic corpus with POS, Lemmatization, etc. | KB | 77k Tokens | CC-BY-4.0 | Ar/En | [6] |
| Quranic Arabic Corpus | Morphological and syntactic ontology mapping concepts in the Quran. | KB/Onto | 77k Nodes | Research | Ar/En | [7] |
| IslamicPCQA KB | Curated knowledge base of 1M+ Islamic documents for retrieval. | KB | 1M+ Docs | Research | Fa/Ar | [8] |
| Sunnah.com API | Structured Hadith collections with grading (Sahih/Hasan) metadata. | API/KB | 6 Major Books | Open | Ar/En | [9] |
| Quranic & Hadith Resources | ||||||
| IslamicEval 2025 | Hallucination detection for Qur'an/Hadith. | Hallu | 1,506 Questions | Apache 2.0 | Ar | [10] |
| Iqra'Eval 2025 | Quranic mispronunciation diagnosis. | Audio | 82+ Hours | Research | Ar | [11] |
| Sanadset 650K | Hadith dataset with narrator-chain (Sanad). | Corpus | 650,986 Records | CC-BY-4.0 | Ar | [12] |
| QURAN-MD | Unified verse-level text, translation, transliteration, and 32 reciters. | Multi | 6,236 Verses | Research | Ar/En | [13] |
| Jurisprudence (Fiqh) & Reasoning | ||||||
| QIAS 2025 | Inheritance & general knowledge. | Reason | 22,000 MCQs | Research | Ar | [14] |
| FiqhQA | QA by 4 Sunni schools; abstention eval. | QA | 960 QAs | Research | Ar/En | [15] |
| IslamicPCQA | QA with 1M+ doc knowledge base. | QA/RAG | 12k Pairs | Research | Fa | [16] |
| Hajj-FQA | Specialized QA for Hajj rituals/fatwas. | QA | 886 Hajj-fatwas | Research | Ar | [17] |
| Cultural & Ethical Alignment | ||||||
| PalmX 2025 | General Arabic & Islamic Culture. | Culture | 6.4k MCQs | Shared Task | Ar | [18] |
| BengaliMoralBench | Moral reasoning in Bengali Islamic culture. | Ethics | 3k Scenarios | CC-BY-NC-ND | Bn | [19] |
| IslamTrust | Alignment benchmark with consensus-based Islamic ethical principles. | Ethics | Multi | Research | Ar/En | [20] |
| ADAB | App reviews corpus annotated for politeness and religious etiquette. | Style | Curated | Research | Ar | [21] |
Note: QA=Question Answering, Reason=Reasoning/Math, Hallu=Hallucination Detection, Audio=Speech/Recitation, KB=Knowledge Base, Multi=Multimodal
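Several of the resources above carry provenance metadata such as narrator chains and authenticity grades, and the task layer lists versioning as a data concern. The sketch below shows one way a record could keep that metadata together with a content fingerprint, so that any silent change to the text, chain, or grade becomes detectable. The field names are hypothetical and do not reproduce the schema of Sanadset, Sunnah.com, or any other listed resource, and the values are placeholders rather than real hadith text.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class HadithRecord:
    """Hypothetical record layout; not the schema of any listed dataset."""
    collection: str
    number: str
    matn: str                 # body text of the report
    sanad: Tuple[str, ...]    # narrator chain, in the order recorded
    grade: str                # authenticity grade as given by the source

    def fingerprint(self) -> str:
        """SHA-256 over a canonical JSON serialization of all fields."""
        payload = json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    record = HadithRecord(
        collection="example-collection",
        number="1",
        matn="<placeholder text>",
        sanad=("narrator-1", "narrator-2", "narrator-3"),
        grade="<placeholder grade>",
    )
    regraded = replace(record, grade="<another placeholder grade>")
    print(record.fingerprint() == regraded.fingerprint())
```

Storing the fingerprint alongside a citation lets a downstream system check that the record it quotes is the same version that was evaluated.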
Evaluation of Islamic-domain LLMs cannot be reduced to generic NLP benchmarks. The core question is not fluency but whether a system is safe and warranted: does it ground claims in authoritative evidence, preserve provenance, and refrain when evidence is missing or contested?
Comprehensive evaluation framework for Islamic AI systems across four primary domains
| Methodology | Metrics & Description | Key Benchmarks |
|---|---|---|
| Linguistic & Reasoning Capabilities | | |
| N-gram Matching | BLEU, ROUGE, METEOR: Measures lexical overlap with reference text. Limited utility for theological nuance. | OALL [22], AraGen [23] |
| Symbolic Verification | Execution Accuracy: Validates mathematical derivations (e.g., inheritance shares) against rule-based engines. | QIAS [14], GATMath [24] |
| Standardized Testing | Normalized Accuracy: Performance on multiple-choice questions (MCQs) across diverse subjects (STEM, Humanities). | ArabicMMLU [25], QuranBench [26] |
| Retrieval & Grounding (RAG) | | |
| Factuality Checking | Span-Level Error Rate: Percentage of generated text spans containing fabricated content. | Halwasa [27], HalluVerse [28] |
| Citation Verification | Citation Precision/Recall: Accuracy of retrieving specific Qur'anic Ayahs or Hadith to support a claim. | IslamicEval [10], Hajj-FQA [17] |
| Entailment | Faithfulness Score: Measures if the generated answer is logically entailed by the retrieved context. | FARSIQA [16], AraHalluEval [29] |
| Doctrinal & Cultural Alignment | | |
| Scholar-in-the-Loop | Adjudication Score: Human expert evaluation of Fatwa correctness, Adab (etiquette), and Hikmah (wisdom). | FiqhQA [15], Iqra'Eval [11], ADAB [21] |
| Cultural Probing | Alignment Score: Degree of conformity to Arab-Islamic norms vs. Western-centric values. | IslamTrust [20], PalmX [18] |
| Dialectal Robustness | Dialectal Accuracy: Performance consistency across Modern Standard Arabic (MSA) and regional dialects. | AraDiCE [30], Absher [31] |
| Safety & Governance | | |
| Red Teaming | Attack Success Rate (ASR): Vulnerability to prompt injection or generating prohibited content (e.g., hate speech). | ASAS [32], AraTrust [33] |
| Mechanistic Analysis | Latent Activation: Identification of internal neuronal pathways associated with bias or violence. | Simbeck et al. [34] |
Note: Evaluation methodologies are grouped into four domains: linguistic and reasoning capabilities, retrieval and grounding, doctrinal and cultural alignment, and safety and governance. Each row lists a methodology, the metric it relies on, and representative benchmarks.
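Two of the retrieval and grounding metrics in the table have simple set-based definitions, sketched below for concreteness. This is a generic formulation under stated assumptions, not the scoring script of any listed benchmark: citations are compared as exact string identifiers (the ID format shown is invented), and span labels are assumed to come from a human or automatic annotator.

```python
from typing import List, Set, Tuple


def citation_precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of cited references against a gold set of references."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def span_error_rate(span_is_fabricated: List[bool]) -> float:
    """Fraction (not percentage) of generated spans flagged as fabricated."""
    if not span_is_fabricated:
        return 0.0
    return sum(span_is_fabricated) / len(span_is_fabricated)


if __name__ == "__main__":
    # Hypothetical reference IDs; the naming scheme is an assumption.
    predicted = {"quran:2:255", "quran:2:256", "hadith:coll-a:1"}
    gold = {"quran:2:255", "hadith:coll-a:1", "hadith:coll-b:8"}
    print(citation_precision_recall(predicted, gold))
    print(span_error_rate([False, True, False, False]))
```

Exact-match scoring is strict: a citation to the right verse under a different numbering convention counts as wrong, so identifiers would need to be normalized before comparison.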
Trustworthy Islamic-knowledge AI requires multi-dimensional evaluation that goes beyond accuracy metrics to include provenance verification, pluralism awareness, cultural sensitivity, and governance-oriented assessment. Standard NLP benchmarks are insufficient for high-stakes religious guidance applications.
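As an illustration of what provenance verification can mean in practice, the sketch below checks whether a quotation attributed to a reference actually matches the canonical text stored for that reference. It is a simplified assumption-laden example: the `canon` lookup table, the reference IDs, and the four status labels are hypothetical, and the texts are placeholders rather than scripture.

```python
import unicodedata
from typing import Dict


def normalize(text: str) -> str:
    """Drop combining marks (this includes Arabic diacritics) and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def verify_quote(reference: str, quoted: str, canon: Dict[str, str]) -> str:
    """Return 'verified', 'partial', 'mismatch', or 'unknown_reference'."""
    if reference not in canon:
        return "unknown_reference"
    want = normalize(canon[reference])
    got = normalize(quoted)
    if not got:
        return "mismatch"
    if got == want:
        return "verified"
    if got in want:
        # The quote is a contiguous excerpt of the canonical text.
        return "partial"
    return "mismatch"


if __name__ == "__main__":
    # Placeholder canonical store; a real one would be keyed by verse or hadith number.
    canon = {"ref:1": "alpha beta gamma delta"}
    print(verify_quote("ref:1", "alpha  beta gamma delta", canon))
    print(verify_quote("ref:1", "beta gamma", canon))
    print(verify_quote("ref:1", "alpha omega", canon))
    print(verify_quote("ref:9", "alpha", canon))
```

Stripping all combining marks is deliberately aggressive and would hide diacritic-level errors, so a stricter comparison would be needed wherever vocalization itself is what is being checked.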
Methods for operationalizing Islamic knowledge in AI have evolved through three overlapping eras. The LLM era does not replace earlier techniques so much as subsume them—lexical retrieval and symbolic resources reappear inside RAG and verification pipelines.
Pre-Transformer & Early Neural
Neural Encoders & Arabic LMs
Generative LLMs, RAG & Agentic Pipelines
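The point that lexical retrieval reappears inside RAG pipelines can be sketched in a few lines: a term-weighting ranker from the first era selects passages, and the generative step is constrained to cite them. The example below uses a simplified TF-IDF-style score (not BM25) and stops at prompt construction. The passage IDs, texts, and prompt wording are invented for illustration, and no model call is made.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def rank_passages(query: str, passages: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank passage IDs by a simple TF-IDF overlap score; drop zero-score passages."""
    doc_counts = {pid: Counter(tokenize(text)) for pid, text in passages.items()}
    n_docs = len(passages)

    def idf(term: str) -> float:
        df = sum(1 for counts in doc_counts.values() if term in counts)
        return math.log((1 + n_docs) / (1 + df)) + 1.0

    query_terms = set(tokenize(query))
    scores = {
        pid: sum(counts[t] * idf(t) for t in query_terms)
        for pid, counts in doc_counts.items()
    }
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return [pid for pid in ranked[:top_k] if scores[pid] > 0]


def build_grounded_prompt(query: str, passages: Dict[str, str], top_k: int = 2) -> str:
    """Assemble a prompt that restricts the answer to retrieved, ID-tagged passages."""
    ids = rank_passages(query, passages, top_k)
    if not ids:
        return f"No supporting passage was retrieved for: {query}\nDecline to answer."
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in ids)
    return (
        "Answer using only the passages below and cite their IDs.\n"
        f"{context}\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Toy passages; IDs and wording are placeholders.
    corpus = {
        "p1": "guidance on fasting during travel",
        "p2": "determining prayer times at high latitudes",
        "p3": "zakat on trade goods",
    }
    print(build_grounded_prompt("prayer times", corpus, top_k=1))
```

Returning an explicit decline instruction when nothing is retrieved keeps the abstention decision in the pipeline rather than leaving it to the generator.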
Despite significant progress, Islamic-knowledge AI faces critical challenges that require interdisciplinary solutions. Our survey identifies six deployment-critical gaps and engineering priorities:
Building auditable, pluralism-aware, and risk-sensitive Islamic-knowledge AI requires:
@article{bhatia2026islamic,
  title={Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey},
  author={Bhatia, Gagan and Mubarak, Hamdy and Hawasly, Majd and Jarrar, Mustafa and
          Mikros, George and Zaraket, Fadi and Alhirthani, Mahmoud and Al-Khatib, Mutaz and
          Cochrane, Logan and Darwish, Kareem and Yahiaoui, Rashid and Alam, Firoj},
  year={2026},
  institution={Qatar Computing Research Institute, HBKU}
}