Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey

A systematic scoping review examining how AI systems operationalize Islamic knowledge across 160+ papers (2016-2026)

Figure: Publication trends showing research growth.

Abstract

AI systems are increasingly mediating how Islamic communities access, study, and apply Islamic sources; still, research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech, multimodal learning, educational technology, and recent LLM alignment work.

This survey presents a critical systematic review of 160+ papers from the past decade that incorporate Islamic knowledge into machine learning (ML) and AI systems. We propose a layered taxonomy that separates an epistemic view of Islamic knowledge (authority-bearing foundations and established disciplines) from an instrumental AI task layer (data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods), while treating normative concerns as cross-cutting constraints.

Using this framework, we synthesize trends in datasets, benchmarks, and system architectures, highlighting the shift toward retrieval-grounded LLM pipelines, verification and deferral mechanisms, and emerging multimodal recitation and manuscript-processing systems.

We also consolidate evaluation practices for trustworthiness, including provenance and faithfulness, disagreement-aware and school-of-thought-sensitive framing, calibrated abstention under underspecified queries, and safety and bias assessment for Islamic contexts. Finally, we identify deployment-critical gaps and engineering priorities for building auditable, pluralism-aware, and risk-sensitive Islamic-knowledge AI.



Introduction

AI systems increasingly mediate how Islamic communities access, study, and apply Islamic sources. With over 2 billion Muslims worldwide, there is substantial and growing demand for reliable, culturally grounded AI tools for Islamic knowledge. Yet research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech processing, multimodal learning, educational technology, and LLM alignment.

This systematic survey reviews over 160 papers from the past decade (2016–2026), presenting a critical analysis of how AI systems operationalize Islamic knowledge. We follow the PRISMA-ScR framework to ensure transparency and reproducibility, screening 1,743 initial records down to 160 included studies.

Figure: A sunburst chart depicting the hierarchical distribution of the surveyed literature across major domains and sub-applications.

We introduce a layered taxonomy that separates an epistemic view of Islamic knowledge—authority-bearing foundations (Qur'an, Hadith) and established disciplines (Qur'anic Sciences, Hadith Sciences, Usul al-Fiqh, Fiqh, Theology, Sufism, and History)—from an instrumental AI task layer covering data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods.

Three cross-cutting normative dimensions thread through the entire analysis: (i) doctrinal integrity and authenticity (correct attribution, protection against fabrication), (ii) normative correctness and disagreement handling (school-aware framing, calibrated abstention), and (iii) objectives, harms, and governance (maqasid-informed alignment, bias checking, deployment safety).

Our goal is to provide a comprehensive view of how current AI systems capture, structure, and evaluate Islamic knowledge—identifying where they resonate with scholarly practice and where they risk flattening diverse traditions and worldviews.

Comparison with Prior Surveys

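Prior surveys of Islamic AI systems have each focused on a specific domain, such as Hadith, the Qur'an, Islamic QA, education, Arabic LLMs, or bias. This survey instead covers the whole Islamic-knowledge stack, from resources and retrieval to end-to-end evaluation, as summarized in the comparison below: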

Prior Survey Year Primary Scope Main Research Areas Prov./Disagr./Abst. MM Norm./Plur. Systematic Method
Azmi (2019) 2019 Hadith-focused IR/ML/DL (pre-LLM) ~ ~ Narrative (no PRISMA)
Bashir et al. (2023) 2023 Qur'an-focused IR/ML/DL; limited Transformers; pre-LLM/RAG ~ Inclusion/exclusion + flow
Alnefaie et al. (2023) 2023 Islamic QA (Qur'an/Hadith/Fatwa) Retrieval + PLMs; task framing ~ ~ Survey + evaluation criteria
Ahmad et al. (2025) 2025 Qur'anic education AI-in-education (systematic review) ~ Systematic review
Hakim et al. (2023) 2023 Teaching Islamic studies AI-in-education (systematic review) ~ Review (PRISMA referenced)
Alhammad et al. (2025) 2025 Islamic education Pedagogy + learning outcomes (review) ~ Review
Mashaabi et al. (2024) 2024 Arabic LLMs (general) Transformers/LLMs; resources ~ Method + openness/resources
Rhel et al. (2025) 2025 Arabic LLMs (general) PLMs/LLMs; prompting; evaluation trends Review-style
Abouzied et al. (2025) 2025 Arabic LLMs landscape LLMs + benchmarks; harm/bias themes ~ ~ Landscape (no PRISMA)
Alzubaidi et al. (2025) 2025 Arabic LLM benchmarks LLM evaluation; benchmark taxonomy ~ ~ Systematic review
Asseri et al. (2025) 2025 Bias (Arabs/Muslims) Prompting/pipelines; bias measurement PRISMA
This Survey (2026) 2026 Whole Islamic stack IR, Transformers, LLM/RAG/agents; end-to-end evaluation PRISMA-ScR + reproducibility artifacts

Legend: ✓ = explicit focus; ~ = partial/implicit; ✗ = not covered. MM = multimodal; Norm./Plur. = normative/pluralism considerations; Prov./Disagr./Abst. = provenance/disagreement/abstention.



Scope & Methodology

To ensure a systematic, transparent, and reproducible review, we followed the PRISMA-ScR framework. This approach allowed us to rigorously map the rapidly evolving research landscape while minimizing selection bias.

Figure: PRISMA flow diagram showing the systematic screening process from 1,743 records to 160 included papers.

Research Questions

RQ1: Domains & Tasks

What Islamic knowledge domains and application tasks have been operationalized in ML/AI systems, and how is this work distributed across subfields?

RQ2: Resources & Measurement

What datasets, benchmarks, and knowledge resources are available, and what assumptions do they encode about evidence, provenance, and interpretive diversity?

RQ3: Evaluation & Trustworthiness

How do studies evaluate trustworthiness, especially source faithfulness, doctrinal correctness, pluralism-aware answering, and safety/bias?

Coverage: 2016-2026

Systematic search across Semantic Scholar, IEEE Xplore, ACM Digital Library, ACL Anthology, and arXiv. Initial corpus: 1,743 records → 160 included after screening.



Multi-Dimensional Taxonomy

We developed a comprehensive two-layer taxonomy that bridges traditional Islamic sciences with modern AI evaluation. The epistemic layer organizes Islamic knowledge to prevent a common failure mode—treating all "Islamic text" as homogeneous content—by distinguishing sources that carry direct normative authority from disciplines that regulate how those sources are read, reconciled, and applied. The AI task layer is explicitly instrumental, enumerating reusable computational capabilities (retrieval, grounding, extraction, reasoning, evaluation) that apply across domains.

Epistemic Layer: Foundations

  • Qur'an: The revealed text and primary foundation of all Islamic disciplines. Motivates high-integrity representations, verse-boundary integrity, and script normalization.
  • Hadith: Textual corpora of the Prophet's sayings, actions, and tacit approvals. Motivates collection-aware retrieval, attribution, and separation of transmitted reports from commentary.
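
The integrity requirements above can be made concrete. The following is a minimal Python sketch, not drawn from any surveyed system, of two safeguards: search-time Arabic normalization (removing diacritics and tatweel, folding alef variants) that never overwrites the canonical text, and validation of `surah:ayah` references against verse counts. `normalize_for_search`, `parse_verse_ref`, and the partial `VERSE_COUNTS` table are illustrative names; a real system would load all 114 surah counts from a canonical source such as Tanzil.

```python
import re
import unicodedata

# Harakat (U+064B-U+0652), superscript (dagger) alef U+0670, and tatweel U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Alef with hamza above/below, alef with madda, and alef wasla folded to bare alef.
_ALEF_VARIANTS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627",
                                "\u0622": "\u0627", "\u0671": "\u0627"})

# Illustrative subset only: Al-Fatiha (7), Al-Baqarah (286), Al-Ikhlas (4).
VERSE_COUNTS = {1: 7, 2: 286, 112: 4}


def normalize_for_search(text: str) -> str:
    """Normalize a query or index key; the stored canonical text is never modified."""
    text = unicodedata.normalize("NFC", text)  # compose alef + hamza/madda sequences first
    text = _DIACRITICS.sub("", text)
    return text.translate(_ALEF_VARIANTS)


def parse_verse_ref(ref: str) -> tuple[int, int]:
    """Parse 'surah:ayah' and reject references outside known verse boundaries."""
    surah_str, sep, ayah_str = ref.partition(":")
    if not sep or not surah_str.isdigit() or not ayah_str.isdigit():
        raise ValueError(f"malformed reference: {ref!r}")
    surah, ayah = int(surah_str), int(ayah_str)
    if surah not in VERSE_COUNTS:
        raise ValueError(f"unknown surah: {surah}")
    if not 1 <= ayah <= VERSE_COUNTS[surah]:
        raise ValueError(f"ayah {ayah} out of range for surah {surah}")
    return surah, ayah
```

For example, `parse_verse_ref("2:255")` returns `(2, 255)`, while `parse_verse_ref("1:8")` raises `ValueError` because Al-Fatiha has seven verses. Keeping normalization at search time is the design choice here: folding alef variants helps recall but would corrupt the canonical text if written back.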

Epistemic Layer: Disciplines

  • Qur'anic Sciences: Preservation, recitation (qira'at), interpretation, tafsir alignment
  • Hadith Sciences: Isnad analysis, narrator evaluation, authenticity classification
  • Usul al-Fiqh: Legal theory, evidential hierarchy, structured inference
  • Fiqh: Practical rulings, comparative school retrieval, disagreement handling
  • Theology (Kalam): Doctrinal integrity, grounding claims in sources
  • Sufism (Tasawwuf): Genre-sensitive summarization, tone-aware guidance
  • History & Sirah: Event extraction, geo-temporal linking, provenance-aware summarization
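
Isnad analysis is one of the more structured tasks in this list. A minimal sketch, assuming a chain is stored as an ordered list of narrator identifiers (from the later transmitter back toward the earliest source) and that a hypothetical set of attested teacher-student links exists (for example, derived from a narrator database), flags the first gap in continuity. This is a toy connectivity check, not an authenticity grading: real narrator evaluation also weighs reliability and other criteria that the sketch ignores, and all names are placeholders.

```python
from typing import Optional

# Hypothetical attested transmission links as (student, teacher) pairs.
ATTESTED_LINKS = {
    ("narrator_D", "narrator_C"),
    ("narrator_C", "narrator_B"),
    ("narrator_B", "narrator_A"),
}


def first_break(chain: list[str],
                links: set[tuple[str, str]]) -> Optional[tuple[str, str]]:
    """Return the first adjacent (student, teacher) pair with no attested link,
    or None if every step of the chain is attested."""
    for student, teacher in zip(chain, chain[1:]):
        if (student, teacher) not in links:
            return (student, teacher)
    return None
```

A continuous chain such as `["narrator_D", "narrator_C", "narrator_B", "narrator_A"]` returns `None`, while skipping a narrator (`["narrator_D", "narrator_B", "narrator_A"]`) returns `("narrator_D", "narrator_B")`.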

Survey Organization & Taxonomy Framework

Figure: Taxonomy of Islamic-knowledge AI work organized by content domains (sources and practice areas) and task families; Maqasid is represented as a cross-cutting objectives/safety/governance lens.

AI Task Layer (Instrumental Methods)

Data & Corpora

Digitization (OCR/HTR for manuscripts), corpus building, annotation-schema design, and versioning for provenance and reproducibility.

Retrieval & Grounding

Cross-lingual search/IR, RAG with evidence linking, provenance tracking, citation generation, and quote-bounded answering.

Understanding

NER, relation/event extraction, geo-temporal linking, faithful summarization, and multi-source synthesis that surfaces disagreements.

Reasoning Support

Comparative reasoning (madhhab-aware analysis, surfacing ikhtilaf) and uncertainty handling (abstention/deferral, clarification prompts).

Evaluation & Governance

Benchmarks, error taxonomies, reproducibility, hallucination control, fabrication detection, bias checks, and red-teaming assessment.

Multimodal

Speech/audio (ASR, tajwid feedback, recitation coaching) and document image processing (manuscript OCR/HTR, layout analysis, segmentation).

Use Cases & Applications

Islamic-knowledge AI is deployed in recurring settings that stress different reliability constraints:

Scripture-Grounded QA

Focus: Qur'an/Hadith search, reference assistants

Key risks: Provenance breakage, source fabrication, citation hallucination

Requirements: Source faithfulness, verifiable attribution
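
Verifiable attribution can be partly checked mechanically. A minimal sketch, assuming each answer carries structured citations (a source identifier plus the text it claims to quote) and that a trusted corpus maps identifiers to canonical text: unknown identifiers are flagged as possible fabrications, and quotes must appear verbatim after whitespace normalization. `Citation`, `verify_citations`, and the toy corpus are hypothetical; for Arabic sources a real pipeline would likely also normalize script variants before matching.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    source_id: str  # hypothetical identifier scheme, e.g. "src:1"
    quote: str      # text the answer presents as a verbatim quotation


def _canon(text: str) -> str:
    # Collapse runs of whitespace so line-wrapping differences are not reported as mismatches.
    return re.sub(r"\s+", " ", text).strip()


def verify_citations(citations: list[Citation], corpus: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every quote was found in its cited source."""
    problems = []
    for c in citations:
        source = corpus.get(c.source_id)
        if source is None:
            problems.append(f"{c.source_id}: unknown source (possible fabricated citation)")
        elif not _canon(c.quote) or _canon(c.quote) not in _canon(source):
            problems.append(f"{c.source_id}: quote missing or not found verbatim in cited source")
    return problems


# Toy corpus with placeholder text standing in for canonical source passages.
CORPUS = {"src:1": "placeholder text of the first source passage"}
```

Returning a list of problems rather than a single boolean lets a pipeline decide per answer whether to repair, regenerate, or abstain.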

Fiqh & Fatwa Systems

Focus: Jurisprudential reasoning, school-aware rulings

Key risks: Collapsing legitimate disagreement (ikhtilāf), underspecified answers

Requirements: Multi-school awareness, qualified responses, abstention policies
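
These requirements can be expressed as an explicit decision policy. The sketch below is hypothetical and simplified: it assumes a retriever has already returned candidate positions tagged by school, each with supporting evidence identifiers. Positions without evidence are discarded; the system defers when nothing grounded remains, answers directly only when a requested school is found or all grounded positions agree, and otherwise presents the disagreement instead of choosing one view.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Position:
    school: str          # e.g. "Hanafi", "Maliki", "Shafi'i", "Hanbali"
    ruling: str          # normalized ruling label, e.g. "permissible"
    evidence: list[str]  # identifiers of supporting sources (empty = ungrounded)


@dataclass
class Decision:
    action: str  # "answer", "present_disagreement", or "defer"
    positions: list[Position] = field(default_factory=list)
    message: str = ""


def decide(positions: list[Position], requested_school: Optional[str] = None) -> Decision:
    """Hypothetical school-aware answering policy with deferral."""
    grounded = [p for p in positions if p.evidence]  # never answer from ungrounded positions
    if not grounded:
        return Decision("defer", message="No grounded evidence found; please consult a qualified scholar.")
    if requested_school is not None:
        matches = [p for p in grounded if p.school == requested_school]
        if matches:
            return Decision("answer", matches)
        return Decision("defer", message=f"No grounded {requested_school} position was retrieved.")
    if len({p.ruling for p in grounded}) == 1:
        return Decision("answer", grounded)  # all grounded positions agree
    # Schools differ and the user did not specify one: show all views and ask for context.
    return Decision("present_disagreement", grounded,
                    message="Schools differ on this question; which school do you follow?")
```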

Practice Support

Focus: Hajj/Umrah guidance, prayer times, rituals

Key risks: Operational errors in high-stakes contexts

Requirements: Conservative abstention, traceable sourcing, clear disclaimers

Multimodal Learning

Focus: Recitation coaching, tajwīd feedback, OCR

Key risks: Pronunciation errors, incorrect tajwīd rules

Requirements: Robust speech pipelines, expert validation
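
Recitation feedback is often framed as aligning what the reciter was expected to say with what the recognizer heard. A minimal sketch, assuming an upstream ASR model already emits phoneme sequences (the phoneme symbols below are placeholders): Levenshtein alignment with a backtrace labels each mismatch as a substitution, deletion, or insertion. Mapping those errors to specific tajwīd rules would require an expert-validated rule table that this sketch does not attempt.

```python
def align_phonemes(expected: list[str], recognized: list[str]) -> list[tuple[str, str, str]]:
    """Levenshtein alignment; returns (operation, expected, recognized) for each mismatch."""
    n, m = len(expected), len(recognized)
    # dp[i][j] = minimum edit cost aligning expected[:i] with recognized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if expected[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion: expected phoneme omitted
                           dp[i][j - 1] + 1,         # insertion: extra phoneme produced
                           dp[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace from the bottom-right cell to recover the edit operations.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (expected[i - 1] != recognized[j - 1]):
            if expected[i - 1] != recognized[j - 1]:
                errors.append(("substitution", expected[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            errors.append(("deletion", expected[i - 1], ""))
            i -= 1
        else:
            errors.append(("insertion", "", recognized[j - 1]))
            j -= 1
    return errors[::-1]
```

For instance, aligning `["q", "a", "l"]` with `["k", "a", "l"]` reports a single substitution of `q` by `k`.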


These use cases highlight the diversity of deployment contexts and the varying trustworthiness requirements across Islamic-knowledge AI applications.



Resources & Benchmarks

Aligning AI systems with Islamic knowledge requires structured corpora spanning primary texts, classical scholarship, contemporary legal resolutions, linguistic resources, and benchmark datasets. We organize these into five major categories:

Primary Sources

Qur'an: Tanzil, Quranic Arabic Corpus

Hadith: Kutub al-Sittah (Six Books), authenticated collections

Key features: Canonical texts, linguistic annotation, chain of narration

Classical Heritage

Content: Uṣūl al-Fiqh, legal maxims, Maqāṣid texts

Examples: al-Ghazālī's works, al-Shāṭibī's al-Muwāfaqāt

Resources: OpenITI, digitized libraries

Contemporary Fatwas

Sources: IIFA, ECFR, MWL Fiqh Council, AMJA

Topics: Finance, bioethics, minority jurisprudence

Purpose: Connect classical principles to modern issues

Linguistic Resources

Tools: Arabic morphological analyzers, embeddings

Data: Quranic Arabic Corpus, narrator databases

Models: AraBERT, CAMeLBERT, specialized embeddings

Benchmarks & Shared Tasks

Datasets: QIAS, IslamicEval, PalmX, FiqhQA, AraHalluEval

Multimodal: Iqra'Eval (recitation)

Focus: Correctness, faithfulness, safety


Data Challenges

Key issues in Islamic knowledge data resources:

  • Scarcity: Limited high-quality annotated datasets
  • Quality control: Web scrapes may mix authoritative sources with sectarian content
  • Multi-school coverage: Need for balanced representation across Sunni, Shia, and other interpretations
  • Language diversity: Most resources are Arabic-centric; need for multilingual coverage

Key Datasets & Benchmarks

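Overview of resources related to Islamic AI, grouped into classical pre-training corpora, knowledge bases (KBs) and ontologies for retrieval, and benchmarks (Qur'anic and Hadith resources, jurisprudence and reasoning, and cultural and ethical alignment).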

Dataset | Description | Type | Size | License | Lang | Ref.
Classical Pre-training Corpora (Turath)
OpenITI | The largest machine-readable corpus of pre-modern Islamicate texts. | Corpus | ~1B tokens | CC-BY-4.0 | Ar/Fa | [1]
Shamela (Cleaned) | Text version of the Shamela library; covers Fiqh, Tafsir, and History. | Corpus | ~1B tokens | Public | Ar | [2]
KSUCCA | King Saud University Corpus of Classical Arabic (7th–11th century CE). | Corpus | 50M tokens | Research | Ar (Cls) | [3]
Tashkeela | Fully vocalized classical texts for training diacritic-aware models. | Corpus | 75M tokens | Public | Ar | [4]
Noor Corpus | Large, diverse library of Islamic PDFs and OCR'd texts. | Corpus | >100k books | Mixed | Ar | [5]
Knowledge Bases & Ontologies
QuranMorph Corpus | Morphologically annotated Quranic corpus (POS, lemmatization, etc.). | KB | 77k tokens | CC-BY-4.0 | Ar/En | [6]
Quranic Arabic Corpus | Morphological and syntactic annotation with an ontology mapping concepts in the Quran. | KB/Onto | 77k nodes | Research | Ar/En | [7]
IslamicPCQA KB | Curated knowledge base of 1M+ Islamic documents for retrieval. | KB | 1M+ docs | Research | Fa/Ar | [8]
Sunnah.com API | Structured Hadith collections with grading (Sahih/Hasan) metadata. | API/KB | 6 major books | Open | Ar/En | [9]
Quranic & Hadith Resources
IslamicEval 2025 | Hallucination detection for Quran/Hadith. | Hallu | 1,506 questions | Apache 2.0 | Ar | [10]
Iqra'Eval 2025 | Quranic mispronunciation diagnosis. | Audio | 82+ hours | Research | Ar | [11]
Sanadset 650K | Hadith dataset with narrator chains (sanad). | Corpus | 650,986 records | CC-BY-4.0 | Ar | [12]
QURAN-MD | Unified verse-level text, translation, transliteration, and recitation audio from 32 reciters. | Multi | 6,236 verses | Research | Ar/En | [13]
Jurisprudence (Fiqh) & Reasoning
QIAS 2025 | Inheritance and general knowledge. | Reason | 22,000 MCQs | Research | Ar | [14]
FiqhQA | QA across 4 Sunni schools; abstention evaluation. | QA | 960 QA pairs | Research | Ar/En | [15]
IslamicPCQA | QA over a 1M+ document knowledge base. | QA/RAG | 12k pairs | Research | Fa | [16]
Hajj-FQA | Specialized QA on Hajj rituals and fatwas. | QA | 886 Hajj fatwas | Research | Ar | [17]
Cultural & Ethical Alignment
PalmX 2025 | General Arabic and Islamic culture. | Culture | 6.4k MCQs | Shared Task | Ar | [18]
BengaliMoralBench | Moral reasoning in Bengali Islamic culture. | Ethics | 3k scenarios | CC-BY-NC-ND | Bn | [19]
IslamTrust | Alignment benchmark with consensus-based Islamic ethical principles. | Ethics | Multi | Research | Ar/En | [20]
ADAB | App reviews corpus annotated for politeness and religious etiquette. | Style | Curated | Research | Ar | [21]

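Note: QA=Question Answering, Reason=Reasoning/Math, Hallu=Hallucination Detection, Audio=Speech/Recitation, KB=Knowledge Base, Onto=Ontology, Multi=Multimodal, MCQ=Multiple-Choice Question; Ar=Arabic, En=English, Fa=Persian, Bn=Bengali, Cls=Classical.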



Evaluation & Trustworthiness

Evaluation of Islamic-domain LLMs cannot be reduced to generic NLP benchmarks. The core question is not fluency but whether a system is safe and warranted: does it ground claims in authoritative evidence, preserve provenance, and refrain when evidence is missing or contested?

Evaluation Methodologies

Comprehensive evaluation framework for Islamic AI systems across four primary domains

Methodology | Metrics & Description | Key Benchmarks
Linguistic & Reasoning Capabilities
N-gram Matching | BLEU, ROUGE, METEOR: measure lexical overlap with reference text; of limited use for theological nuance. | OALL [22], AraGen [23]
Symbolic Verification | Execution Accuracy: validates mathematical derivations (e.g., inheritance shares) against rule-based engines. | QIAS [14], GATMath [24]
Standardized Testing | Normalized Accuracy: performance on multiple-choice questions (MCQs) across diverse subjects (STEM, humanities). | ArabicMMLU [25], QuranBench [26]
Retrieval & Grounding (RAG)
Factuality Checking | Span-Level Error Rate: percentage of generated text spans containing fabricated content. | Halwasa [27], HalluVerse [28]
Citation Verification | Citation Precision/Recall: how well the Qur'anic verses (ayahs) or Hadith cited for a claim match the gold supporting evidence. | IslamicEval [10], Hajj-FQA [17]
Entailment | Faithfulness Score: whether the generated answer is entailed by the retrieved context. | FARSIQA [16], AraHalluEval [29]
Doctrinal & Cultural Alignment
Scholar-in-the-Loop | Adjudication Score: expert human evaluation of fatwa correctness, adab (etiquette), and hikmah (wisdom). | FiqhQA [15], Iqra'Eval [11], ADAB [21]
Cultural Probing | Alignment Score: degree of conformity to Arab-Islamic norms vs. Western-centric values. | IslamTrust [20], PalmX [18]
Dialectal Robustness | Dialectal Accuracy: performance consistency across Modern Standard Arabic (MSA) and regional dialects. | AraDiCE [30], Absher [31]
Safety & Governance
Red Teaming | Attack Success Rate (ASR): vulnerability to prompt injection or to generating prohibited content (e.g., hate speech). | ASAS [32], AraTrust [33]
Mechanistic Analysis | Latent Activation: identification of internal neurons or pathways associated with bias or violence. | Simbeck et al. [34]

Note: Methodologies are grouped into four domains: linguistic and reasoning capabilities, retrieval and grounding, doctrinal and cultural alignment, and safety and governance. Each row pairs a methodology with its metric and representative benchmarks.

Key Insight

Trustworthy Islamic-knowledge AI requires multi-dimensional evaluation that goes beyond accuracy metrics to include provenance verification, pluralism awareness, cultural sensitivity, and governance-oriented assessment. Standard NLP benchmarks are insufficient for high-stakes religious guidance applications.
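
Several metrics in the evaluation table reduce to simple ratios once judgments exist. A minimal sketch, assuming gold evidence sets and per-item boolean judgments (from annotators or an automatic checker) are already available: set-based citation precision/recall, and a shared `flagged_rate` that gives the span-level error rate when each flag marks a fabricated span, and the attack success rate (ASR) when each flag marks a successful red-teaming prompt. The function names are illustrative, and the empty-set convention is one choice among several.

```python
def citation_precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of cited references (e.g. "2:255") against gold evidence."""
    hits = len(predicted & gold)
    # Convention used here: an empty predicted or gold set scores 0.0.
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def flagged_rate(flags: list[bool]) -> float:
    """Share of items flagged True: fabricated spans (span-level error rate)
    or successful attacks (ASR)."""
    if not flags:
        raise ValueError("no judged items")
    return sum(flags) / len(flags)
```

For example, `citation_precision_recall({"2:255", "3:18"}, {"2:255"})` returns `(0.5, 1.0)`: the answer found the gold verse but also cited an unsupported one.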



Evolution of Methods

Methods for operationalizing Islamic knowledge in AI have evolved through three overlapping eras. The LLM era does not replace earlier techniques so much as subsume them—lexical retrieval and symbolic resources reappear inside RAG and verification pipelines.

Phase 1: ~2000–2017

Pre-Transformer & Early Neural

  • Preprocessing: Rule-based stemmers (Khoja, ISRI)
  • Representation: Bag-of-words, TF-IDF, static embeddings (AraVec)
  • Models: BM25 retrieval, SVM/CRF classifiers, knowledge graphs (KGs) for Qur'an/Hadith
  • Limitations: Limited context for theological nuance, no long-range reasoning
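
BM25, listed among the Phase 1 models, is an example of a lexical technique that later reappears as a first-stage retriever inside RAG pipelines. A from-scratch sketch of Okapi BM25 follows, using the common defaults k1 = 1.5 and b = 0.75 and a Lucene-style IDF that stays non-negative. Tokenization (and, for Arabic, normalization or stemming) is assumed to happen upstream; the toy English tokens in the usage note are placeholders.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF: the +1 inside the log keeps it non-negative for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)  # term-frequency saturation with length normalization
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With documents `[["mercy", "patience"], ["prayer", "fasting"], ["mercy", "prayer", "mercy"]]` and query `["mercy"]`, the second document scores 0 and the third, which repeats the term, scores highest.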

Phase 2: ~2018–2022

Neural Encoders & Arabic LMs

  • Preprocessing: Diacritic restoration, Ayah/Hadith segmentation
  • Representation: Contextual encoders (AraBERT, MARBERT, CAMeLBERT)
  • Models: Neural retrieval, LSTM/BiLSTM, QA over Qur'an/Hadith
  • Advance: Improved semantic search and task-specific classification

Phase 3: ~2023–Present

Generative LLMs, RAG & Agentic Pipelines

  • Native Arabic LLMs: Jais, ArabianGPT, ALLaM
  • Adapted models: AceGPT, Fanar, Yehia (CPT = continued pre-training; SFT = supervised fine-tuning)
  • Pipelines: Multi-step agents for inheritance, zakat, Hajj workflows
  • Safeguards: Citation-grounded answers with chain-of-thought reasoning, Islamic safety policies, bias checks, benchmark-driven evaluation

Key Architectural Trends

  • Retrieval-Augmented Generation (RAG): Grounding LLM outputs in Qur'anic verses, Hadith, and Fiqh texts to reduce hallucination and enable verifiable attribution.
  • Agentic Pipelines: Multi-step workflows where specialized sub-agents handle retrieval, reasoning, and verification separately (e.g., inheritance calculation, Hajj guidance).
  • Alignment & Safety Layers: Islamic-specific safety policies, deferral mechanisms for underspecified queries, and red-teaming protocols for religious content.
  • Multimodal Expansion: Quranic recitation coaching (mispronunciation detection, tajwid), manuscript OCR/HTR, and Arabic sign language recognition.


Current Challenges & Future Directions

Despite significant progress, Islamic-knowledge AI faces critical challenges that require interdisciplinary solutions. Our survey identifies six deployment-critical gaps and engineering priorities:

Data Scarcity & Quality

  • Limited annotated datasets for specialized tasks
  • Web-scraped data mixes authoritative and unreliable sources
  • Imbalanced coverage across languages and schools of thought
  • Most resources Arabic-centric; low-resource Muslim-majority languages underserved
  • Direction: Curated, multi-school, multilingual corpora with provenance metadata

Reasoning Complexity

  • Jurisprudential reasoning requires multi-step inference with evidential hierarchy
  • Requires analogy (qiyas), contextual interpretation, and checking the conditions under which a ruling applies
  • Inheritance-law calculation combines exact fractional arithmetic with legal rules on who inherits and how much
  • Direction: Structured reasoning, chain-of-thought prompting, symbolic verification engines
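
Symbolic verification of inheritance answers can separate the legal step (deciding which heirs inherit and their fixed shares) from the arithmetic step. A minimal sketch of the arithmetic check only, assuming the fixed shares have already been determined: exact `Fraction` arithmetic applies proportional reduction ('awl) when fixed shares exceed the estate, and a model's proposed distribution is compared exactly. Residuary heirs, blocking (hajb), and redistribution of surplus (radd) are deliberately out of scope; the function names are illustrative.

```python
from fractions import Fraction


def apply_awl(fixed_shares: dict[str, Fraction]) -> dict[str, Fraction]:
    """Proportionally reduce fixed shares ('awl) when they sum to more than the whole estate."""
    total = sum(fixed_shares.values(), Fraction(0))
    if total <= 1:
        # Any remainder would go to residuary heirs or be redistributed (radd); not modelled here.
        return dict(fixed_shares)
    return {heir: share / total for heir, share in fixed_shares.items()}


def check_distribution(proposed: dict[str, Fraction],
                       fixed_shares: dict[str, Fraction]) -> bool:
    """True only if a model's proposed final shares match the 'awl-adjusted shares exactly."""
    return proposed == apply_awl(fixed_shares)


# Illustrative inputs: a wife, two daughters (sharing jointly), father, and mother.
shares = {"wife": Fraction(1, 8), "two daughters": Fraction(2, 3),
          "father": Fraction(1, 6), "mother": Fraction(1, 6)}
```

Here the fixed shares sum to 27/24, so 'awl yields 1/9 (= 3/27), 16/27, 4/27, and 4/27; a model that reports the unadjusted shares fails `check_distribution`. Exact fractions avoid the rounding drift that floating-point shares would introduce.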

Disagreement & Pluralism

  • Systems flatten legitimate scholarly disagreement (ikhtilaf) into single answers
  • Madhahib-aware presentation needed to avoid "one true answer" bias
  • Context-sensitive questioning when inputs are underspecified
  • Direction: Disagreement-aware scoring, school-of-thought-sensitive framing, safe deferral

Cultural Flattening & Bias

  • Western-centric RLHF policies can misrepresent Islamic values
  • Models trained on web text encode stereotypes about Muslims
  • Loss of local, contextual, and dialectal interpretations
  • Direction: Culturally grounded alignment, regional models, community-participatory design

Hallucination & Fabrication

  • LLMs can fabricate Qur'anic verses and Hadith and present them with unwarranted confidence
  • Incorrect attribution of rulings to scholars or schools
  • Fabricated citations are uniquely harmful in religious contexts
  • Direction: Retrieval-grounded generation, span-level fabrication detection, provenance-preserving pipelines

Safety & Governance

  • High-stakes religious guidance requires conservative abstention strategies
  • Need for scholar-in-the-loop validation and governance frameworks
  • Red-teaming protocols tailored to religious harms (misattribution, offensive stereotyping)
  • Direction: Calibrated confidence, auditable systems, deferral-to-authority mechanisms

Engineering Priorities for the Path Forward

Building auditable, pluralism-aware, and risk-sensitive Islamic-knowledge AI requires:

  • Provenance-preserving grounding: Retrieval-grounded pipelines with verifiable citations, not free-form synthesis
  • Disagreement-aware systems: Present alternative views with supporting evidence rather than collapsing into single answers
  • Calibrated abstention: Systems that defer, request missing context, or direct users to qualified authority when grounding is unreliable
  • Interdisciplinary collaboration: AI researchers, Islamic scholars ('ulama), and community stakeholders working together
  • Benchmark investment: Evaluation protocols that penalize fabricated citations more than generic factual errors, with disagreement-aware scoring
  • Safety-first deployment: Islamic-specific red-teaming, bias checks, and governance frameworks with clear disclaimers about AI limitations
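
One way to make an evaluation protocol penalize fabricated citations more than generic factual errors, while also scoring disagreement handling, is a weighted penalty per judged answer. The weights below are hypothetical placeholders, not values proposed in the survey; in practice they would be set with scholars and reported alongside results. `Judgment`, `penalty`, and `mean_penalty` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical penalty weights: fabricated citations cost more than ordinary factual errors,
# and collapsing a documented disagreement into a single view is penalized separately.
WEIGHTS = {"fabricated_citation": 3.0, "factual_error": 1.0, "collapsed_disagreement": 2.0}


@dataclass
class Judgment:
    fabricated_citations: int = 0
    factual_errors: int = 0
    collapsed_disagreement: bool = False  # item has documented ikhtilaf, but only one view was given


def penalty(j: Judgment) -> float:
    """Weighted penalty for one judged answer (lower is better)."""
    return (WEIGHTS["fabricated_citation"] * j.fabricated_citations
            + WEIGHTS["factual_error"] * j.factual_errors
            + WEIGHTS["collapsed_disagreement"] * j.collapsed_disagreement)


def mean_penalty(judgments: list[Judgment]) -> float:
    """Average penalty over a benchmark run."""
    if not judgments:
        raise ValueError("no judgments")
    return sum(penalty(j) for j in judgments) / len(judgments)
```

Under these weights, one fabricated citation costs as much as three ordinary factual errors, so a system cannot offset fabrication by being otherwise accurate.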


Authors

Gagan Bhatia
QCRI, HBKU Qatar
Hamdy Mubarak
QCRI, HBKU Qatar
Majd Hawasly
QCRI, HBKU Qatar
Mustafa Jarrar
CHSS, HBKU Qatar
George Mikros
CHSS, HBKU Qatar
Fadi Zaraket
ACRPS Qatar
Mahmoud Alhirthani
CHSS, HBKU Qatar
Mutaz Al-Khatib
CIS, HBKU Qatar
Logan Cochrane
CPP, HBKU Qatar
Kareem Darwish
QCRI, HBKU Qatar
Rashid Yahiaoui
CHSS, HBKU Qatar
Firoj Alam
QCRI, HBKU Qatar

Contact: fialam@hbku.edu.qa



Citation

@article{bhatia2026islamic,
  title={Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey},
  author={Bhatia, Gagan and Mubarak, Hamdy and Hawasly, Majd and Jarrar, Mustafa and 
          Mikros, George and Zaraket, Fadi and Alhirthani, Mahmoud and Al-Khatib, Mutaz and 
          Cochrane, Logan and Darwish, Kareem and Yahiaoui, Rashid and Alam, Firoj},
  year={2026},
  institution={Qatar Computing Research Institute, HBKU}
}