Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey

A systematic scoping review examining how AI systems operationalize Islamic knowledge across 160+ papers (2016–2026)

Figure: Publication trends showing research growth.

Abstract

AI systems are increasingly mediating how Islamic communities access, study, and apply Islamic sources; still, research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech, multimodal learning, educational technology, and recent LLM alignment work.

This survey presents a critical systematic review of 160+ papers from the past decade that incorporate Islamic knowledge in Machine Learning/AI. We propose a layered taxonomy that separates an epistemic view of Islamic knowledge (authority-bearing foundations and established disciplines) from an instrumental AI task layer (data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods), while treating normative concerns as cross-cutting constraints.

Using this framework, we synthesize trends in datasets, benchmarks, and system architectures, highlighting the shift toward retrieval-grounded LLM pipelines, verification and deferral mechanisms, and emerging multimodal recitation and manuscript-processing systems.

We also consolidate evaluation practices for trustworthiness, including provenance and faithfulness, disagreement-aware and school-of-thought-sensitive framing, calibrated abstention under underspecified queries, and safety and bias assessment for Islamic contexts. Finally, we identify deployment-critical gaps and engineering priorities for building auditable, pluralism-aware, and risk-sensitive Islamic-knowledge AI.

Introduction

AI systems are increasingly mediating how Islamic communities access, study, and apply Islamic sources. With over 2 billion Muslims worldwide, the demand for reliable, culturally grounded AI tools for Islamic knowledge has never been greater. Yet research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech processing, multimodal learning, educational technology, and LLM alignment.

This systematic survey reviews over 160 papers from the past decade (2016–2026), presenting a critical analysis of how AI systems operationalize Islamic knowledge. We follow the PRISMA-ScR framework to ensure transparency and reproducibility, screening 1,743 initial records down to 160 included studies.

Figure: Sunburst chart of the research landscape, depicting the hierarchical distribution of the surveyed literature across major domains and sub-applications.

We introduce a layered taxonomy that separates an epistemic view of Islamic knowledge—authority-bearing foundations (Qur'an, Hadith) and established disciplines (Qur'anic Sciences, Hadith Sciences, Usul al-Fiqh, Fiqh, Theology, Sufism, and History)—from an instrumental AI task layer covering data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods.

Three cross-cutting normative dimensions thread through the entire analysis: (i) doctrinal integrity and authenticity (correct attribution, protection against fabrication), (ii) normative correctness and disagreement handling (school-aware framing, calibrated abstention), and (iii) objectives, harms, and governance (maqasid-informed alignment, bias checking, deployment safety).

Our goal is to provide a comprehensive view of how current AI systems capture, structure, and evaluate Islamic knowledge—identifying where they resonate with scholarly practice and where they risk flattening diverse traditions and worldviews.

Comparison with Prior Surveys

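Prior surveys on Islamic AI systems have each focused on a single domain, such as Hadith, the Qur'an, Islamic education, or Arabic LLMs in general. This survey spans all of these and adds explicit coverage of provenance, disagreement, abstention, multimodality, and pluralism, as the comparison below shows: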

Prior Survey	Year	Primary Scope	Main Research Areas	Prov./Disagr./Abst.	MM	Norm./Plur.	Systematic Method
Azmi (2019)	2019	Hadith-focused	IR/ML/DL (pre-LLM)	~	✗	~	Narrative (no PRISMA)
Bashir et al. (2023)	2023	Qur'an-focused	IR/ML/DL; limited Transformers; pre-LLM/RAG	~	✓	✗	Inclusion/exclusion + flow
Alnefaie et al. (2023)	2023	Islamic QA (Qur'an/Hadith/Fatwa)	Retrieval + PLMs; task framing	~	✗	~	Survey + evaluation criteria
Ahmad et al. (2025)	2025	Qur'anic education	AI-in-education (systematic review)	✗	✗	~	Systematic review
Hakim et al. (2023)	2023	Teaching Islamic studies	AI-in-education (systematic review)	✗	✗	~	Review (PRISMA referenced)
Alhammad et al. (2025)	2025	Islamic education	Pedagogy + learning outcomes (review)	✗	✗	~	Review
Mashaabi et al. (2024)	2024	Arabic LLMs (general)	Transformers/LLMs; resources	✗	✗	~	Method + openness/resources
Rhel et al. (2025)	2025	Arabic LLMs (general)	PLMs/LLMs; prompting; evaluation trends	✗	✗	✗	Review-style
Abouzied et al. (2025)	2025	Arabic LLMs landscape	LLMs + benchmarks; harm/bias themes	~	✗	~	Landscape (no PRISMA)
Alzubaidi et al. (2025)	2025	Arabic LLM benchmarks	LLM evaluation; benchmark taxonomy	~	✗	~	Systematic review
Asseri et al. (2025)	2025	Bias (Arabs/Muslims)	Prompting/pipelines; bias measurement	✗	✗	✓	PRISMA
This Survey (2026)	2026	Whole Islamic stack	IR, Transformers, LLM/RAG/agents; end-to-end evaluation	✓	✓	✓	PRISMA-ScR + reproducibility artifacts

Legend: ✓ = explicit focus; ~ = partial/implicit; ✗ = not covered. MM = multimodal; Norm./Plur. = normative/pluralism considerations; Prov./Disagr./Abst. = provenance/disagreement/abstention.

Scope & Methodology

To ensure a systematic, transparent, and reproducible review, we followed the PRISMA-ScR framework. This approach allowed us to rigorously map the rapidly evolving research landscape while minimizing selection bias.

Figure: PRISMA flow diagram showing the systematic screening process from 1,743 records to 160 included papers.

Research Questions

RQ1: Domains & Tasks

What Islamic knowledge domains and application tasks have been operationalized in ML/AI systems, and how is this work distributed across subfields?

RQ2: Resources & Measurement

What datasets, benchmarks, and knowledge resources are available, and what assumptions do they encode about evidence, provenance, and interpretive diversity?

RQ3: Evaluation & Trustworthiness

How do studies evaluate trustworthiness, especially source faithfulness, doctrinal correctness, pluralism-aware answering, and safety/bias?

Coverage: 2016–2026

We searched Semantic Scholar, IEEE Xplore, ACM Digital Library, ACL Anthology, and arXiv systematically. The initial corpus of 1,743 records was reduced to 160 included papers after screening.

Multi-Dimensional Taxonomy

We developed a comprehensive two-layer taxonomy that bridges traditional Islamic sciences with modern AI evaluation. The epistemic layer organizes Islamic knowledge to prevent a common failure mode—treating all "Islamic text" as homogeneous content—by distinguishing sources that carry direct normative authority from disciplines that regulate how those sources are read, reconciled, and applied. The AI task layer is explicitly instrumental, enumerating reusable computational capabilities (retrieval, grounding, extraction, reasoning, evaluation) that apply across domains.

Epistemic Layer: Foundations

Qur'an: The revealed text and primary foundation of all Islamic disciplines. Motivates high-integrity representations, verse-boundary integrity, and script normalization.
Hadith: Textual corpora of the Prophet's sayings, actions, and tacit approvals. Motivates collection-aware retrieval, attribution, and separation of transmitted reports from commentary.

Epistemic Layer: Disciplines

Qur'anic Sciences: Preservation, recitation (qira'at), interpretation, tafsir alignment
Hadith Sciences: Isnad analysis, narrator evaluation, authenticity classification
Usul al-Fiqh: Legal theory, evidential hierarchy, structured inference
Fiqh: Practical rulings, comparative school retrieval, disagreement handling
Theology (Kalam): Doctrinal integrity, grounding claims in sources
Sufism (Tasawwuf): Genre-sensitive summarization, tone-aware guidance
History & Sirah: Event extraction, geo-temporal linking, provenance-aware summarization
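
One way to make the epistemic layer concrete is to tag every corpus item with its domain and layer, so that downstream components cannot treat all Islamic text as homogeneous. The following is a minimal sketch under stated assumptions: the names `EpistemicLayer`, `Domain`, `CorpusItem`, and `carries_direct_authority` are hypothetical and are not part of the survey's released artifacts.

```python
from dataclasses import dataclass
from enum import Enum


class EpistemicLayer(Enum):
    FOUNDATION = "foundation"  # sources carrying direct normative authority
    DISCIPLINE = "discipline"  # sciences governing how sources are read and applied


class Domain(Enum):
    # Each member stores (display label, epistemic layer).
    QURAN = ("Qur'an", EpistemicLayer.FOUNDATION)
    HADITH = ("Hadith", EpistemicLayer.FOUNDATION)
    QURANIC_SCIENCES = ("Qur'anic Sciences", EpistemicLayer.DISCIPLINE)
    HADITH_SCIENCES = ("Hadith Sciences", EpistemicLayer.DISCIPLINE)
    USUL_AL_FIQH = ("Usul al-Fiqh", EpistemicLayer.DISCIPLINE)
    FIQH = ("Fiqh", EpistemicLayer.DISCIPLINE)
    THEOLOGY = ("Theology (Kalam)", EpistemicLayer.DISCIPLINE)
    SUFISM = ("Sufism (Tasawwuf)", EpistemicLayer.DISCIPLINE)
    HISTORY = ("History & Sirah", EpistemicLayer.DISCIPLINE)

    def __init__(self, label, layer):
        self.label = label
        self.layer = layer


@dataclass(frozen=True)
class CorpusItem:
    text_id: str    # hypothetical identifier, not a real corpus key
    domain: Domain

    @property
    def carries_direct_authority(self) -> bool:
        # Only foundation texts are treated as authority-bearing sources.
        return self.domain.layer is EpistemicLayer.FOUNDATION


report = CorpusItem(text_id="example-001", domain=Domain.HADITH)
ruling = CorpusItem(text_id="example-002", domain=Domain.FIQH)
disciplines = [d.label for d in Domain if d.layer is EpistemicLayer.DISCIPLINE]
```

Keeping the layer on the domain rather than on each item means a retrieval component can, for example, separate transmitted reports from commentary with a single attribute check.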

Interactive Survey Organization & Taxonomy Framework

Figure: Taxonomy of Islamic-knowledge AI work organized by content domains (sources and practice areas) and task families; Maqasid is represented as a cross-cutting objectives/safety/governance lens.

AI Task Layer (Instrumental Methods)

Data & Corpora

Digitization (OCR/HTR for manuscripts), corpus building, annotation schema design, versioning for provenance and reproducibility.

Retrieval & Grounding

Search/IR with cross-lingual support, RAG and evidence linking, provenance tracking, citation generation, quote-bounded answering.

Understanding

NER, relation/event extraction, geo-temporal linking, faithful summarization, and multi-source synthesis that surfaces disagreements.

Reasoning Support

Comparative reasoning (madhhab-aware, ikhtilaf surfacing), uncertainty handling (abstention/deferral, clarification prompts).

Evaluation & Governance

Benchmarks, error taxonomies, reproducibility, hallucination control, fabrication detection, bias checks, and red-teaming assessment.

Multimodal

Speech/audio (ASR, tajwid, recitation coaching), document image processing (manuscript OCR/HTR, layout analysis, segmentation).

Use Cases & Applications

Islamic-knowledge AI is deployed in recurring settings that stress different reliability constraints:

Scripture-Grounded QA

Focus: Qur'an/Hadith search, reference assistants

Key risks: Provenance breakage, source fabrication, citation hallucination

Requirements: Source faithfulness, verifiable attribution

Fiqh & Fatwa Systems

Focus: Jurisprudential reasoning, school-aware rulings

Key risks: Collapsing legitimate disagreement (ikhtilāf), underspecified answers

Requirements: Multi-school awareness, qualified responses, abstention policies

Practice Support

Focus: Hajj/Umrah guidance, prayer times, rituals

Key risks: Operational errors in high-stakes contexts

Requirements: Conservative abstention, traceable sourcing, clear disclaimers

Multimodal Learning

Focus: Recitation coaching, tajwīd feedback, OCR

Key risks: Pronunciation errors, incorrect tajwīd rules

Requirements: Robust speech pipelines, expert validation

These use cases highlight the diversity of deployment contexts and the varying trustworthiness requirements across Islamic-knowledge AI applications.

Resources & Benchmarks

Islamic alignment requires structured corpora spanning primary texts, classical scholarship, contemporary legal resolutions, linguistic resources, and benchmark datasets. We organize these into five major categories:

Primary Sources

Qur'an: Tanzil, Quranic Arabic Corpus

Hadith: al-Kutub al-Sittah (the Six Books), authenticated collections

Key features: Canonical texts, linguistic annotation, chain of narration

Classical Heritage

Content: Uṣūl al-Fiqh, legal maxims, Maqāṣid texts

Examples: al-Ghazālī's works, al-Shāṭibī's al-Muwāfaqāt

Resources: OpenITI, digitized libraries

Contemporary Fatwas

Sources: IIFA, ECFR, MWL Fiqh Council, AMJA

Topics: Finance, bioethics, minority jurisprudence

Purpose: Connect classical principles to modern issues

Linguistic Resources

Tools: Arabic morphological analyzers, embeddings

Data: Quranic Arabic Corpus, narrator databases

Models: AraBERT, CAMeLBERT, specialized embeddings

Benchmarks & Shared Tasks

Datasets: QIAS, IslamicEval, PalmX, FiqhQA

Multimodal and hallucination: Iqra'Eval (recitation), AraHalluEval (hallucination)

Focus: Correctness, faithfulness, safety

Data Challenges

Key issues in Islamic knowledge data resources:

Scarcity: Limited high-quality annotated datasets
Quality control: Web scrapes may mix authoritative sources with sectarian content
Multi-school coverage: Need for balanced representation across Sunni, Shia, and other interpretations
Language diversity: Most resources are Arabic-centric; need for multilingual coverage

Key Datasets & Benchmarks

Overview of resources related to Islamic AI, categorized by Classical Pre-training Corpora, Knowledge Bases (KB) for Retrieval, and Benchmarks

Dataset	Description	Type	Size	License	Lang	Ref.
Classical Pre-training Corpora (Turath)
OpenITI	The largest machine-readable corpus of pre-modern Islamicate texts.	Corpus	~1B Tokens	CC-BY-4.0	Ar/Fa	[1]
Shamela (Cleaned)	Text version of Shamela library; covers Fiqh, Tafsir, and History.	Corpus	~1B Tokens	Public	Ar	[2]
KSUCCA	King Saud Univ. Corpus of Classical Arabic (7th–11th Century CE).	Corpus	50M Tokens	Research	Ar (Cls)	[3]
Tashkeela	Fully vocalized classical texts for training diacritic-aware models.	Corpus	75M Tokens	Public	Ar	[4]
Noor Corpus	Massive diverse library of Islamic PDFs and OCR'd texts.	Corpus	>100k Books	Mixed	Ar	[5]
Knowledge Bases & Ontologies
QuranMorph Corpus	Morphologically annotated Quranic corpus with POS, Lemmatization, etc.	KB	77k Tokens	CC-BY-4.0	Ar/En	[6]
Quranic Arabic Corpus	Morphological and syntactic ontology mapping concepts in the Quran.	KB/Onto	77k nodes	Research	Ar/En	[7]
IslamicPCQA KB	Curated knowledge base of 1M+ Islamic documents for retrieval.	KB	1M+ Docs	Research	Fa/Ar	[8]
Sunnah.com API	Structured Hadith collections with grading (Sahih/Hasan) metadata.	API/KB	6 Major Books	Open	Ar/En	[9]
Quranic & Hadith Resources
IslamicEval 2025	Hallucination detection for Qur'an/Hadith.	Hallu	1,506 Questions	Apache 2.0	Ar	[10]
Iqra'Eval 2025	Quranic mispronunciation diagnosis.	Audio	82+ Hours	Research	Ar	[11]
Sanadset 650K	Hadith dataset with narrator-chain (Sanad).	Corpus	650,986 Records	CC-BY-4.0	Ar	[12]
QURAN-MD	Unified verse-level text, translation, transliteration, and 32 reciters.	Multi	6,236 Verses	Research	Ar/En	[13]
Jurisprudence (Fiqh) & Reasoning
QIAS 2025	Inheritance & general knowledge.	Reason	22,000 MCQs	Research	Ar	[14]
FiqhQA	QA by 4 Sunni schools; abstention eval.	QA	960 QAs	Research	Ar/En	[15]
IslamicPCQA	QA with 1M+ doc knowledge base.	QA/RAG	12k Pairs	Research	Fa	[16]
Hajj-FQA	Specialized QA for Hajj rituals/fatwas.	QA	886 Hajj-fatwas	Research	Ar	[17]
Cultural & Ethical Alignment
PalmX 2025	General Arabic & Islamic Culture.	Culture	6.4k MCQs	Shared Task	Ar	[18]
BengaliMoralBench	Moral reasoning in Bengali Islamic culture.	Ethics	3k Scenarios	CC-BY-NC-ND	Bn	[19]
IslamTrust	Alignment benchmark with consensus-based Islamic ethical principles.	Ethics	Multi	Research	Ar/En	[20]
ADAB	App reviews corpus annotated for politeness and religious etiquette.	Style	Curated	Research	Ar	[21]

Note: QA=Question Answering, Reason=Reasoning/Math, Hallu=Hallucination Detection, Audio=Speech/Recitation, KB=Knowledge Base, Multi=Multimodal

Evaluation & Trustworthiness

Evaluation of Islamic-domain LLMs cannot be reduced to generic NLP benchmarks. The core question is not fluency but whether a system is safe and warranted: does it ground claims in authoritative evidence, preserve provenance, and abstain from answering when evidence is missing or contested?

Evaluation Methodologies

Comprehensive evaluation framework for Islamic AI systems across four primary domains

Methodology	Metrics & Description	Key Benchmarks
Linguistic & Reasoning Capabilities
N-gram Matching	BLEU, ROUGE, METEOR: Measures lexical overlap with reference text. Limited utility for theological nuance.	OALL [22], AraGen [23]
Symbolic Verification	Execution Accuracy: Validates mathematical derivations (e.g., inheritance shares) against rule-based engines.	QIAS [14], GATMath [24]
Standardized Testing	Normalized Accuracy: Performance on multiple-choice questions (MCQs) across diverse subjects (STEM, Humanities).	ArabicMMLU [25], QuranBench [26]
Retrieval & Grounding (RAG)
Factuality Checking	Span-Level Error Rate: Percentage of generated text spans containing fabricated content.	Halwasa [27], HalluVerse [28]
Citation Verification	Citation Precision/Recall: Accuracy of retrieving specific Qur'anic Ayahs or Hadith to support a claim.	IslamicEval [10], Hajj-FQA [17]
Entailment	Faithfulness Score: Measures if the generated answer is logically entailed by the retrieved context.	FARSIQA [16], AraHalluEval [29]
Doctrinal & Cultural Alignment
Scholar-in-the-Loop	Adjudication Score: Human expert evaluation of Fatwa correctness, Adab (etiquette), and Hikmah (wisdom).	FiqhQA [15], Iqra'Eval [11], ADAB [21]
Cultural Probing	Alignment Score: Degree of conformity to Arab-Islamic norms vs. Western-centric values.	IslamTrust [20], PalmX [18]
Dialectal Robustness	Dialectal Accuracy: Performance consistency across Modern Standard Arabic (MSA) and regional dialects.	AraDiCE [30], Absher [31]
Safety & Governance
Red Teaming	Attack Success Rate (ASR): Vulnerability to prompt injection or generating prohibited content (e.g., hate speech).	ASAS [32], AraTrust [33]
Mechanistic Analysis	Latent Activation: Identification of internal neuronal pathways associated with bias or violence.	Simbeck et al. [34]

Note: Evaluation methodologies are grouped into the four domains used in the table: linguistic and reasoning capabilities, retrieval and grounding, doctrinal and cultural alignment, and safety and governance. Each methodology is listed with its metrics and the benchmarks that apply it.
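
Several of the metrics in the table reduce to simple set or ratio computations. The sketch below illustrates citation precision/recall, span-level error rate, and attack success rate as they are described above. The function names and the placeholder reference identifiers are assumptions made for illustration, and individual benchmarks may define these metrics differently.

```python
def citation_precision_recall(predicted, gold):
    """Compare the references a system cited with the gold supporting references."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def rate(flags):
    """Fraction of True values in a sequence of booleans (0.0 if empty)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def span_level_error_rate(span_is_fabricated):
    # One boolean per generated span: True if the span contains fabricated content.
    return rate(span_is_fabricated)


def attack_success_rate(attack_succeeded):
    # One boolean per red-teaming attempt: True if the attack elicited prohibited output.
    return rate(attack_succeeded)


# Placeholder identifiers only; they do not refer to real verses or reports.
cited = ["ref-a", "ref-b", "ref-c"]
supporting = ["ref-a", "ref-c", "ref-d", "ref-e"]
precision, recall = citation_precision_recall(cited, supporting)
error_rate = span_level_error_rate([True, False, False, False])
```

In this toy case two of the three cited references are correct and two of the four gold references are found, so precision is 2/3 and recall is 1/2.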

Key Insight

Trustworthy Islamic-knowledge AI requires multi-dimensional evaluation that goes beyond accuracy metrics to include provenance verification, pluralism awareness, cultural sensitivity, and governance-oriented assessment. Standard NLP benchmarks are insufficient for high-stakes religious guidance applications.

Evolution of Methods

Methods for operationalizing Islamic knowledge in AI have evolved through three overlapping eras. The LLM era does not replace earlier techniques so much as subsume them—lexical retrieval and symbolic resources reappear inside RAG and verification pipelines.

Phase 1: ~2000–2017

Pre-Transformer & Early Neural

Preprocessing: Rule-based stemmers (Khoja, ISRI)
Representation: Bag-of-words, TF-IDF, static embeddings (AraVec)
Models: BM25 retrieval, SVM/CRF classifiers, KG for Qur'an/Hadith
Limitations: Limited context for theological nuance, no long-range reasoning

Phase 2: ~2018–2022

Neural Encoders & Arabic LMs

Preprocessing: Diacritic restoration, Ayah/Hadith segmentation
Representation: Contextual encoders (AraBERT, MARBERT, CAMeLBERT)
Models: Neural retrieval, LSTM/BiLSTM, QA over Qur'an/Hadith
Advance: Improved semantic search and task-specific classification

Phase 3: ~2023–Present

Generative LLMs, RAG & Agentic Pipelines

Native Arabic LLMs: Jais, ArabianGPT, ALLaM
Adapted models: AceGPT, Fanar, Yehia (CPT/SFT)
Pipelines: Multi-step agents for inheritance, zakat, Hajj workflows
Safeguards: Citation-backed chain-of-thought answers, Islamic safety policies, bias checks, benchmark-driven evaluation

Key Architectural Trends

Retrieval-Augmented Generation (RAG): Grounding LLM outputs in Qur'anic verses, Hadith, and Fiqh texts to reduce hallucination and enable verifiable attribution.
Agentic Pipelines: Multi-step workflows where specialized sub-agents handle retrieval, reasoning, and verification separately (e.g., inheritance calculation, Hajj guidance).
Alignment & Safety Layers: Islamic-specific safety policies, deferral mechanisms for underspecified queries, and red-teaming protocols for religious content.
Multimodal Expansion: Quranic recitation coaching (mispronunciation detection, tajwid), manuscript OCR/HTR, and Arabic sign language recognition.
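
The retrieval-grounded pattern and its verification step can be sketched in a few lines. This is an illustrative toy, not the architecture of any surveyed system: retrieval is plain word overlap rather than a neural or BM25 retriever, the corpus holds placeholder strings rather than real source text, and `Passage`, `CitedQuote`, `retrieve`, and `verify_quotes` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str
    text: str


@dataclass(frozen=True)
class CitedQuote:
    source_id: str
    quote: str


def tokens(text):
    return re.findall(r"\w+", text.lower())


def normalize(text):
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join(tokens(text))


def retrieve(query, corpus, top_k=2):
    """Toy lexical retrieval: rank passages by number of shared words."""
    query_words = set(tokens(query))
    scored = [(len(query_words & set(tokens(p.text))), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def verify_quotes(quotes, retrieved):
    """Return quotes whose source was not retrieved or whose text is not in it."""
    by_id = {p.source_id: normalize(p.text) for p in retrieved}
    failures = []
    for q in quotes:
        passage = by_id.get(q.source_id)
        if passage is None or normalize(q.quote) not in passage:
            failures.append(q)
    return failures


corpus = [
    Passage("doc-1", "Placeholder passage alpha about topic one."),
    Passage("doc-2", "Placeholder passage beta about topic two."),
]
retrieved = retrieve("topic one", corpus, top_k=1)
draft_quotes = [
    CitedQuote("doc-1", "about topic one"),    # supported by the retrieved passage
    CitedQuote("doc-1", "about topic three"),  # wording not present in the source
    CitedQuote("doc-9", "anything"),           # cites a source that was never retrieved
]
unsupported = verify_quotes(draft_quotes, retrieved)
```

A pipeline built this way would answer only from quotes that pass verification and would abstain or defer when `unsupported` is non-empty. Real systems would also need script normalization and diacritic handling for Arabic text, which this sketch omits.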

Current Challenges & Future Directions

Despite significant progress, Islamic-knowledge AI faces critical challenges that require interdisciplinary solutions. Our survey identifies six deployment-critical gaps and engineering priorities:

Data Scarcity & Quality

Limited annotated datasets for specialized tasks
Web-scraped data mixes authoritative and unreliable sources
Imbalanced coverage across languages and schools of thought
Most resources Arabic-centric; low-resource Muslim-majority languages underserved
Direction: Curated, multi-school, multilingual corpora with provenance metadata
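
A corpus record that carries provenance metadata might look like the sketch below. The schema is an assumption made for illustration (the field names `school`, `source_title`, and `edition` are hypothetical), and all values are placeholders rather than real bibliographic data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    record_id: str
    text: str
    language: str                       # e.g. "ar", "fa", "bn"
    school: Optional[str] = None        # school of thought, if applicable
    source_title: Optional[str] = None  # work the text was taken from
    edition: Optional[str] = None       # edition or digitization used


PROVENANCE_FIELDS = ("school", "source_title", "edition")


def missing_provenance(record):
    """List the provenance fields that are empty for this record."""
    return [name for name in PROVENANCE_FIELDS if not getattr(record, name)]


def coverage_by(records, attribute):
    """Count records per value of an attribute, to expose imbalance."""
    return Counter(getattr(r, attribute) or "unknown" for r in records)


records = [
    SourceRecord("r1", "placeholder text", "ar", school="hanafi",
                 source_title="Placeholder Title A", edition="placeholder edition"),
    SourceRecord("r2", "placeholder text", "bn", source_title="Placeholder Title B"),
    SourceRecord("r3", "placeholder text", "ar"),
]
gaps = {r.record_id: missing_provenance(r) for r in records}
by_language = coverage_by(records, "language")
by_school = coverage_by(records, "school")
```

Reporting `by_language` and `by_school` alongside a dataset makes imbalanced coverage visible before training or evaluation.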

Reasoning Complexity

Jurisprudential reasoning requires multi-step inference with evidential hierarchy
Rulings depend on analogy (qiyas), contextual interpretation, and conditions of application
Inheritance-law calculation demands both mathematical logic and legal reasoning
Direction: Structured reasoning, chain-of-thought prompting, symbolic verification engines
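
Symbolic verification of an inheritance answer can be sketched as an exact-arithmetic comparison between the shares a model proposes and the shares produced by a trusted rule-based engine. In the sketch below the reference values stand in for that engine's output. The heir labels and fractions are arbitrary placeholders, not rulings, and `check_distribution` is a hypothetical helper.

```python
from fractions import Fraction


def parse_shares(raw):
    """Convert strings such as '1/8' into exact fractions."""
    return {heir: Fraction(value) for heir, value in raw.items()}


def check_distribution(claimed, reference):
    """Return a list of discrepancies between claimed and reference final shares."""
    errors = []
    if sum(claimed.values()) != 1:
        errors.append("final shares do not sum to the whole estate")
    if set(claimed) != set(reference):
        errors.append("heir sets differ")
    for heir in sorted(set(claimed) & set(reference)):
        if claimed[heir] != reference[heir]:
            errors.append(f"{heir}: claimed {claimed[heir]}, expected {reference[heir]}")
    return errors


# Placeholder output of a rule-based engine (assumed trusted).
reference = parse_shares({"heir_a": "1/8", "heir_b": "7/12", "heir_c": "7/24"})
# Placeholder model answer: sums to 1 but assigns two shares incorrectly.
claimed = parse_shares({"heir_a": "1/8", "heir_b": "1/2", "heir_c": "3/8"})
problems = check_distribution(claimed, reference)
```

Using `Fraction` instead of floating point avoids rounding disputes, and the example shows why a sum check alone is insufficient: the claimed shares add up to the estate yet still disagree with the reference.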

Disagreement & Pluralism

Systems flatten legitimate scholarly disagreement (ikhtilaf) into single answers
Madhhab-aware presentation is needed to avoid "one true answer" bias
Context-sensitive clarifying questions are needed when inputs are underspecified
Direction: Disagreement-aware scoring, school-of-thought-sensitive framing, safe deferral
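
Disagreement-aware scoring could take many forms. One minimal sketch, assuming that reference positions are available per school (as in a four-school benchmark), scores how many school positions a response attributes correctly and flags responses that collapse a contested question into a single view. The function name and the opaque position labels are hypothetical.

```python
def disagreement_aware_score(response, reference):
    """Score a response mapping school -> position against per-school references.

    coverage:  share of schools whose position is attributed correctly.
    collapsed: True if the schools disagree but the response gives one view or none.
    """
    if not reference:
        raise ValueError("reference must list at least one school")
    correct = sum(1 for school, pos in reference.items() if response.get(school) == pos)
    contested = len(set(reference.values())) > 1
    collapsed = contested and len(set(response.values())) <= 1
    return {"coverage": correct / len(reference), "collapsed": collapsed}


# Opaque placeholder labels; they do not encode any actual ruling.
reference = {
    "hanafi": "position_1",
    "maliki": "position_2",
    "shafii": "position_2",
    "hanbali": "position_1",
}
single_view = disagreement_aware_score({"hanafi": "position_1"}, reference)
plural_view = disagreement_aware_score(dict(reference), reference)
```

Under this scheme a response that states one school's view correctly still receives low coverage and a collapse flag, whereas a generic accuracy metric might count it as fully correct.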

Cultural Flattening & Bias

Western-centric RLHF policies misrepresent Islamic values
Models trained on web text encode stereotypes about Muslims
Loss of local, contextual, and dialectal interpretations
Direction: Culturally grounded alignment, regional models, community-participatory design

Hallucination & Fabrication

LLMs fabricate Qur'anic verses and Hadith with confident presentation
Incorrect attribution of rulings to scholars or schools
Fabricated citations are uniquely harmful in religious contexts
Direction: Retrieval-grounded generation, span-level fabrication detection, provenance-preserving pipelines

Safety & Governance

High-stakes religious guidance requires conservative abstention strategies
Need for scholar-in-the-loop validation and governance frameworks
Red-teaming protocols tailored to religious harms (misattribution, offensive stereotyping)
Direction: Calibrated confidence, auditable systems, deferral-to-authority mechanisms
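
A conservative abstention policy can be expressed as a small decision function. The sketch below is only an illustration: the thresholds are arbitrary assumptions rather than calibrated values, and `QueryState`, `Action`, and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"    # respond with cited evidence
    CLARIFY = "clarify"  # ask the user for missing context
    DEFER = "defer"      # direct the user to a qualified authority


@dataclass(frozen=True)
class QueryState:
    grounded: bool        # retrieved evidence supports the draft answer
    underspecified: bool  # the question lacks context needed for an answer
    confidence: float     # model confidence in [0, 1], assumed calibrated upstream
    high_stakes: bool     # e.g. a ruling request rather than a reference lookup


def decide(state, threshold=0.7, high_stakes_threshold=0.9):
    """Choose an action; thresholds are illustrative, not recommended values."""
    if state.underspecified:
        return Action.CLARIFY
    if not state.grounded:
        return Action.DEFER
    required = high_stakes_threshold if state.high_stakes else threshold
    return Action.ANSWER if state.confidence >= required else Action.DEFER


lookup = decide(QueryState(grounded=True, underspecified=False,
                           confidence=0.8, high_stakes=False))
ruling = decide(QueryState(grounded=True, underspecified=False,
                           confidence=0.8, high_stakes=True))
```

The same confidence yields an answer for a low-stakes lookup but a deferral for a high-stakes request, which is the conservative behavior the list above calls for.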

Engineering Priorities for the Path Forward

Building auditable, pluralism-aware, and risk-sensitive Islamic-knowledge AI requires:

Provenance-preserving grounding: Retrieval-grounded pipelines with verifiable citations, not free-form synthesis
Disagreement-aware systems: Present alternative views with supporting evidence rather than collapsing into single answers
Calibrated abstention: Systems that defer, request missing context, or direct users to qualified authority when grounding is unreliable
Interdisciplinary collaboration: AI researchers, Islamic scholars ('ulama), and community stakeholders working together
Benchmark investment: Evaluation protocols that penalize fabricated citations more than generic factual errors, with disagreement-aware scoring
Safety-first deployment: Islamic-specific red-teaming, bias checks, and governance frameworks with clear disclaimers about AI limitations
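
The benchmark priority above, penalizing fabricated citations more heavily than generic factual errors, can be sketched as a weighted penalty. The weights and the function name `penalized_score` are assumptions chosen for illustration; an actual benchmark would set them with expert input.

```python
# Illustrative weights: a fabricated citation costs five times a generic error.
ERROR_WEIGHTS = {"fabricated_citation": 5.0, "factual_error": 1.0}


def penalized_score(num_items, errors, weights=ERROR_WEIGHTS):
    """Return a score in [0, 1] for num_items answers given a list of error labels."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    penalty = sum(weights[label] for label in errors)
    max_penalty = num_items * max(weights.values())
    return max(0.0, 1.0 - penalty / max_penalty)


# Two systems with the same number of errors over ten answers.
generic_errors = penalized_score(10, ["factual_error", "factual_error"])
fabrications = penalized_score(10, ["fabricated_citation", "fabricated_citation"])
```

Both systems make two errors, but the one that fabricates citations scores lower, so a leaderboard built on this score would not reward fluent but unsupported attribution.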

Authors

Gagan Bhatia
QCRI, HBKU Qatar

Hamdy Mubarak
QCRI, HBKU Qatar

Majd Hawasly
QCRI, HBKU Qatar

Mustafa Jarrar
CHSS, HBKU Qatar

George Mikros
CHSS, HBKU Qatar

Fadi Zaraket
ACRPS Qatar

Mahmoud Alhirthani
CHSS, HBKU Qatar

Mutaz Al-Khatib
CIS, HBKU Qatar

Logan Cochrane
CPP, HBKU Qatar

Kareem Darwish
QCRI, HBKU Qatar

Rashid Yahiaoui
CHSS, HBKU Qatar

Firoj Alam
QCRI, HBKU Qatar

Contact: fialam@hbku.edu.qa

Citation

@article{bhatia2026islamic,
  title={Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey},
  author={Bhatia, Gagan and Mubarak, Hamdy and Hawasly, Majd and Jarrar, Mustafa and 
          Mikros, George and Zaraket, Fadi and Alhirthani, Mahmoud and Al-Khatib, Mutaz and 
          Cochrane, Logan and Darwish, Kareem and Yahiaoui, Rashid and Alam, Firoj},
  year={2026},
  institution={Qatar Computing Research Institute, HBKU}
}