Alphazed uses automated AI pipelines to generate and curate over 10,000 educational content items — including Arabic vocabulary exercises, pronunciation drills, Quran memorization sequences, and interactive stories. The pipeline combines OpenAI for text generation, Google Cloud TTS for audio, custom image generators, and human quality gates to produce curriculum-aligned content at scale.
The Content Generation Stack
Text Generation
- OpenAI GPT-4o-mini: Generates exercise prompts, distractors, story scripts, Quranic interpretations
- Prompt engineering: Highly specific prompts ensure output aligns with Bloom's Taxonomy levels
- Example prompt: "Generate 5 plausible distractors for the Arabic word 'كتاب' (book). Distractors must be semantically related but clearly different. Level: Intermediate learner, age 6-8."
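A prompt like the example above can be rendered from structured inputs so that every request has the same shape. The following is a minimal sketch only: `DistractorRequest`, its field names, and `build_distractor_prompt` are hypothetical and not taken from the actual pipeline, and the call to the model itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistractorRequest:
    """Inputs for one distractor prompt (all field names are illustrative)."""
    word: str        # target Arabic word, e.g. "كتاب"
    gloss: str       # English meaning shown to the model
    count: int       # number of distractors to request
    level: str       # learner level label
    age_range: str   # learner age band


def build_distractor_prompt(req: DistractorRequest) -> str:
    """Render the request in the same shape as the example prompt."""
    if req.count < 1:
        raise ValueError("count must be at least 1")
    return (
        f"Generate {req.count} plausible distractors for the Arabic word "
        f"'{req.word}' ({req.gloss}). "
        "Distractors must be semantically related but clearly different. "
        f"Level: {req.level} learner, age {req.age_range}."
    )


prompt = build_distractor_prompt(
    DistractorRequest(word="كتاب", gloss="book", count=5,
                      level="Intermediate", age_range="6-8")
)
print(prompt)
```

Keeping the template in one function makes it easier to vary level and age band per exercise without hand-editing prompt text.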
Audio Generation
- Google Cloud TTS (WaveNet voices): Generates native-quality Arabic audio
- Multiple voices: Male/female voices, varying speaking speeds, emotional tones
- Custom pronunciation: Diacritical marks influence phoneme selection for authentic Quranic pronunciation
- Speech marks extraction: Phoneme timestamps for lip-sync animations (blog #3)
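One way to prepare text for timestamp extraction is to wrap it in SSML and place a `<mark>` element before each unit of interest. The sketch below is an assumption-laden illustration: it marks whole words as a stand-in for phonemes, `build_ssml` is a hypothetical helper, and how the TTS service returns timings for the marks is outside its scope. `<speak>`, `<prosody>`, and `<mark>` are standard SSML elements.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(words: list[str], rate: str = "medium") -> str:
    """Wrap words in SSML, placing a named <mark> before each one.

    A TTS engine that reports mark timings can then tell the caller
    when each word starts in the generated audio.
    """
    parts = []
    for index, word in enumerate(words):
        mark_name = quoteattr(f"w{index}")  # yields a quoted attribute value
        parts.append(f"<mark name={mark_name}/>{escape(word)}")
    body = " ".join(parts)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


# Fully vocalized input, so the diacritics reach the TTS engine unchanged.
ssml = build_ssml(["بِسْمِ", "اللَّهِ"], rate="slow")
print(ssml)
```

Escaping the text with `xml.sax.saxutils.escape` keeps characters such as `&` from producing invalid SSML.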
Advanced Audio
- ElevenLabs for multilingual voiceovers: Marketing videos, app intro sequences
- Music composition: Royalty-free background tracks from Epidemic Sound
Image Generation
- Custom distractor images via DALL-E or Midjourney
- Vector graphics for UI elements
- Character illustrations for story content
The Pipeline Architecture
Repository: alphazed-content-utils (Python, 20+ generator modules)
Generators (independent, composable):
├── amal_level_generator.py
│   └── Generates complete Arabic learning levels
│       (letters → words → sentences progression)
│
├── prophet_story_generator.py
│   └── Multi-modal stories for Thurayya
│       (text + illustrations + audio narration)
│
├── quran_tafseer_generator.py
│   └── Quranic interpretation content
│       (per-surah, per-ayah explanations)
│
├── distractor_generator.py
│   └── Smart wrong answers for multiple-choice
│       (semantic similarity matching)
│
├── exercise_generator.py
│   └── Interactive exercises (45+ types)
│       (selecting exercise type from catalog)
│
└── image_generator.py
    └── Visual content (DALL-E or Midjourney API)
Each generator follows a standard flow:
[Load config] → [Generate] → [Validate] → [Insert to DB]
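The four-step flow can be sketched as a small class that wires the steps together. This is a hedged illustration, not the repository's real interface: `GeneratorRun` and its four callables are hypothetical stand-ins, and an in-memory list plays the role of the database.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeneratorRun:
    """One pass through load config -> generate -> validate -> insert.

    Every callable here is a hypothetical stand-in for a real module.
    """
    load_config: Callable[[], dict]
    generate: Callable[[dict], list[dict]]
    validate: Callable[[dict], list[str]]  # returns a list of problems
    insert: Callable[[list[dict]], None]

    def run(self) -> int:
        config = self.load_config()
        items = self.generate(config)
        # Validate everything before touching the database.
        for item in items:
            problems = self.validate(item)
            if problems:
                raise ValueError(f"item {item.get('id')!r} failed: {problems}")
        self.insert(items)
        return len(items)


database: list[dict] = []  # stands in for the real content database

run = GeneratorRun(
    load_config=lambda: {"words": ["كتاب", "قلم"]},
    generate=lambda cfg: [{"id": i, "text": w} for i, w in enumerate(cfg["words"])],
    validate=lambda item: [] if item["text"].strip() else ["empty text"],
    insert=database.extend,
)
inserted = run.run()
print(inserted, "items inserted")
```

Because validation finishes before `insert` is called, a failing item leaves the database untouched.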
Generator Deep-Dive: Distractor Generation
The Problem
For multiple-choice exercises, wrong answers (distractors) must be:
- Plausible (the child doesn't immediately recognize them as wrong)
- Related (semantically or phonetically similar)
- Clearly different (the child can tell them apart with some thought)
Bad distractors:
- Question: "Which word means book?"
- Wrong: "Elephant", "Blue", "Happy" ← Too obviously wrong
Good distractors:
- Question: "Which word means book?"
- Options: "كتاب" (book), "كاتب" (writer), "مكتب" (office), "كتب" (books, the plural) ← Semantically related (all share the root ك-ت-ب), requires thought
Implementation (distractor_generator.py)
- Semantic similarity matching:
  - Compute embeddings for the correct answer using Arabic word embeddings
  - Find words with high similarity (0.7-0.85 range)
  - Exclude words that are too similar (exact synonyms)
- Phonetic similarity:
  - For letters/sounds, match based on phonetic features
  - Example: "ب" (Ba) distractors: "ت" (Ta), "ث" (Tha), taken from the same sound family
- Weighted selection:
  - Match difficulty level of the exercise
  - Beginner exercises get very different distractors
  - Advanced exercises get subtle distractors
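The similarity-band idea can be illustrated with cosine similarity over word vectors. The sketch below rests on several assumptions: the two-dimensional vectors are invented toy values (real Arabic embeddings have hundreds of dimensions), `pick_distractors` is a hypothetical function, and the beginner band of 0.30-0.50 is made up to show how a difficulty level could shift the window. Only the 0.70-0.85 band comes from the text.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_distractors(target: str, vectors: dict[str, list[float]],
                     low: float = 0.70, high: float = 0.85,
                     k: int = 3) -> list[str]:
    """Return up to k words whose similarity to target lies in [low, high]."""
    target_vec = vectors[target]
    in_band = []
    for word, vec in vectors.items():
        if word == target:
            continue
        score = cosine(target_vec, vec)
        if low <= score <= high:  # drops unrelated words and near-synonyms
            in_band.append((score, word))
    in_band.sort(reverse=True)  # most similar first
    return [word for _, word in in_band[:k]]


# Toy vectors, invented for illustration only.
VECTORS = {
    "كتاب": [1.0, 0.0],   # book (target)
    "كاتب": [0.8, 0.6],   # writer, similarity about 0.80
    "مكتب": [3.0, 3.0],   # office, similarity about 0.71
    "مجلد": [0.99, 0.1],  # volume, placed too close to the target (about 0.99)
    "قلم": [4.0, 9.0],    # pen, similarity about 0.41
    "فيل": [0.0, 1.0],    # elephant, similarity 0.0
}

# Hypothetical difficulty bands; only the "advanced" values come from the text.
BANDS = {"beginner": (0.30, 0.50), "advanced": (0.70, 0.85)}

advanced = pick_distractors("كتاب", VECTORS, *BANDS["advanced"])
beginner = pick_distractors("كتاب", VECTORS, *BANDS["beginner"])
print(advanced, beginner)
```

The upper bound is what excludes exact synonyms, and the lower bound is what excludes obviously unrelated words such as "elephant".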
Quality Assurance: Human + AI Gates
Automated Validation
- Grammar check: Arabic morphological analysis
- Diacritical marks: Verify tashkeel accuracy
- Character set: Ensure no encoding errors
- Content duplication: Flag identical items
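Some of these checks need only the Python standard library. The following sketch covers three simplified cases under stated assumptions: the function names are hypothetical, `has_tashkeel` only detects whether any diacritic is present (it does not verify that the vocalization is correct), and morphological grammar checking is not attempted at all.

```python
import unicodedata

# Arabic diacritics from fathatan (U+064B) through sukun (U+0652).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def has_tashkeel(text: str) -> bool:
    """True if the text carries at least one diacritical mark."""
    return any(ch in TASHKEEL for ch in text)


def has_encoding_errors(text: str) -> bool:
    """Flag the replacement character plus control, unassigned,
    private-use, and surrogate code points."""
    return any(
        ch == "\ufffd" or unicodedata.category(ch) in {"Cc", "Cn", "Co", "Cs"}
        for ch in text
    )


def find_duplicates(items: list[str]) -> list[tuple[int, int]]:
    """Return (first_index, later_index) pairs for items that are identical
    after NFC normalization and whitespace trimming."""
    seen: dict[str, int] = {}
    duplicates = []
    for index, text in enumerate(items):
        key = unicodedata.normalize("NFC", text).strip()
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


vocalized = "\u0643\u0650\u062a\u064e\u0627\u0628"  # kitab written with kasra and fatha
print(has_tashkeel(vocalized), has_encoding_errors(vocalized))
print(find_duplicates([vocalized, "قلم", f" {vocalized} "]))
```

Diacritics are kept when comparing items, since two Arabic words can share the same letters and differ only in vocalization.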
Mandatory Human Review
- Quran/Tajweed content: Checked by a volunteer Islamic scholar
- Kids safety: Scanned by an LLM for inappropriate language
- Cultural sensitivity: Reviewed for potential offense
- Accuracy: Spot-check samples (10% of generated content)
No-Fallback Policy
If any validation fails, the pipeline stops and alerts via Slack. Errors never silently slip into production.
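A fail-fast batch runner is one way to express this policy in code. The sketch below is illustrative only: `run_batch`, `ValidationFailed`, and the validator names are hypothetical, and the alert is an injected callable (a list's `append` here) where a production version would presumably post to Slack.

```python
from typing import Callable


class ValidationFailed(Exception):
    """Raised to halt the batch as soon as one check fails."""


def run_batch(items: list[str],
              validators: dict[str, Callable[[str], bool]],
              alert: Callable[[str], None]) -> list[str]:
    """Validate every item; on the first failure, alert and stop.

    Items are returned only if the whole batch passes, so nothing
    from a failed batch is handed on for publishing.
    """
    accepted = []
    for item in items:
        for name, check in validators.items():
            if not check(item):
                message = f"validation '{name}' failed for item {item!r}; batch halted"
                alert(message)
                raise ValidationFailed(message)
        accepted.append(item)
    return accepted


alerts: list[str] = []  # stands in for the Slack channel
validators = {"non_empty": lambda text: bool(text.strip())}

try:
    run_batch(["كتاب", "   "], validators, alerts.append)
    halted = False
except ValidationFailed:
    halted = True

print(halted, alerts)
```

Raising an exception rather than skipping the bad item is what keeps a partial batch from being published silently.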
Generated Content Categories
| Category | Volume | Generator | QA Gate | Launch |
|---|---|---|---|---|
| Arabic vocabulary | 5,000+ items | exercise_gen | Automated | Week 1 |
| Quran surahs | 200+ (37 × 5-7 stages) | tafseer_gen | Scholar review | Week 2 |
| Prophet stories | 50+ | prophet_story_gen | Cultural + safety review | Week 3 |
| Phoneme pronunciation | 100+ (28 letters × 3-4 variants) | audio_gen | Audio engineer review | Week 1 |
| Interactive games | 45+ types × 1,000+ instances | game_content_gen | Gameplay testing | Ongoing |
| Total | 10,000+ | Multiple | Layered | Phased |
Cost & Efficiency
Cost per content item (including AI + human review):
- Simple vocabulary exercise: $0.05-0.10
- Quran surah (full 4 stages): $5-10 (due to scholar review)
- Story content: $1-2
Average cost per 1,000 items: $300-500
Manual content creation would cost $5,000-10,000 per 1,000 items. Across the quoted ranges, AI pipelines reduce cost by at least 10x (up to roughly 33x) while increasing volume and consistency.
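The size of the saving follows directly from the two quoted ranges. The short calculation below uses only the figures above; the variable names are illustrative.

```python
# Cost per 1,000 items in USD, as (low, high) ranges quoted above.
ai_cost = (300, 500)
manual_cost = (5_000, 10_000)

# Most conservative case: cheapest manual vs. most expensive AI.
lowest_ratio = manual_cost[0] / ai_cost[1]
# Most favorable case: most expensive manual vs. cheapest AI.
highest_ratio = manual_cost[1] / ai_cost[0]

print(f"cost reduction: {lowest_ratio:.0f}x to {highest_ratio:.0f}x")
```

Even the most conservative pairing gives a 10x reduction, which is why 10x works as a floor for the claim.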
Why This Matters
This is hard for competitors to replicate because:
- Scale: Producing 10,000 items requires infrastructure investment
- Arabic expertise: Distractor generation for Arabic is specialized
- Quran sensitivity: Scholar review gates take time/trust
- Continuous refresh: Our pipeline generates new content weekly
FAQ
Q: Is AI-generated content as good as human-created?
A: For exercise generation, yes, and often better: humans get tired, while AI is consistent. For Quran interpretation, human scholars must review. For stories, we use AI plus human polish. The optimal mix depends on content type.
Q: Do children notice they're using AI-generated content?
A: No. The content is indistinguishable. What matters is accuracy (validated) and relevance (curriculum-aligned), not authorship.
Q: How do you prevent the pipeline from generating errors?
A: We follow a no-fallback policy: if anything fails validation, the batch stops and alerts. We'd rather ship less content that is fully validated than more content with potential errors. Humans review all Quran content regardless.
Related reading
See Bloom's Taxonomy-driven lesson design and Content Duo's personalized session builder.


