How We Generate 10,000+ Learning Items
5 min read · Mohammad Shaker

Alphazed uses AI pipelines to generate 10,000+ educational items, pairing fast model output with audio tooling and human quality checks.

Engineering

Quick Answer

Alphazed uses automated AI pipelines to generate and curate over 10,000 educational content items — including Arabic vocabulary exercises, pronunciation drills, Quran memorization sequences, and interactive stories. The pipeline combines OpenAI for text generation, Google Cloud TTS for audio, custom image generators, and human quality gates to produce curriculum-aligned content at scale.

The Content Generation Stack

Text Generation

  • OpenAI GPT-4o-mini: Generates exercise prompts, distractors, story scripts, Quranic interpretations
  • Prompt engineering: Highly specific prompts ensure output aligns with Bloom's Taxonomy levels
  • Example prompt: "Generate 5 plausible distractors for the Arabic word 'كتاب' (book). Distractors must be semantically related but clearly different. Level: Intermediate learner, age 6-8."
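
As a sketch of how such a prompt might be wired up (function names here are illustrative, not the actual `alphazed-content-utils` code; assumes the OpenAI Python SDK v1+ with an `OPENAI_API_KEY` in the environment):

```python
def build_distractor_prompt(word: str, gloss: str, n: int = 5,
                            level: str = "Intermediate learner, age 6-8") -> str:
    """Compose a highly specific distractor prompt like the example above."""
    return (
        f"Generate {n} plausible distractors for the Arabic word '{word}' ({gloss}). "
        "Distractors must be semantically related but clearly different. "
        f"Level: {level}. Return one distractor per line, no numbering."
    )

def request_distractors(word: str, gloss: str) -> list[str]:
    """Call GPT-4o-mini and split the reply into candidate distractors."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_distractor_prompt(word, gloss)}],
    )
    return [line.strip() for line in resp.choices[0].message.content.splitlines()
            if line.strip()]
```

The returned candidates still pass through the validation gates described below before reaching the database.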

Audio Generation

  • Google Cloud TTS (WaveNet voices): Generates native-quality Arabic audio
  • Multiple voices: Male/female voices, varying speaking speeds, emotional tones
  • Custom pronunciation: Diacritical marks influence phoneme selection for authentic Quranic pronunciation
  • Speech marks extraction: Phoneme timestamps for lip-sync animations (blog #3)
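
A minimal synthesis sketch using the Google Cloud TTS client library (the gender-to-voice mapping is an assumption to be checked against the published `ar-XA` WaveNet voice list; `synthesize` is an illustrative name, not the actual module code):

```python
def pick_voice(gender: str) -> str:
    # Assumed mapping; confirm names against the Google Cloud TTS voice list.
    voices = {"female": "ar-XA-Wavenet-A", "male": "ar-XA-Wavenet-B"}
    return voices[gender]

def synthesize(text: str, gender: str = "female",
               rate: float = 0.9, out_path: str = "out.mp3") -> None:
    """Render Arabic text to an MP3 file with a WaveNet voice."""
    from google.cloud import texttospeech  # needs GOOGLE_APPLICATION_CREDENTIALS
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="ar-XA", name=pick_voice(gender)),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=rate),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
```

Passing fully diacritized text (with tashkeel) is what lets the engine choose the right phonemes, per the custom-pronunciation point above.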

Advanced Audio

  • ElevenLabs for multilingual voiceovers: Marketing videos, app intro sequences
  • Music composition: Royalty-free background tracks from Epidemic Sound

Image Generation

  • Custom distractor images via DALL-E or Midjourney
  • Vector graphics for UI elements
  • Character illustrations for story content

The Pipeline Architecture

Repository: alphazed-content-utils (Python, 20+ generator modules)

Generators (independent, composable):
  ├── amal_level_generator.py
  │   └── Generates complete Arabic learning levels
  │       (letters → words → sentences progression)
  │
  ├── prophet_story_generator.py
  │   └── Multi-modal stories for Thurayya
  │       (text + illustrations + audio narration)
  │
  ├── quran_tafseer_generator.py
  │   └── Quranic interpretation content
  │       (per-surah, per-ayah explanations)
  │
  ├── distractor_generator.py
  │   └── Smart wrong answers for multiple-choice
  │       (semantic similarity matching)
  │
  ├── exercise_generator.py
  │   └── Interactive exercises (45+ types)
  │       (selecting exercise type from catalog)
  │
  └── image_generator.py
      └── Visual content (DALL-E or Midjourney API)

Each generator follows a standard flow:

[Load config] → [Generate] → [Validate] → [Insert to DB]
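
The four-stage flow can be sketched as a single driver function (hypothetical names; the real generator modules are more involved):

```python
class ValidationFailure(Exception):
    """Raised when any generated item fails a quality gate."""

def run_generator(load_config, generate, validate, insert_to_db):
    """Standard flow: [Load config] -> [Generate] -> [Validate] -> [Insert to DB]."""
    config = load_config()
    items = generate(config)
    bad = [item for item in items if not validate(item)]
    if bad:
        # Halt the whole batch rather than insert partial output.
        raise ValidationFailure(f"{len(bad)} of {len(items)} items failed validation")
    insert_to_db(items)
    return len(items)
```

Because each stage is just a callable, generators stay independent and composable: the same driver runs a level generator or a story generator with different stage functions plugged in.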

Generator Deep-Dive: Distractor Generation

The Problem

For multiple-choice exercises, wrong answers (distractors) must be:

  • Plausible (child doesn't immediately recognize as wrong)
  • Related (semantically or phonetically similar)
  • Clearly different (child can distinguish with thought)

Bad distractors:

  • Question: "Which word means book?"
  • Wrong: "Elephant", "Blue", "Happy" ← Too obviously wrong

Good distractors:

  • Question: "Which word means book?"
  • Options: "كتاب" (book), "كاتب" (writer), "مكتب" (office), "كُتُب" (books, plural) ← Semantically related, requires thought

Implementation (distractor_generator.py)

  1. Semantic similarity matching:

    • Compute embeddings for correct answer using Arabic word embeddings
    • Find words with high similarity (0.7-0.85 range)
    • Exclude words that are too similar (exact synonyms)
  2. Phonetic similarity:

    • For letters/sounds, match based on phonetic features
    • Example: "ب" (Ba) distractors: "ت" (Ta), "ث" (Tha) — sound families
  3. Weighted selection:

    • Match difficulty level of the exercise
    • Beginner exercises get very different distractors
    • Advanced exercises get subtle distractors
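The similarity-band selection in step 1 can be sketched as follows (toy 2-D vectors stand in for real Arabic word embeddings; function names are illustrative, not the actual `distractor_generator.py` code):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pick_distractors(answer, vocab, embed, lo=0.7, hi=0.85, k=3):
    """Keep candidates whose similarity to the answer falls in [lo, hi]:
    related enough to be plausible, but not near-synonyms (above hi)."""
    target = embed(answer)
    scored = [(w, cosine(embed(w), target)) for w in vocab if w != answer]
    in_band = sorted((p for p in scored if lo <= p[1] <= hi),
                     key=lambda p: -p[1])
    return [w for w, _ in in_band[:k]]
```

Tightening or widening the `[lo, hi]` band is one way to implement the weighted selection in step 3: beginner exercises use a lower band (clearly different distractors), advanced exercises a higher one (subtle distractors).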

Quality Assurance: Human + AI Gates

Automated Validation

  • Grammar check: Arabic morphological analysis
  • Diacritical marks: Verify tashkeel accuracy
  • Character set: Ensure no encoding errors
  • Content duplication: Flag identical items
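
Two of these checks are simple enough to sketch directly; this is a deliberately strict toy version (it rejects any non-Arabic, non-space character), not the production validators:

```python
ARABIC_BLOCK = range(0x0600, 0x0700)  # basic Arabic letters plus tashkeel marks

def charset_ok(text: str) -> bool:
    """Flag encoding errors: every non-space character must be in the Arabic block."""
    return all(ord(c) in ARABIC_BLOCK or c.isspace() for c in text)

def find_duplicates(items):
    """Flag identical items (after trimming whitespace)."""
    seen, dups = set(), []
    for item in items:
        key = item.strip()
        if key in seen:
            dups.append(key)
        else:
            seen.add(key)
    return dups
```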

Mandatory Human Review

  • Quran/Tajweed content: Checked by Islamic scholar (volunteer)
  • Child safety: Scanned by LLM for inappropriate language
  • Cultural sensitivity: Reviewed for potential offense
  • Accuracy: Spot-check samples (10% of generated content)

No-Fallback Policy

If any validation fails, the pipeline stops and alerts via Slack. Errors never silently slip into production.
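
A sketch of that halt-and-alert guard, using a Slack incoming webhook (the webhook URL is a placeholder and the function names are illustrative):

```python
import json
import urllib.request

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder URL

def slack_alert(message: str) -> None:
    """Post the failure message to a Slack incoming webhook."""
    req = urllib.request.Request(
        SLACK_WEBHOOK,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

class BatchHalted(Exception):
    pass

def guarded_insert(items, validate, insert_to_db, alert=slack_alert):
    """Insert the batch only if every item validates; otherwise alert and halt."""
    failures = [i for i in items if not validate(i)]
    if failures:
        alert(f"Content pipeline halted: {len(failures)} item(s) failed validation")
        raise BatchHalted(f"{len(failures)} invalid items; nothing inserted")
    insert_to_db(items)
```

Making `alert` injectable keeps the guard testable without hitting Slack.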

Generated Content Categories

| Category | Volume | Generator | QA Gate | Launch |
| --- | --- | --- | --- | --- |
| Arabic vocabulary | 5,000+ items | exercise_gen | Automated | Week 1 |
| Quran surahs | 200+ (37 × 5-7 stages) | tafseer_gen | Scholar review | Week 2 |
| Prophet stories | 50+ | prophet_story_gen | Cultural + safety review | Week 3 |
| Phoneme pronunciation | 100+ (28 letters × 3-4 variants) | audio_gen | Audio engineer review | Week 1 |
| Interactive games | 45+ types × 1,000+ instances | game_content_gen | Gameplay testing | Ongoing |
| Total | 10,000+ | Multiple | Layered | Phased |

Cost & Efficiency

Cost per content item (including AI + human review):

  • Simple vocabulary exercise: $0.05-0.10
  • Quran surah (full 4 stages): $5-10 (due to scholar review)
  • Story content: $1-2

Average cost per 1,000 items: $300-500

Manual content creation would cost $5,000-10,000 per 1,000 items. AI pipelines reduce cost by at least 10x while increasing volume and consistency.

Why This Matters

Competitors can't match this because:

  1. Scale: 10,000 items requires infrastructure investment
  2. Arabic expertise: Distractor generation for Arabic is specialized
  3. Quran sensitivity: Scholar review gates take time/trust
  4. Continuous refresh: Our pipeline generates new content weekly

FAQ

Q: Is AI-generated content as good as human-created?
A: For exercise generation, yes, and often better: humans get tired, AI is consistent. For Quran interpretation, human scholars must review. For stories, we use AI plus human polish. The optimal mix depends on content type.

Q: Do children notice they're using AI-generated content?
A: No. The content is indistinguishable. What matters is accuracy (validated) and relevance (curriculum-aligned), not authorship.

Q: How do you prevent the pipeline from generating errors?
A: No-fallback policy: if anything fails validation, the batch stops and alerts. We'd rather have 99% validated content than 100% with potential errors. Humans review all Quran content regardless.

See Bloom's Taxonomy-driven lesson design and Content Duo's personalized session builder.
