How Our AI Corrects Arabic Pronunciation in Real Time

Amal uses dual-layer AI speech recognition, combining on-device speech-to-text for instant feedback with Google Cloud Speech-to-Text for higher-accuracy pronunciation scoring. The system is specifically tuned for children's voices reading Arabic, including full diacritical mark (tashkeel) awareness. To our knowledge, no other Arabic learning app offers real-time pronunciation correction for kids.

The Problem We Solved

Arabic has 28 letters but over 100 distinct sounds once you include diacritics (fatha, damma, kasra, shadda, sukun, tanween). Children's voices also differ acoustically from adults' voices: higher pitch, less precise articulation, and variable volume. Existing speech-to-text models, even Google's advanced offerings, weren't trained on children reading Arabic with full diacritical marks.

Most apps either skip pronunciation feedback entirely or use simple waveform matching that penalizes accents and natural variation. Neither approach works for children learning a language with sounds that don't exist in English.

How It Works: Dual STT Architecture

Our system runs two simultaneous speech recognition paths:

Layer 1: Device STT (Instant Feedback)

The DeviceSTTMechanism uses the device's native speech recognizer, accessed from Flutter, to process audio locally. As your child speaks, partial results stream back within about 100 ms, showing green highlights for recognized words. This keeps children engaged and provides immediate reinforcement. Device STT works offline and requires no internet connection.

Layer 2: Backend Google STT (Accuracy)

At the same time, we send the audio to BackendGoogleSTTMechanism, which uses Google Cloud Speech-to-Text with speech context biasing. We send the expected text (the word the child is supposed to be reading) as a hint. This substantially improves recognition accuracy for Arabic words in context, because the STT "knows" which phonemes to listen for.

Layer	Latency	Accuracy	Offline	Use Case
Device STT	~100ms	70%	✓	Real-time in-progress display
Cloud STT	~500ms	92%	✗	Final scoring
Combined	~500ms	95%	Partial	Best user experience
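
The two paths can be pictured as one concurrent routine: device partials go to the UI as they arrive, while the cloud request runs in the background and supplies the final transcript. The sketch below is illustrative only and is not the app's actual code (which runs in Flutter). The names `run_dual_stt`, `Transcript`, `fake_device`, `fake_cloud`, and `offline_cloud`, and the 2-second timeout, are assumptions made for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable


@dataclass
class Transcript:
    text: str
    source: str  # "device" or "cloud"
    is_final: bool


async def run_dual_stt(
    device_partials: AsyncIterator[str],
    cloud_recognize: Callable[[], Awaitable[str]],
    on_update: Callable[[Transcript], None],
    cloud_timeout_s: float = 2.0,  # hypothetical budget for the cloud path
) -> Transcript:
    """Stream device partials to the UI while the cloud request is in flight."""
    # Start the cloud request first so it overlaps with the device stream.
    cloud_task = asyncio.ensure_future(cloud_recognize())

    last_partial = ""
    async for partial in device_partials:
        last_partial = partial
        on_update(Transcript(partial, "device", is_final=False))

    try:
        text = await asyncio.wait_for(cloud_task, timeout=cloud_timeout_s)
        final = Transcript(text, "cloud", is_final=True)
    except (asyncio.TimeoutError, OSError):
        # Offline or slow network: fall back to the last device result.
        final = Transcript(last_partial, "device", is_final=True)
    on_update(final)
    return final


# Stand-ins for the real recognizers, used only to exercise the routine.
async def fake_device() -> AsyncIterator[str]:
    for partial in ["بسم", "بسم الله"]:
        await asyncio.sleep(0.01)
        yield partial


async def fake_cloud() -> str:
    await asyncio.sleep(0.05)
    return "بِسْمِ اللَّهِ"


async def offline_cloud() -> str:
    raise ConnectionError("no network")


if __name__ == "__main__":
    asyncio.run(run_dual_stt(fake_device(), fake_cloud, print))
```

Swapping `fake_cloud` for `offline_cloud` exercises the device-only fallback: the last device partial is promoted to the final result.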

Similarity Scoring, Not Binary Matching

We don't check whether your child's pronunciation is "exactly right". Instead, we compare the recognized text with the expected text and score the result on a spectrum using string similarity, with 0.7 as the acceptance threshold. This allows for:

Accent variation: Children from different Arabic-speaking regions naturally pronounce differently
Childlike articulation: Young children mispronounce sounds that improve with practice
Diacritic awareness: "كَتَبَ" (with diacritics) vs "كتب" (without) are treated differently in our recognition context

A child might score 85% on their first try, 91% on the second, and 97% after practice. They see progressive improvement, not a discouraging binary pass/fail.
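
To make the scoring idea concrete, here is a minimal sketch of threshold-based similarity scoring. The article does not say which string similarity metric Amal uses, so normalized Levenshtein distance is an assumption here, as are the names `strip_tashkeel`, `similarity`, and `score`. Only the 0.7 threshold comes from the text above.

```python
import re
import unicodedata

# Arabic tashkeel: tanween, fatha, damma, kasra, shadda, sukun, superscript alef.
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")
PASS_THRESHOLD = 0.7  # threshold stated in the article


def strip_tashkeel(text: str) -> str:
    """Remove diacritical marks, leaving only the base letters."""
    return TASHKEEL.sub("", text)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(expected: str, heard: str) -> float:
    """Score in [0, 1]; 1.0 means the strings are identical."""
    if not expected and not heard:
        return 1.0
    return 1.0 - levenshtein(expected, heard) / max(len(expected), len(heard))


def score(expected: str, heard: str, use_tashkeel: bool = True):
    """Return (similarity, passed) for a recognized transcript."""
    # NFC puts stacked marks (e.g. shadda + fatha) in a consistent order.
    expected = unicodedata.normalize("NFC", expected)
    heard = unicodedata.normalize("NFC", heard)
    if not use_tashkeel:
        expected, heard = strip_tashkeel(expected), strip_tashkeel(heard)
    value = similarity(expected, heard)
    return value, value >= PASS_THRESHOLD


if __name__ == "__main__":
    print(score("كَتَبَ", "كتب"))                      # diacritic-sensitive
    print(score("كَتَبَ", "كتب", use_tashkeel=False))  # diacritic-insensitive
```

Under this particular metric, a fully diacritized word compared against its bare form loses one edit per missing mark, so whether to compare with or without tashkeel is a design choice that depends on what the recognizer returns.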

Speech Context Biasing: The Secret Ingredient

When a lesson asks your child to read "بِسْمِ اللَّهِ" (In the name of Allah), we send this text to Google STT as a speech context. The STT engine biases toward those specific phonemes, improving recognition accuracy by 35-50% for expected words.

This is critical for Arabic because:

Words have multiple valid pronunciations depending on diacritization
Context disambiguates meaning
Children benefit from the system "knowing" what they're supposed to read
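
As an illustration of what sending the expected text as a speech context can look like, the sketch below builds a JSON body for the Speech-to-Text v1 REST `speech:recognize` method, passing the lesson text in `speechContexts`. The function name `build_recognize_request`, the `ar-SA` language code, and the 16 kHz LINEAR16 audio format are assumptions for the example; the article does not describe Amal's actual request settings.

```python
import base64
import json


def build_recognize_request(
    audio_bytes: bytes,
    expected_text: str,
    language_code: str = "ar-SA",  # assumed locale, not confirmed by the article
    sample_rate_hz: int = 16000,   # assumed recording format
) -> dict:
    """Build a speech:recognize body that biases toward the expected text."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            # The phrase hint: the text the child is supposed to be reading.
            "speechContexts": [{"phrases": [expected_text]}],
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    body = build_recognize_request(b"\x00\x00" * 160, "بِسْمِ اللَّهِ")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because the hint changes with every lesson item, the request is rebuilt per utterance rather than configured once per session.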

Why Competitors Can't Copy This

Reproducing this requires:

Children's voice acoustic training data (we have 95,000+ learners)
Arabic diacritical awareness in speech processing (specialized NLP)
Curriculum integration (context biasing tied to each lesson)
Mobile architecture expertise (dual STT without UI lag)
Years of iteration with real children's voices

It's not a feature you add — it's a system you build from the ground up.

FAQ

Q: Does Amal work with different Arabic accents?
A: Yes. Our similarity scoring accommodates dialectal variation. Whether your child has a Gulf, Levantine, or Egyptian accent, the system adjusts and scores pronunciation on intelligibility, not conformity to a single standard.

Q: Does my child need internet for speech recognition?
A: Device STT works completely offline for instant feedback. The highest-accuracy scoring (and spaced repetition scheduling) uses cloud STT, which requires an internet connection. Without one, the app falls back gracefully to device-only mode.

Q: Is my child's voice data stored?
A: No. Audio is processed in real time and immediately discarded. We never store children's voice recordings. Speech results are logged for learning analytics, but the audio itself is not.

See how Amal works, why we use device and cloud speech recognition, and why lip-sync matters for Arabic sounds.

Related Articles

Device STT vs Cloud STT for Children's Speech Recognition

How Thurayya's AI Tajweed Engine Helps Kids Recite Better

Why We Built Lip-Sync Animation for Arabic Sounds
