Amal uses dual-layer AI speech recognition — combining on-device speech-to-text for instant feedback with Google Cloud Speech-to-Text for higher-accuracy pronunciation scoring. The system is specifically tuned for children's voices reading Arabic, including full diacritical mark (tashkeel) awareness. No other Arabic learning app offers real-time pronunciation correction for kids.
The Problem We Solved
Arabic has 28 letters, but once diacritics (fatha, damma, kasra, shadda, sukun, tanween) are included, those letters produce over 100 distinct sounds. Children's voices also differ acoustically from adults' voices: higher pitch, less precise articulation, and more variable volume. Existing speech-to-text models, even Google's advanced offerings, weren't trained on children reading Arabic with full diacritical marks.
Most apps either skip pronunciation feedback entirely or use simple waveform matching that penalizes accents and natural variation. Neither approach works for children learning a language with sounds that don't exist in English.
How It Works: Dual STT Architecture
Our system runs two simultaneous speech recognition paths:
Layer 1 — Device STT (Instant Feedback)
The DeviceSTTMechanism uses Flutter's native speech recognition to process audio locally. As your child speaks, partial results stream back within roughly 100 ms, highlighting recognized words in green almost instantly. This keeps children engaged and provides immediate reinforcement. Device STT works offline and requires no internet connection.
Layer 2 — Backend Google STT (Accuracy)
Simultaneously, we send the audio to BackendGoogleSTTMechanism, which uses Google Cloud Speech-to-Text with speech context biasing. We send the expected text (the word the child is supposed to be reading) as a hint. This dramatically improves recognition accuracy for Arabic words in context — the STT "knows" to listen for specific phonemes.
| Layer | Latency | Accuracy | Offline | Use Case |
|---|---|---|---|---|
| Device STT | ~100 ms | 70% | ✓ | Real-time in-progress display |
| Cloud STT | ~500 ms | 92% | ✗ | Final scoring |
| Combined | ~500 ms | 95% | Partial | Best user experience |
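The two-path flow can be sketched in a few lines. This is a minimal illustration in Python rather than the app's actual Flutter code: `device_stt`, `cloud_stt`, `recognize`, and the `Transcript` type are hypothetical stand-ins, and the sleep durations are placeholders, not measured latencies. It assumes the device result is shown first and the cloud result replaces it when it arrives, with a fallback to the device result when the cloud call fails or times out.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Transcript:
    text: str
    source: str  # "device" or "cloud"


async def device_stt(audio: bytes) -> Transcript:
    """Hypothetical stand-in for on-device recognition (fast, less accurate)."""
    await asyncio.sleep(0.01)  # placeholder delay, not a real measurement
    return Transcript(text="كتب", source="device")


async def cloud_stt(audio: bytes, expected_text: str) -> Transcript:
    """Hypothetical stand-in for the cloud call, biased by the expected text."""
    await asyncio.sleep(0.05)  # placeholder delay, not a real measurement
    return Transcript(text=expected_text, source="cloud")


async def recognize(
    audio: bytes,
    expected_text: str,
    device: Callable[[bytes], Awaitable[Transcript]] = device_stt,
    cloud: Callable[[bytes, str], Awaitable[Transcript]] = cloud_stt,
    cloud_timeout: float = 2.0,
    on_partial: Optional[Callable[[Transcript], None]] = None,
) -> Transcript:
    # Start both paths at once so neither waits on the other.
    device_task = asyncio.create_task(device(audio))
    cloud_task = asyncio.create_task(cloud(audio, expected_text))

    # The device result arrives first and drives the instant highlight.
    device_result = await device_task
    if on_partial is not None:
        on_partial(device_result)

    try:
        # The cloud result, when available, is used for final scoring.
        return await asyncio.wait_for(cloud_task, timeout=cloud_timeout)
    except (asyncio.TimeoutError, OSError):
        # Offline or slow network: fall back to the device-only result.
        return device_result
```

Calling `asyncio.run(recognize(audio_bytes, "كَتَبَ"))` returns the cloud transcript when the cloud path succeeds. Passing the two recognizers as parameters keeps the fallback logic easy to exercise without a network.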
Similarity Scoring, Not Binary Matching
We don't check whether your child's pronunciation is "exactly right". Instead, we compare the recognized text with the expected text using string similarity and treat a score of 0.7 (70%) or higher as a pass. This allows for:
- Accent variation: Children from different Arabic-speaking regions naturally pronounce differently
- Developing articulation: Young children often mispronounce sounds, and this improves with practice
- Diacritic awareness: "كَتَبَ" (with diacritics) vs "كتب" (without) are treated differently in our recognition context
A child might score 85% on their first try, 91% on the second, and 97% after practice. They see progressive improvement rather than a discouraging binary pass/fail.
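To make the idea of threshold-based scoring concrete, here is a minimal sketch. The article does not name the similarity algorithm, so the use of `difflib.SequenceMatcher` is an assumption made for illustration, as are the function names and the `keep_diacritics` switch. Only the 0.7 threshold comes from the text above.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Tuple

PASS_THRESHOLD = 0.7  # threshold stated in the article


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which includes Arabic tashkeel."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def similarity(expected: str, recognized: str) -> float:
    """Return a score in [0, 1]; SequenceMatcher is an assumed choice of metric."""
    return SequenceMatcher(None, expected, recognized).ratio()


def score_attempt(
    expected: str, recognized: str, keep_diacritics: bool = True
) -> Tuple[float, bool]:
    """Score one attempt and report whether it clears the threshold."""
    if not keep_diacritics:
        expected = strip_diacritics(expected)
        recognized = strip_diacritics(recognized)
    score = similarity(expected, recognized)
    return score, score >= PASS_THRESHOLD
```

With this particular metric, comparing "كَتَبَ" against an undiacritized "كتب" gives about 0.67 when diacritics are kept and 1.0 when they are stripped, which shows why diacritized and bare forms have to be handled deliberately rather than treated as interchangeable.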
Speech Context Biasing: The Secret Ingredient
When a lesson asks your child to read "بِسْمِ اللَّهِ" (In the name of Allah), we send this text to Google STT as a speech context. The STT engine biases toward those specific phonemes, improving recognition accuracy by 35-50% for expected words.
This is critical for Arabic because:
- Words have multiple valid pronunciations depending on diacritization
- Context disambiguates meaning
- Children benefit from the system "knowing" what they're supposed to read
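As a rough illustration of speech context biasing, the sketch below builds a JSON request body in the shape used by the Google Cloud Speech-to-Text v1 REST `speech:recognize` method, where `speechContexts` carries the expected phrase as a hint. The helper name `build_recognize_request` is hypothetical, and the language code, encoding, sample rate, and optional boost value are assumptions; the article does not say which settings Amal uses.

```python
import base64
import json
from typing import Optional


def build_recognize_request(
    expected_text: str,
    audio: bytes,
    language_code: str = "ar-SA",  # assumed; the article does not specify a locale
    encoding: str = "LINEAR16",    # assumed audio encoding
    sample_rate_hertz: int = 16000,  # assumed sample rate
    boost: Optional[float] = None,
) -> dict:
    """Build a recognize request that hints the expected phrase to the recognizer."""
    context = {"phrases": [expected_text]}
    if boost is not None:
        # Optional weight for the hinted phrases; the value is an assumption.
        context["boost"] = boost
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            "speechContexts": [context],
        },
        # The REST API expects inline audio as base64-encoded text.
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }


# Example: hint the lesson phrase, keeping the diacritics in the JSON output.
body = build_recognize_request("بِسْمِ اللَّهِ", b"\x00\x01")
payload = json.dumps(body, ensure_ascii=False)
```

Because each lesson knows the text the child is about to read, the phrase list can be generated per exercise instead of being maintained as a global vocabulary.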
Why Competitors Can't Copy This
Reproducing this requires:
- Children's voice acoustic training data (we have 95,000+ learners)
- Arabic diacritical awareness in speech processing (specialized NLP)
- Curriculum integration (context biasing tied to each lesson)
- Mobile architecture expertise (dual STT without UI lag)
- Years of iteration with real children's voices
It's not a feature you add — it's a system you build from the ground up.
FAQ
Q: Does Amal work with different Arabic accents? A: Yes. Our similarity scoring accommodates dialectal variation. Whether your child has a Gulf, Levantine, or Egyptian accent, the system adjusts and scores pronunciation on intelligibility, not conformity to a single standard.
Q: Does my child need internet for speech recognition? A: Device STT works completely offline for instant feedback. Cloud STT, which provides the highest accuracy (and supports spaced repetition scheduling), requires an internet connection. Without one, the app gracefully falls back to device-only mode.
Q: Is my child's voice data stored? A: No. Audio is processed in real time and immediately discarded. We never store children's voice recordings. Recognition results are logged for learning analytics, but the audio itself is not.
Related reading
See how Amal works, why we use device and cloud speech recognition, and why lip-sync matters for Arabic sounds.



