(speaking/writing scoring)
Phase 1 (Launch)
Reading & Listening: fully automated scoring using objective item scoring and fixed, published score-to-band conversion tables (instant results); see the band-mapping sketch after this phase's items.
Writing & Speaking: manual marking by trained raters using detailed rubrics; results returned within a short, defined turnaround.
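As a minimal sketch of the Phase 1 objective-scoring step, the snippet below maps a raw section score to a band via a fixed lookup table. The band boundaries shown are illustrative placeholders, not the published tables.

```python
# Fixed score-to-band lookup for objectively scored sections.
# Boundary values below are hypothetical; the real published tables would be substituted.
from bisect import bisect_right

# (minimum raw score, band) pairs in ascending order of raw score.
BAND_TABLE = [(0, 3.0), (10, 4.0), (16, 5.0), (23, 6.0), (30, 7.0), (35, 8.0), (39, 9.0)]

def raw_to_band(raw_score: int) -> float:
    """Map a raw item-level score to a band using the fixed conversion table."""
    thresholds = [minimum for minimum, _ in BAND_TABLE]
    index = bisect_right(thresholds, raw_score) - 1
    return BAND_TABLE[max(index, 0)][1]

print(raw_to_band(27))  # -> 6.0 with the placeholder table above
```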
Phase 2 (Hybrid automation)
Integrate speech-to-text (e.g., OpenAI Whisper or a commercial ASR service) for candidate audio, moving to a human-plus-AI workflow in which raters see transcripts alongside AI feedback; a transcription sketch follows this phase's items.
Automated writing analysis via natural language processing (NLP) pipelines: grammar and mechanics checks, lexical sophistication metrics, and cohesion/coherence indicators, used to produce a second opinion for raters (see the indicator sketch below).
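The sketch below shows the transcription step using the open-source `openai-whisper` package, one of the options named above; a commercial ASR API could be substituted. Model size and file paths are deployment choices, not fixed decisions.

```python
# Transcribe a candidate's spoken response so raters can review text alongside audio.
import whisper  # pip install openai-whisper

def transcribe_response(audio_path: str) -> str:
    """Return a plain-text transcript of one candidate audio file."""
    model = whisper.load_model("base")      # model size is a deployment choice
    result = model.transcribe(audio_path)   # returns a dict with "text", "segments", ...
    return result["text"].strip()

# transcript = transcribe_response("candidate_response.wav")
```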
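For the writing-analysis second opinion, a rough illustration of the kinds of indicators involved is given below: a type-token ratio and mean word length as lexical sophistication proxies, and adjacent-sentence word overlap as a cohesion proxy. The specific features and any thresholds used in production would come from the validated NLP pipeline, not this sketch.

```python
# Illustrative second-opinion indicators for a written response.
import re
from statistics import mean

def writing_indicators(text: str) -> dict:
    """Compute simple lexical and cohesion proxies shown to raters as a second opinion."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Lexical sophistication proxies.
    type_token_ratio = len(set(words)) / len(words) if words else 0.0
    mean_word_length = mean(len(w) for w in words) if words else 0.0
    # Cohesion proxy: content-word overlap between adjacent sentences.
    overlaps = []
    for left, right in zip(sentences, sentences[1:]):
        left_set = set(re.findall(r"[A-Za-z']+", left.lower()))
        right_set = set(re.findall(r"[A-Za-z']+", right.lower()))
        if left_set and right_set:
            overlaps.append(len(left_set & right_set) / min(len(left_set), len(right_set)))
    return {
        "type_token_ratio": round(type_token_ratio, 3),
        "mean_word_length": round(mean_word_length, 2),
        "adjacent_sentence_overlap": round(mean(overlaps), 3) if overlaps else 0.0,
    }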
Phase 3 (Full automation & continuous calibration)
Train and validate machine-learning (ML) models, supervised on human-rated items, to produce stable, defensible scores; a training sketch follows this phase's items.
Use ensemble approaches (ML models, rule-based checks, and GPT-style evaluators for language features), with human quality checks and drift monitoring (see the blending and drift sketch below).
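A minimal supervised-scoring sketch for the training step is shown below, assuming scikit-learn: it fits a text-to-band regressor on human-rated responses and reports quadratic weighted kappa against the held-out human bands. The feature set, model family, and evaluation split are placeholders, not the final design.

```python
# Learn to predict human-assigned bands from response text and check agreement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def train_scoring_model(texts: list[str], human_bands: list[float]):
    """Fit a text-to-band regressor and report quadratic weighted kappa on a hold-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, human_bands, test_size=0.2, random_state=0
    )
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), Ridge(alpha=1.0))
    model.fit(X_train, y_train)
    predicted = [round(p * 2) / 2 for p in model.predict(X_test)]  # snap to half-band steps
    # Quadratic weighted kappa is a common agreement metric between model and human bands.
    qwk = cohen_kappa_score(
        [int(p * 2) for p in predicted],
        [int(round(y * 2)) for y in y_test],
        weights="quadratic",
    )
    return model, qwk
```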
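Finally, a sketch of how the ensemble and monitoring pieces could fit together: component scores are blended with fixed weights, wide disagreement routes the response to a human rater, and a crude drift check compares recent band means with the calibration baseline. The weights, thresholds, and the `llm_score` input are assumptions for illustration only.

```python
# Blend component scores and flag cases for human review; monitor score drift over time.
from statistics import mean, stdev

def ensemble_band(ml_score: float, rule_score: float, llm_score: float,
                  weights=(0.5, 0.2, 0.3), disagreement_limit: float = 1.0):
    """Return a blended band (half-band steps) and a flag for human review on disagreement."""
    components = (ml_score, rule_score, llm_score)
    blended = sum(w * s for w, s in zip(weights, components))
    needs_review = max(components) - min(components) > disagreement_limit
    return round(blended * 2) / 2, needs_review

def drift_alert(baseline_bands: list[float], recent_bands: list[float], z_limit: float = 3.0) -> bool:
    """Crude drift monitor: alert when the recent mean band moves far from the calibration baseline."""
    if len(baseline_bands) < 2 or len(recent_bands) < 30 or stdev(baseline_bands) == 0:
        return False
    z = abs(mean(recent_bands) - mean(baseline_bands)) / (stdev(baseline_bands) / len(recent_bands) ** 0.5)
    return z > z_limit
```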