The new format: what you are actually facing
The TOEFL 2026 Speaking section is the shortest and most practically focused section in the new exam. Total time: approximately 8 to 10 minutes. It comes last in the test, after Reading, Listening, and Writing. There are 11 scored items across two task types, and there is zero preparation time for either of them.
Listen and Repeat
- 7 sentences, one at a time
- 8 to 12 seconds to respond after the beep
- Sentences get progressively longer
- Topic: campus or everyday situation
- Scored on: Accuracy, Fluency, Intelligibility
- Goal: exact repetition, not paraphrasing
Take an Interview
- 4 questions on one everyday topic
- 45 seconds per response
- No preparation time at all
- Questions progress from simple to opinion-based
- Scored on: Fluency, Intelligibility, Language Use, Organization
- Goal: natural, organized, spontaneous speech
According to the official ETS scoring framework, your final Speaking band is the average of two task scores — one for Listen and Repeat (averaging your 7 item scores) and one for Take an Interview (averaging your 4 item scores) — rounded to the nearest 0.5 on the 1.0 to 6.0 band scale. Neither task is weighted more heavily than the other in the final calculation, which means a weak performance on Listen and Repeat costs you just as much as a weak performance on the Interview.
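The calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official ETS formula: the function name `speaking_band` is hypothetical, ties are rounded up because the text does not say how ties are handled, and no conversion between the 0 to 5 item scale and the 1.0 to 6.0 band scale is applied because none is described here.

```python
import math


def speaking_band(listen_repeat_items, interview_items):
    """Average each task's item scores, average the two task scores,
    then round to the nearest 0.5.

    Assumptions (not confirmed by ETS): ties round up, and no extra
    scale conversion is applied between item scores and bands.
    """
    if len(listen_repeat_items) != 7 or len(interview_items) != 4:
        raise ValueError("expected 7 Listen and Repeat items and 4 Interview items")
    listen_repeat_score = sum(listen_repeat_items) / len(listen_repeat_items)
    interview_score = sum(interview_items) / len(interview_items)
    combined = (listen_repeat_score + interview_score) / 2  # equal weighting
    return math.floor(combined * 2 + 0.5) / 2  # nearest 0.5, ties up


# A weak task costs the same whichever task it is:
print(speaking_band([2] * 7, [5] * 4))  # 3.5
print(speaking_band([5] * 7, [2] * 4))  # 3.5
```

Because the two task averages are combined with equal weight, swapping which task is weak leaves the result unchanged in this sketch.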
Listen and Repeat: what it actually tests
Listen and Repeat is simpler in concept than the Interview, but it catches more students off guard. The task is straightforward: hear a sentence, wait for the beep, repeat it exactly. No paraphrasing, no summarising, no expressing your own ideas. The goal is precise auditory reproduction.
According to the official ETS rubric, a perfect score of 5 requires the response to be fully intelligible and an exact repetition of the prompt. A single meaningful error drops you to a 4. Missing content or changed meaning takes you to 2 or 3. The rubric rewards precision over creativity.
The seven sentences follow a progression in difficulty. The first two are short and simple — approximately 8 to 10 words. The middle three are medium length with more content words and clauses. The final two are the longest and most complex, often containing subordinate clauses, technical vocabulary, or sequences of steps. The long sentences are genuinely difficult, even for students with strong English. This is not a section to underestimate.
What actually trips students up
The most common error is not pronunciation — it is auditory memory. Students hear a sentence, process the meaning, and then speak from their understanding rather than from the precise words they heard. This works well in normal conversation but loses you points here, because the rubric scores repetition accuracy, not comprehension.
The fix is dedicated shadowing practice: listening to short audio clips and repeating immediately, word for word, matching the speaker's rhythm, stress, and intonation. Daily shadowing of 10 to 15 minutes builds the auditory memory muscle this task requires far more effectively than vocabulary study or pronunciation drills alone.
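For self-checking shadowing practice, a word-level comparison between the original sentence and a transcript of your attempt can show exactly where you drifted from the wording. The sketch below uses Python's standard `difflib` module. It is a practice aid only and does not reproduce how ETS scores the task. The function names and the example sentence are hypothetical, and it assumes you already have a typed transcript of what you said.

```python
import difflib
import re


def tokenize(sentence):
    """Lowercase the sentence and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def repetition_issues(prompt, attempt):
    """Return a list of (kind, prompt_words, attempt_words) for every
    place where the attempt differs from the prompt.

    kind is 'replace', 'delete' (word dropped) or 'insert' (word added).
    """
    prompt_words = tokenize(prompt)
    attempt_words = tokenize(attempt)
    matcher = difflib.SequenceMatcher(a=prompt_words, b=attempt_words, autojunk=False)
    issues = []
    for kind, p_start, p_end, a_start, a_end in matcher.get_opcodes():
        if kind != "equal":
            issues.append((kind, prompt_words[p_start:p_end], attempt_words[a_start:a_end]))
    return issues


prompt = "The library closes at nine on weekdays."
print(repetition_issues(prompt, "The library shuts at nine on weekdays"))
# [('replace', ['closes'], ['shuts'])]
print(repetition_issues(prompt, "The library closes at nine weekdays"))
# [('delete', ['on'], [])]
```

The first attempt keeps the meaning but changes a word, which is the comprehension-over-wording error described above. The second attempt drops a function word.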
Self-correction is permitted. If you realise mid-sentence that you got a word wrong, go back and fix it. ETS allows this without penalty. If you completely lose a word, make your best guess and keep moving — stopping entirely and going silent costs more than an imperfect attempt.
Take an Interview: understanding the task
The Interview simulates a short online conversation with a researcher. You see the interviewer on screen in a short looping video, which creates a more natural conversational feel than reading a prompt from a page. The four questions all relate to one everyday topic — common themes include city life and daily routines, commuting and transportation, technology habits, personal preferences about learning or working, and opinions on social or community issues.
The questions follow a pattern that moves from concrete to abstract:
Question 1 typically asks about your current situation or personal experience. ("Do you currently live in a city, a small town, or a village?") This is the easiest question and your opportunity to settle in.
Question 2 asks about your habits or preferences related to the topic. ("How often do you use public transport?") Still factual, but starting to require more development.
Question 3 asks for your opinion or evaluation. ("Do you think cities are becoming easier or harder to live in?") This is where language complexity matters most.
Question 4 asks you to consider a broader perspective or hypothetical. ("What do you think governments should do to improve city life for young people?") The hardest question, requiring organized reasoning under time pressure with no preparation.
You hear each question once and must begin speaking immediately. There is no preparation time and no note-taking. You have 45 seconds per response.
How your Interview responses are scored
Every Interview response is scored on the 0 to 5 scale using four dimensions. Understanding these dimensions is the most important preparation step, because they tell you exactly what the ETS AI scoring engine is evaluating every time you speak.
| Score | What it looks like |
|---|---|
| 5 | Fluent and clear throughout. Directly answers the question with a well-organized response. Varied vocabulary and grammar. Maintains natural pace of 140 to 160 words per minute. Fills close to the full 45 seconds. |
| 4 | Clear and relevant. Minor errors in grammar or vocabulary that do not obscure meaning. May lack connectors or full development. Generally good pace with small disruptions. |
| 3 | Understandable but choppy. Frequent pauses or reduced pace. Limited development of ideas. Some grammar errors that affect clarity. Relevancy is present but response may feel thin. |
| 2 | Significant fluency problems. Meaning is sometimes unclear. Limited range of vocabulary and grammar. Response does not fully address the question. |
| 1 | Largely unintelligible or very brief. Major problems with pronunciation, vocabulary, or grammar throughout. Response barely engages with the prompt. |
| 0 | No response or entirely off topic. |
The four scoring constructs that generate this score are Fluency (steady pace, minimal unnatural pauses, smooth delivery), Intelligibility (clear pronunciation, word stress, and rhythm), Language Use (accurate and varied grammar and vocabulary), and Organization including Relevancy (directly answering the question with a clear, logical structure).
The strategy that actually works
Because there is no preparation time, everything you do in the 45 seconds depends on what is already automatic in your spoken English. That is the core challenge of the TOEFL 2026 Speaking section — and the reason most students who try to prepare by memorizing templates fail to improve past a certain point.
Use a simple, flexible structure — not a memorized template
The most reliable structure for all four Interview questions: state your answer directly in the first 5 to 7 seconds, give one or two reasons with a brief example in the next 25 to 30 seconds, and wrap up in the final 5 seconds by restating your main point. This gives the AI scoring engine clear Organization and Relevancy without sounding scripted. Adapt the language to each question rather than filling in blanks from a memorized script.
Target 140 to 160 words per minute
This is the speaking pace associated with natural, fluent English. Too slow sounds hesitant and scripted. Too fast creates intelligibility problems. Record yourself answering opinion questions and count your words. If you are consistently under 120 words per minute, pace is your primary target. If you are over 175, slow down. At 45 seconds per response, a pace of 140 to 160 words per minute works out to roughly 105 to 120 words, which is the typical length of a top-scoring answer.
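The pace check is simple arithmetic, so it is easy to script once you have a transcript and a timing for a recorded answer. The sketch below is a minimal illustration: the function names are hypothetical, the thresholds (120, 140 to 160, 175) are the ones quoted above, and the "close" label for paces between those thresholds is an assumption added for completeness.

```python
def count_words(transcript):
    """Count whitespace-separated words in a transcript."""
    return len(transcript.split())


def words_per_minute(word_count, seconds):
    """Convert a word count and a duration in seconds to words per minute."""
    return word_count * 60 / seconds


def pace_feedback(wpm):
    """Classify a speaking pace using the thresholds quoted in this guide."""
    if wpm < 120:
        return "too slow: make pace your primary target"
    if wpm > 175:
        return "too fast: slow down"
    if 140 <= wpm <= 160:
        return "on target"
    # Assumption: the guide gives no label for 120-140 or 160-175.
    return "close: nudge toward 140 to 160"


# Word counts that match the target pace over a 45-second response:
print(words_per_minute(105, 45))  # 140.0
print(words_per_minute(120, 45))  # 160.0

# Checking one recorded answer (hypothetical numbers):
print(pace_feedback(words_per_minute(82, 45)))
```

Run this against several recordings rather than one, since a single answer can be unusually fast or slow.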
Answer the question that was asked, not the question you prepared for
Relevancy is a scored dimension. The AI scoring engine compares your response to the specific question asked and penalizes answers that drift off topic or answer a different question. When you hear each question, take half a second to identify exactly what is being asked before you start speaking. Question 4 in particular often surprises students because it shifts from personal experience to broader societal opinion — make sure you track that shift.
Keep moving — never stop mid-response
Silence is the most costly error in speaking. A long pause signals fluency problems to the scoring engine far more than a grammatical error does. If you lose your train of thought, use a cue phrase to buy yourself a moment: "Let me think about that for a second" or "Another way to look at it is..." These keep your speech moving while you organize your next idea. Imperfect but continuous speech scores better than perfect speech with silences.
Use transitions to demonstrate Organization
The scoring engine evaluates Organization partly through the presence of logical connectors. Phrases like "the main reason is," "for example," "additionally," and "so overall" signal structure to both the AI and human raters. You do not need elaborate academic transitions; simple, correctly used connectors consistently score better than sophisticated language used awkwardly. Build a small set of go-to transition phrases and use them naturally.
Fill the full 45 seconds
Very short responses almost never score above 3. The scoring rubric explicitly requires sufficient development of ideas, and a response that ends at 20 seconds does not give the engine enough data to score Language Use or Organization reliably. If you have finished your main point with time remaining, extend your example, add a second reason, or connect your answer to a broader implication. Practice finishing close to 45 seconds consistently.
Sample question and response
Here is an example of a Question 3-level Interview question, followed by the kind of response that scores in the band 4.5 to 5.0 range:
"Do you think living in a big city makes it easier or harder to meet new people? Why?"
Personally, I think it makes it easier, even though it might not feel that way at first. The main reason is that cities offer so many different kinds of places where people naturally come together — classes, sports clubs, community events, coffee shops. You are constantly around people with different backgrounds, which makes conversations happen more organically. For example, in my experience, I have met people through a language exchange group that I never would have found in a smaller town. Of course, cities can feel anonymous too, but I think that is more about how you choose to use the space than about the city itself. So overall, I would say big cities actually give you more opportunities to connect, if you make the effort.
Notice what this response does: it answers the question directly in the first sentence, gives a reason with a concrete mechanism, adds a personal example, acknowledges the counterargument briefly, and wraps up. It uses common vocabulary correctly. It does not try to impress with difficult words. It fills close to the full 45 seconds.
The five most common mistakes
- Using memorized templates. The AI scoring engine detects generic, rehearsed phrasing and robotic delivery. Templates produce lower scores than natural, adapted responses. Learn a flexible structure, not a fixed script.
- Neglecting Listen and Repeat. Students treat this as easy and spend all their preparation time on the Interview. Listen and Repeat accounts for half the Speaking score. Shadowing practice is the most effective preparation and most students do almost none of it.
- Stopping when they lose their thread. Silence is more damaging to your Fluency score than any grammatical error. Keep speaking, use a filler phrase, and find your way back to the topic. The engine needs continuous speech data to score you accurately and favorably.
- Finishing too early. A 20-second response to a 45-second question is a structural problem, not a fluency problem. Practice extending your answers until finishing close to 45 seconds feels natural. Add a second example, a contrasting view, or a forward-looking statement to use your time.
- Practising reading, not speaking. Reading TOEFL speaking guides and looking at sample responses does not build speaking ability. The only preparation that works is speaking: out loud, on a timer, with feedback. Record yourself. Listen back. Identify your specific patterns. Improve them.
How to prepare effectively
The TOEFL 2026 Speaking section rewards habits built over weeks, not knowledge acquired the night before. Here is how to structure your preparation.
Daily speaking practice (15 to 20 minutes)
Every day, answer five opinion questions on a timer. Use topics similar to Interview questions — city life, technology, education, work habits, social issues. Record yourself. Time your responses. Listen back and identify your specific weaknesses: Are you finishing too early? Are your transitions missing? Are you speaking too slowly? Are you drifting off topic on harder questions? The feedback loop matters as much as the practice itself.
Daily shadowing (10 to 15 minutes)
Pick any clear audio source — a news podcast, a short documentary clip, a TED Talk excerpt — and shadow it: listen and repeat immediately after the speaker, matching their rhythm, stress, and intonation word for word. This builds the auditory memory and natural rhythm that Listen and Repeat directly tests. Ten minutes of focused shadowing daily produces measurable improvement within two weeks.
AI-powered feedback
Practising alone is useful. Practising with feedback is dramatically more effective. Our TOEFL 2026 Speaking practice at toefl.prepdrills.com grades your Speaking responses using AI feedback on fluency, pronunciation, and content — so you know exactly what the scoring engine sees, not just how you feel about your own responses. It is available any time, free to start, and designed specifically for the 2026 format including both Listen and Repeat and Take an Interview tasks.
Work with a teacher for the final push
AI feedback catches the patterns your score reports reveal. A good Speaking teacher catches the patterns you cannot hear yourself — habitual pronunciation errors, logic gaps in your arguments, filler phrases you are not aware of using. For students targeting band 5.0 or above, at least a few sessions with a certified TOEFL teacher significantly accelerates the final stage of improvement. Epic Exam Prep offers one-to-one TOEFL Speaking preparation with teachers who specialize in exactly this section and know the 2026 rubrics in detail.
What band do you need?
Most competitive universities in the US and UK require an overall TOEFL 2026 band of 4.5, usually alongside a Speaking section band of approximately 4.0 or above. Programs that involve teaching, presenting, or significant oral communication (including teaching assistant roles, law, medicine, and some business programs) often set Speaking minimums of 4.5 or higher.
A Speaking band of 5.0 is roughly equivalent to 26 to 28 on the old 0 to 30 section scale. A band of 4.0 corresponds approximately to 22 to 24 on the old scale. Always verify the specific Speaking minimum for your target program, as institutions are still updating their requirements for the 2026 format.