TOEFL 2026 Speaking

TOEFL 2026 Speaking: How to Ace "Take an Interview"
Complete Guide

What changed: on January 21, 2026, ETS replaced the entire TOEFL Speaking section. The old four-task format — with its integrated passages, reading texts, and 15 to 30 seconds of preparation time — is gone. In its place are two completely new task types: Listen and Repeat, and Take an Interview. No templates. No prep time. No integrated content. Just you, speaking spontaneously, scored by AI in real time. This guide covers everything you need to know to score at band 5.0 and above.

The new format: what you are actually facing

The TOEFL 2026 Speaking section is the shortest and most practically focused section in the new exam. Total time: approximately 8 to 10 minutes. It comes last in the test, after Reading, Listening, and Writing. There are 11 scored items across two task types, and there is zero preparation time for either of them.

Task 1

Listen and Repeat

  • 7 sentences, one at a time
  • 8 to 12 seconds to respond after beep
  • Sentences get progressively longer
  • Topic: campus or everyday situation
  • Scored on: Accuracy, Fluency, Intelligibility
  • Goal: exact repetition, not paraphrasing

Task 2

Take an Interview

  • 4 questions on one everyday topic
  • 45 seconds per response
  • No preparation time at all
  • Questions progress from simple to opinion-based
  • Scored on: Fluency, Intelligibility, Language Use, Organization
  • Goal: natural, organized, spontaneous speech

According to the official ETS scoring framework, your final Speaking band is the average of two task scores — one for Listen and Repeat (averaging your 7 item scores) and one for Take an Interview (averaging your 4 item scores) — rounded to the nearest 0.5 on the 1.0 to 6.0 band scale. Neither task is weighted more heavily than the other in the final calculation, which means a weak performance on Listen and Repeat costs you just as much as a weak performance on the Interview.
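To see how the averaging works in practice, here is a minimal Python sketch of the calculation described above. It is an illustration, not ETS's scoring code: the function names are made up for this guide, the half-up tie-breaking rule is an assumption, and the sketch assumes your item scores are already expressed on the same scale as the final band, because this guide does not cover how raw item scores are converted to bands.

```python
import math


def mean(scores):
    """Plain arithmetic mean of a non-empty list of item scores."""
    return sum(scores) / len(scores)


def round_to_half(value):
    """Round to the nearest 0.5, with ties rounding up (assumed rule)."""
    return math.floor(value * 2 + 0.5) / 2


def speaking_band(listen_repeat_items, interview_items):
    """Average each task's item scores, then average the two task scores.

    Both tasks carry equal weight, regardless of how many items each has.
    """
    task1 = mean(listen_repeat_items)   # 7 Listen and Repeat items
    task2 = mean(interview_items)       # 4 Take an Interview items
    return round_to_half((task1 + task2) / 2)


# Hypothetical item scores, for illustration only.
print(speaking_band([5, 5, 4, 4, 4, 3, 3], [4, 4, 4, 3]))  # 4.0
```

Because the two task averages are combined with equal weight, seven weak Listen and Repeat items pull the result down exactly as far as four weak Interview items would.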

Why this matters for your preparation

Most students instinctively focus all their preparation time on the Interview because it feels more familiar: answering questions is something they have done before. But Listen and Repeat counts for exactly the same amount. Students who neglect it often plateau at band 4.0 even when their conversational English is strong. Balance your preparation between both tasks from day one.

Listen and Repeat: what it actually tests

Listen and Repeat is simpler in concept than the Interview, but it catches more students off guard. The task is straightforward: hear a sentence, wait for the beep, repeat it exactly. No paraphrasing, no summarising, no expressing your own ideas. The goal is precise auditory reproduction.

According to the official ETS rubric, a perfect score of 5 requires the response to be fully intelligible and an exact repetition of the prompt. A single meaningful error drops you to a 4. Missing content or changed meaning takes you to 2 or 3. The rubric rewards precision over creativity.

The seven sentences follow a progression in difficulty. The first two are short and simple — approximately 8 to 10 words. The middle three are medium length with more content words and clauses. The final two are the longest and most complex, often containing subordinate clauses, technical vocabulary, or sequences of steps. The long sentences are genuinely difficult, even for students with strong English. This is not a section to underestimate.

What actually trips students up

The most common error is not pronunciation — it is auditory memory. Students hear a sentence, process the meaning, and then speak from their understanding rather than from the precise words they heard. This works well in normal conversation but loses you points here, because the rubric scores repetition accuracy, not comprehension.

The fix is dedicated shadowing practice: listening to short audio clips and repeating immediately, word for word, matching the speaker's rhythm, stress, and intonation. Daily shadowing of 10 to 15 minutes builds the auditory memory muscle this task requires far more effectively than vocabulary study or pronunciation drills alone.
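If you type out (or auto-transcribe) what you actually said during a shadowing session, a quick word-by-word comparison shows where you drifted from the original sentence. The sketch below uses Python's standard difflib module. The function names and the example sentence are hypothetical, and the accuracy figure is only a practice aid, not the ETS rubric score.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def repetition_report(prompt, attempt):
    """Compare an attempt with the prompt, word by word.

    Returns (accuracy, diffs): accuracy is the share of prompt words
    reproduced in order; diffs lists each replaced, dropped or added span.
    """
    target = words(prompt)
    spoken = words(attempt)
    matcher = difflib.SequenceMatcher(a=target, b=spoken, autojunk=False)

    matched = 0
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        else:
            # tag is "replace", "delete" or "insert"
            diffs.append((tag, target[i1:i2], spoken[j1:j2]))

    accuracy = matched / len(target) if target else 0.0
    return accuracy, diffs


# Made-up practice sentence, not an official test item.
accuracy, diffs = repetition_report(
    "The library closes at nine on weekdays.",
    "The library shuts at nine on weekdays",
)
print(round(accuracy, 2), diffs)
```

A swapped word such as "shuts" for "closes" is exactly the kind of meaning-preserving slip described above: harmless in conversation, but still a departure from exact repetition.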

Self-correction is permitted. If you realise mid-sentence that you got a word wrong, go back and fix it. ETS allows this without penalty. If you completely lose a word, make your best guess and keep moving — stopping entirely and going silent costs more than an imperfect attempt.


Take an Interview: understanding the task

The Interview simulates a short online conversation with a researcher. You see the interviewer on screen in a short looping video, which creates a more natural conversational feel than reading a prompt from a page. The four questions all relate to one everyday topic — common themes include city life and daily routines, commuting and transportation, technology habits, personal preferences about learning or working, and opinions on social or community issues.

The questions follow a pattern that moves from concrete to abstract:

Question 1 typically asks about your current situation or personal experience. ("Do you currently live in a city, a small town, or a village?") This is the easiest question and your opportunity to settle in.

Question 2 asks about your habits or preferences related to the topic. ("How often do you use public transport?") Still factual, but starting to require more development.

Question 3 asks for your opinion or evaluation. ("Do you think cities are becoming easier or harder to live in?") This is where language complexity matters most.

Question 4 asks you to consider a broader perspective or hypothetical. ("What do you think governments should do to improve city life for young people?") The hardest question, requiring organized reasoning under time pressure with no preparation.

You hear each question once and must begin speaking immediately. There is no preparation time and no note-taking. You have 45 seconds per response.


How your Interview responses are scored

Every Interview response is scored on the 0 to 5 scale using four dimensions. Understanding these dimensions is the most important preparation step, because they tell you exactly what the ETS AI scoring engine is evaluating every time you speak.

Score | What it looks like
5 | Fluent and clear throughout. Directly answers the question with a well-organized response. Varied vocabulary and grammar. Maintains natural pace of 140 to 160 words per minute. Fills close to the full 45 seconds.
4 | Clear and relevant. Minor errors in grammar or vocabulary that do not obscure meaning. May lack connectors or full development. Generally good pace with small disruptions.
3 | Understandable but choppy. Frequent pauses or reduced pace. Limited development of ideas. Some grammar errors that affect clarity. Relevancy is present but response may feel thin.
2 | Significant fluency problems. Meaning is sometimes unclear. Limited range of vocabulary and grammar. Response does not fully address the question.
1 | Largely unintelligible or very brief. Major problems with pronunciation, vocabulary, or grammar throughout. Response barely engages with the prompt.
0 | No response or entirely off topic.

The four scoring constructs that generate this score are Fluency (steady pace, minimal unnatural pauses, smooth delivery), Intelligibility (clear pronunciation, word stress, and rhythm), Language Use (accurate and varied grammar and vocabulary), and Organization including Relevancy (directly answering the question with a clear, logical structure).

The biggest misconception about scoring

Many students believe that using advanced vocabulary and complex grammar structures will earn a higher score. This is only partly true. The ETS AI scoring engine is specifically trained to penalize responses where advanced vocabulary is used incorrectly or where complex structures create clarity problems. A response using common words correctly, in a clear organized structure, at a natural pace, outscores a response full of impressive words used inaccurately. Clarity beats complexity every time.

The strategy that actually works

Because there is no preparation time, everything you do in the 45 seconds depends on what is already automatic in your spoken English. That is the core challenge of the TOEFL 2026 Speaking section — and the reason most students who try to prepare by memorizing templates fail to improve past a certain point.

1. Use a simple, flexible structure, not a memorized template

The most reliable structure for all four Interview questions: state your answer directly in the first 5 to 7 seconds, give one or two reasons with a brief example in the next 25 to 30 seconds, and wrap up in the final 5 seconds by restating your main point. This gives the AI scoring engine clear Organization and Relevancy without sounding scripted. Adapt the language to each question rather than filling in blanks from a memorized script.

2. Target 140 to 160 words per minute

This is the speaking pace associated with natural, fluent English. Too slow sounds hesitant and scripted. Too fast creates intelligibility problems. Record yourself answering opinion questions and count your words. If you are consistently under 120 words per minute, pace is your primary target. If you are over 175, slow down. At 45 seconds per response, a band 5.0 answer is typically 100 to 120 words.
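The arithmetic behind these numbers is easy to check yourself after each recording. Here is a small Python sketch: the thresholds come from the paragraph above, while the function names and the "close" label for the in-between ranges are assumptions made for illustration.

```python
def words_per_minute(word_count, seconds):
    """Speaking pace for one recorded response."""
    return word_count * 60 / seconds


def target_word_range(seconds=45, low_wpm=140, high_wpm=160):
    """Word counts that correspond to the target pace for a response length."""
    return (low_wpm * seconds / 60, high_wpm * seconds / 60)


def pace_feedback(wpm):
    """Map a pace to the advice given in this guide."""
    if wpm < 120:
        return "too slow: make pace your primary target"
    if wpm > 175:
        return "too fast: slow down"
    if 140 <= wpm <= 160:
        return "on target"
    # 120-140 or 160-175: not covered explicitly above (assumed label)
    return "close: adjust slightly toward 140-160"


# Example: a 110-word answer delivered in the full 45 seconds.
pace = words_per_minute(110, 45)
print(round(pace), pace_feedback(pace))
print(target_word_range())  # (105.0, 120.0)
```

At exactly 140 to 160 words per minute, a full 45-second response works out to 105 to 120 words, which lines up with the typical range quoted above.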

3. Answer the question that was asked, not the question you prepared for

Relevancy is a scored dimension. The AI scoring engine compares your response to the specific question asked and penalizes answers that drift off topic or answer a different question. When you hear each question, take half a second to identify exactly what is being asked before you start speaking. Question 4 in particular often surprises students because it shifts from personal experience to broader societal opinion — make sure you track that shift.

4. Keep moving: never stop mid-response

Silence is the most costly error in speaking. A long pause signals fluency problems to the scoring engine far more than a grammatical error does. If you lose your train of thought, use a cue phrase to buy yourself a moment: "Let me think about that for a second" or "Another way to look at it is..." These keep your speech moving while you organize your next idea. Imperfect but continuous speech scores better than perfect speech with silences.

5. Use transitions to demonstrate Organization

The scoring engine evaluates Organization partly through the presence of logical connectors. Phrases like "the main reason is," "for example," "additionally," and "so overall" signal structure to both the AI and human raters. You do not need elaborate academic transitions: simple, correctly used connectors consistently score better than sophisticated language used awkwardly. Build a small set of go-to transition phrases and use them naturally.

6. Fill the full 45 seconds

Very short responses almost never score above 3. The scoring rubric explicitly requires sufficient development of ideas, and a response that ends at 20 seconds does not give the engine enough data to score Language Use or Organization reliably. If you have finished your main point with time remaining, extend your example, add a second reason, or connect your answer to a broader implication. Practice finishing close to 45 seconds consistently.


Sample question and response

Here is an example of a Question 3-level Interview question, and of what a band 4.5 to 5.0 response sounds like in practice:

Sample Interview Question 3

"Do you think living in a big city makes it easier or harder to meet new people? Why?"

Personally, I think it makes it easier, even though it might not feel that way at first. The main reason is that cities offer so many different kinds of places where people naturally come together — classes, sports clubs, community events, coffee shops. You are constantly around people with different backgrounds, which makes conversations happen more organically. For example, in my experience, I have met people through a language exchange group that I never would have found in a smaller town. Of course, cities can feel anonymous too, but I think that is more about how you choose to use the space than about the city itself. So overall, I would say big cities actually give you more opportunities to connect, if you make the effort.

~43 seconds · ~115 words · Clear structure · Natural pace

Notice what this response does: it answers the question directly in the first sentence, gives a reason with a concrete mechanism, adds a personal example, acknowledges the counterargument briefly, and wraps up. It uses common vocabulary correctly. It does not try to impress with difficult words. It fills close to the full 45 seconds.


The five most common mistakes


How to prepare effectively

The TOEFL 2026 Speaking section rewards habits built over weeks, not knowledge acquired the night before. Here is how to structure your preparation.

Daily speaking practice (15 to 20 minutes)

Every day, answer five opinion questions on a timer. Use topics similar to Interview questions — city life, technology, education, work habits, social issues. Record yourself. Time your responses. Listen back and identify your specific weaknesses: Are you finishing too early? Are your transitions missing? Are you speaking too slowly? Are you drifting off topic on harder questions? The feedback loop matters as much as the practice itself.

Daily shadowing (10 to 15 minutes)

Pick any clear audio source — a news podcast, a short documentary clip, a TED Talk excerpt — and shadow it: listen and repeat immediately after the speaker, matching their rhythm, stress, and intonation word for word. This builds the auditory memory and natural rhythm that Listen and Repeat directly tests. Ten minutes of focused shadowing daily produces measurable improvement within two weeks.

AI-powered feedback

Practising alone is useful. Practising with feedback is dramatically more effective. Our TOEFL 2026 Speaking practice at toefl.prepdrills.com grades your Speaking responses using AI feedback on fluency, pronunciation, and content — so you know exactly what the scoring engine sees, not just how you feel about your own responses. It is available any time, free to start, and designed specifically for the 2026 format including both Listen and Repeat and Take an Interview tasks.

Work with a teacher for the final push

AI feedback catches the patterns your score reports reveal. A good Speaking teacher catches the patterns you cannot hear yourself — habitual pronunciation errors, logic gaps in your arguments, filler phrases you are not aware of using. For students targeting band 5.0 or above, at least a few sessions with a certified TOEFL teacher significantly accelerates the final stage of improvement. Epic Exam Prep offers one-to-one TOEFL Speaking preparation with teachers who specialize in exactly this section and know the 2026 rubrics in detail.

The honest truth about Speaking improvement

Speaking is the hardest TOEFL section to improve alone. Reading, Writing, and Listening can all be developed through structured self-study. Speaking requires you to produce language under pressure, in real time, and evaluate it accurately, which is very difficult to do without external feedback. Students who plateau at band 3.5 or 4.0 despite consistent self-study almost always unlock improvement when they get real feedback, either from AI scoring or from a teacher who can hear what they cannot. Do not spend six weeks practicing the wrong things in private. Get feedback early.

What band do you need?

Most competitive universities in the US and UK require an overall TOEFL 2026 band of 4.5, which corresponds to a Speaking section band of approximately 4.0 or above. Programs that involve teaching, presenting, or significant oral communication — including teaching assistant roles, law, medicine, and some business programs — often set Speaking minimums of 4.5 or higher.

A Speaking band of 5.0 is roughly equivalent to 26 to 28 on the old 0 to 30 section scale. A band of 4.0 corresponds approximately to 22 to 24 on the old scale. Always verify the specific Speaking minimum for your target program, as institutions are still updating their requirements for the 2026 format.
