k6 load testing: your VU numbers are lying
We needed to figure out how many concurrent users ECHO could handle before things break. Wrote a k6 script, ramped up VUs, watched the system die around 100-250 VUs.
First reaction: we can only handle 100-250 concurrent users? That’s terrible. We have events with 500+ participants.
I was wrong.
what the script actually did
Our k6 script simulated a real user session: initiate a conversation, upload 4 audio chunks with 30-second gaps between each, then finish. We used shared-iterations as the executor:
```javascript
export const options = {
  scenarios: {
    default: {
      executor: 'shared-iterations',
      vus: Number(__ENV.VUS || 10),
      iterations: Number(__ENV.ITERATIONS || 10),
      maxDuration: '20m',
      gracefulStop: '30s',
    },
  },
};
```

Each VU runs the full session: upload chunk 0, sleep 30 seconds, upload chunk 1, sleep 30 seconds, and so on. Total session duration: 90+ seconds per user.
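A minimal sketch of what each VU's session could look like as the k6 default function. The endpoint paths and payloads here are hypothetical, and it runs under `k6 run`, not Node:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// Hypothetical base URL and endpoints -- adapt to the real API.
const BASE = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  // Initiate a conversation.
  const res = http.post(`${BASE}/conversations`);
  const id = res.json('id');

  // Upload 4 audio chunks with 30-second gaps, like a real user would.
  for (let chunk = 0; chunk < 4; chunk++) {
    http.post(`${BASE}/conversations/${id}/chunks`, { chunk: String(chunk) });
    if (chunk < 3) sleep(30);
  }

  // Finish the session.
  http.post(`${BASE}/conversations/${id}/finish`);
}
```

The `sleep(30)` calls are the whole point: they keep each VU's request rate at real-user speed instead of a tight loop.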
the math
When your k6 script includes sleep(30) between chunks, each VU is already modeling real user behavior. The VU isn’t hammering your API continuously. It’s making 6 API calls spread over 90 seconds, exactly like a real user would.
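The per-VU arithmetic, in plain numbers (6 calls over a roughly 90-second session, as described above):

```javascript
// Each VU: initiate + 4 chunk uploads + finish = 6 calls over ~90 seconds.
const callsPerSession = 6;
const sessionSeconds = 90; // three 30-second sleeps between the 4 chunks
const perVuRps = callsPerSession / sessionSeconds;

// Aggregate request rate at the VU counts where the system died.
console.log((100 * perVuRps).toFixed(1) + ' req/s at 100 VUs'); // 6.7 req/s
console.log((250 * perVuRps).toFixed(1) + ' req/s at 250 VUs'); // 16.7 req/s
```

So the system wasn't dying under a huge request rate. What killed it was the number of concurrent sessions, not raw throughput.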
So if your system dies at 100-250 VUs running this script, it can't handle more than 100-250 concurrent real user sessions. That's the actual number; no conversion needed.
I initially tried to convert VUs to real users using some ratio. “If each VU generates X requests per second without sleep, and a real user generates Y…” But that only applies if your VUs are running tight loops with no sleep. Our script already had realistic timing built in.
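For contrast, here's what that conversion would look like if the script had no sleeps. The request rates are made-up numbers for illustration:

```javascript
// Hypothetical: a VU in a tight no-sleep loop vs. a real user session.
const tightLoopRpsPerVu = 10;   // made up: requests/sec from a no-sleep VU
const realUserRps = 6 / 90;     // our users: 6 calls over a ~90s session

// Each no-sleep VU generates as much load as this many real users:
const usersPerVu = tightLoopRpsPerVu / realUserRps;
console.log(usersPerVu.toFixed(0)); // 150
```

With sleeps already in the script, the per-VU rate equals the real-user rate and that ratio collapses to 1, which is exactly why no conversion applied to us.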
actual capacity
With the 30-second sleep between chunks, our death number of 100-250 VUs translates directly to 100-250 concurrent real users doing audio chunk uploads.
For an audio processing API, that’s actually reasonable. The bottleneck is audio processing load: webm files being uploaded, database writes for each chunk, any real-time transcription processing. Heavy operations.
what we learned
Always model realistic user behavior in your load tests. A tight loop with no sleep tells you your theoretical throughput ceiling, but not your practical concurrent-user capacity. Include real-world delays.
Know what your VU configuration actually measures. With shared-iterations and realistic sleep patterns, VU count roughly equals concurrent user count. With no sleep and continuous iteration, you need conversion math.
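If you do want the raw throughput number too, one option is k6's constant-arrival-rate executor, which decouples request rate from VU count; the rates and durations below are illustrative, not our real config:

```javascript
export const options = {
  scenarios: {
    throughput: {
      executor: 'constant-arrival-rate',
      rate: 50,              // iterations started per timeUnit
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 100,  // VUs k6 pre-allocates to sustain the rate
      maxVUs: 500,           // hard cap if iterations run long
    },
  },
};
```

With an arrival-rate scenario you dial in requests per second directly and let k6 scale VUs to match, which is the cleaner tool when throughput, not concurrency, is the question.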
Identify your actual bottleneck. For us it’s audio I/O and processing, not API throughput. Optimizing our HTTP handler wouldn’t meaningfully change the number. We needed to look at audio processing efficiency, database write patterns, and file storage I/O.
We got comfortable with our capacity target for the events we’re running. The scary “100-250” number was exactly what we needed. Not a crisis, just a constraint to plan around.