Skip to content

Perf issue: High longtail response times with KeepAlive=true when running with(or without) workers #3487

@schneems

Description

@schneems

Describe the bug

When using ab with -k or using hey (a go based a/b network testing library) I see very high long tail response times that is an order of magnitude of what I would expect.

AB without keepalive:

$  ab -c 10 -n 200 127.0.0.1:3000/
...
Percentage of the requests served within a certain time (ms)
  50%    420
  66%    421
  75%    422
  80%    422
  90%    423
  95%    424
  98%    426
  99%    428
 100%    432 (longest request)

Longest request took 432ms

With keepalive:

$  ab -c 10 -n 200 -k 127.0.0.1:3000/
...
Percentage of the requests served within a certain time (ms)
  50%    205
  66%    205
  75%    206
  80%    206
  90%   1852
  95%   2258
  98%   2273
  99%   2450
 100%   2477 (longest request)

Longest request took 2,477ms (or 5.7x longer than without keepalive).

Puma config:

$ cat Gemfile
source "https://rubygems.org"

gem "puma"
gem "rack"
⛄️ 3.3.1 🚀 /private/tmp/e3f9bf2d7f5e861b6de95d7e48a1b5b0
$ cat config.ru
require 'rack'

response = ['200', {'Content-Type' => 'text/html'}, ['Hello, World!']]
app = Proc.new do |env|
  sleep 0.2
  response
end

run app
$ bundle exec puma -t5 -p 3000
Puma starting in single mode...
* Puma version: 6.4.2 (ruby 3.3.1-p55) ("The Eagle of Durango")
*  Min threads: 5
*  Max threads: 5
*  Environment: development
*          PID: 7434
* Listening on http://0.0.0.0:3000
Use Ctrl-C to stop

To Reproduce

  • Use the above example app that sleeps for 200ms
  • Boot with bundle exec puma -t5 -p 3000
  • Hit with AB and keepalive enabled or hey
$ hey -c 10 -n 200 http://localhost:3000/

Summary:
  Total:	8.3877 secs
  Slowest:	2.4628 secs
  Fastest:	0.2005 secs
  Average:	0.3612 secs
  Requests/sec:	23.8445

  Total data:	2600 bytes
  Size/request:	13 bytes

Response time histogram:
  0.201 [1]	|
  0.427 [184]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.653 [0]	|
  0.879 [0]	|
  1.105 [0]	|
  1.332 [0]	|
  1.558 [0]	|
  1.784 [0]	|
  2.010 [0]	|
  2.237 [0]	|
  2.463 [15]	|■■■

Extra observations

  • Enabling workers makes the problem go away
  • Disabling keepalive makes the problem go away

In server.rb the codepaths diverge

puma/lib/puma/server.rb

Lines 467 to 469 in 796d8c6

case handle_request(client, requests + 1)
when false
break
and that could be an okay place to look. Also within prepare_response() in lib/puma/request.rb. The return value of true indicates keepalive and false indicates no keepalive.

Edit (extra): you can also use existing middleware in the puma directory puma -t5 -p3000 test/rackup/sleep.ru and hit the sleep0.2 endpoint to achieve roughly the same behavior.

Edit (extra): From within the puma directory you can run:

$ bundle exec ruby -Ilib bin/puma -t5  -p 3000 test/rackup/sleep.ru

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions