Increase start_server thread join timeout #1020

Closed
k0kubun wants to merge 1 commit into ruby:master from k0kubun:fix-start-server-timeout

Conversation

k0kubun (Member) commented Mar 27, 2026

The 30-second base timeout can be insufficient on slow CI runners, causing spurious failures in tests like test_server_session_cache. Double the base timeout to 60 seconds to reduce flakiness.

Example failure: https://github.com/ruby/ruby/actions/runs/23631288760/job/68830977060?pr=16576
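
The phrase "base timeout" suggests the value is scaled before it is used. Here is a minimal sketch of that idea, assuming a scale factor read from the environment. The helper name `scaled_timeout` and the variable `TEST_TIMEOUT_SCALE` are hypothetical and are not taken from the openssl test suite.

```ruby
# Hypothetical helper, not the actual test-suite code: scale a base timeout
# by an environment-provided factor so slow runners can opt into longer waits.
def scaled_timeout(base_seconds, env: ENV)
  # "TEST_TIMEOUT_SCALE" is an assumed variable name used only for illustration.
  scale = Float(env.fetch("TEST_TIMEOUT_SCALE", "1"))
  base_seconds * scale
end

old_timeout = scaled_timeout(30, env: {})  # base value before this change
new_timeout = scaled_timeout(60, env: {})  # base value proposed by this change
slow_runner = scaled_timeout(30, env: { "TEST_TIMEOUT_SCALE" => "2" })

puts old_timeout
puts new_timeout
puts slow_runner
```

Under this assumption, doubling the base and setting a scale factor of 2 on slow runners give the same effective wait. The difference is that a larger base applies to every environment.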

k0kubun marked this pull request as ready for review March 27, 2026 06:07

rhenium (Member) commented Mar 27, 2026

Do you see this happening often? This particular test takes 0.04 seconds on the GitHub-provided runner for ruby/openssl, and I wouldn't normally expect it to become 700x slower, even with high parallelism.

https://github.com/ruby/openssl/actions/runs/23633478701/job/68837745603?pr=1020

 test_server_session_cache:			.: (0.039300)

rhenium (Member) commented Mar 27, 2026

    1) Failure:
  OpenSSL::TestSSLSession#test_server_session_cache [/Users/runner/work/ruby/ruby/src/test/openssl/utils.rb:277]:
  exceptions on 1 threads:
  #<Thread:0x0000000128403480 /Users/runner/work/ruby/ruby/src/test/openssl/utils.rb:251 dead>:
  /Users/runner/work/ruby/ruby/build/.ext/common/openssl/buffering.rb:76:in 'OpenSSL::SSL::SSLSocket#sysread': [start_server] thread did not exit in 30 secs (RuntimeError)
  	from /Users/runner/work/ruby/ruby/build/.ext/common/openssl/buffering.rb:76:in 'OpenSSL::Buffering#fill_rbuff'
  	from /Users/runner/work/ruby/ruby/build/.ext/common/openssl/buffering.rb:238:in 'OpenSSL::Buffering#gets'
  	from /Users/runner/work/ruby/ruby/src/test/openssl/test_ssl_session.rb:207:in 'block (3 levels) in OpenSSL::TestSSLSession#test_server_session_cache'
  	from /Users/runner/work/ruby/ruby/src/test/openssl/test_ssl_session.rb:436:in 'OpenSSL::TestSSLSession#server_connect_with_session'
  	from /Users/runner/work/ruby/ruby/src/test/openssl/test_ssl_session.rb:206:in 'block (2 levels) in OpenSSL::TestSSLSession#test_server_session_cache'
  	from /Users/runner/work/ruby/ruby/src/test/openssl/test_ssl_session.rb:202:in 'Integer#times'
  	from /Users/runner/work/ruby/ruby/src/test/openssl/test_ssl_session.rb:202:in 'block in OpenSSL::TestSSLSession#test_server_session_cache'
  	from /Users/runner/work/ruby/ruby/src/test/openssl/utils.rb:255:in 'block (2 levels) in OpenSSL::SSLTestCase#start_server'

It seems either #sysread decided to wait for the underlying socket when it shouldn't have, or rb_io_wait() in #sysread didn't wake up, but I have no idea how it happened. I doubt doubling the timeout would help in either case.
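
The backtrace above shows the timeout error surfacing inside #sysread rather than in the code that waits for the thread. That is consistent with a join-with-timeout pattern in which the waiting side raises an exception into the stuck thread. Below is a minimal sketch of that pattern, assuming a plain pipe in place of an SSL socket. The helper `join_or_raise` is hypothetical and is not the code in test/openssl/utils.rb.

```ruby
# Hypothetical helper: wait for a thread, and if it is still running after
# `timeout` seconds, raise an error inside that thread.
def join_or_raise(thread, timeout)
  # Thread#join returns nil when the timeout elapses before the thread exits.
  return true if thread.join(timeout)

  thread.raise(RuntimeError, "thread did not exit in #{timeout} secs")
  false
end

reader, writer = IO.pipe

# This thread blocks in a read because nothing is ever written to the pipe.
stuck = Thread.new { reader.gets }
stuck.report_on_exception = false

exited_in_time = join_or_raise(stuck, 0.2)

# Joining again re-raises the injected exception in the caller. Its backtrace
# belongs to the stuck thread, so it points at the blocking read.
error = begin
  stuck.join
  nil
rescue RuntimeError => e
  e
end

puts exited_in_time
puts error.message
puts error.backtrace.first

writer.close
reader.close
```

In this sketch a longer timeout only delays the error. If the read never wakes up, the thread is still stuck when the longer wait expires.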

k0kubun (Member, Author) commented Mar 27, 2026

It's a RUBY_DEBUG=1 build (slower interpreter) running with --zjit-call-threshold=1 (compile everything with ZJIT), so it's not just about parallelism.

However, we've just fixed ZJIT-specific flakiness in an openssl test (ruby/ruby#16576, ruby/ruby#16585), and I'm wondering whether this was the same problem showing a different symptom. So I'd rather leave the timeout unchanged for now and see whether that's true. Let me close this until I see the failure again.

Thank you for your comment.

k0kubun closed this Mar 27, 2026
k0kubun deleted the fix-start-server-timeout branch March 27, 2026 21:33