Anthropic's extended thinking streaming protocol emits two delta types
for a single thinking block: `thinking_delta` (the textual reasoning)
and `signature_delta` (an opaque base64 signature sent at the end of
the block, just before `content_block_stop`). Both must be accumulated
into the same `content_block`.
The current code handles only `thinking_delta`, so `signature_delta`
events are silently dropped. When the assistant's reply (with thinking)
is echoed back on the next turn, Anthropic's server validates the
signature and rejects the request with HTTP 400:
"Invalid `signature` in `thinking` block"
Downstream effects observed in production (via sub2api relay logs):
- Every request whose history contains a thinking block triggers a
400 signature error
- The relay strips thinking blocks and retries, which changes the
cache prefix and invalidates prompt caching, forcing a full rebuild
of cache_creation_tokens (~20k-30k per affected request)
- Measured over a 5-hour window: 5 of 25 requests suffered cache
invalidation, and those avoidable rebuilds accounted for 53.5% of
total spend
Fix:
1. Initialize `current_block` with an empty `signature` field when a
thinking block starts, so the dict shape matches Anthropic's spec
(`{type, thinking, signature}`).
2. Handle `signature_delta` events by appending `delta.signature` to
`current_block["signature"]`. Using `+=` (rather than assignment)
mirrors how `thinking_delta` is accumulated and is robust against
future chunked signatures.
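The two steps above can be sketched as follows. This is a minimal,
standalone illustration, not the project's actual handler: the function
name `accumulate_blocks` and the loop structure are assumptions, while
the event and delta type names follow Anthropic's streaming format.

```python
def accumulate_blocks(events):
    """Fold Anthropic-style stream events into finished content blocks.

    Hypothetical helper for illustration; `events` is an iterable of
    already-parsed SSE event dicts.
    """
    blocks = []
    current_block = None
    for event in events:
        etype = event.get("type")
        if etype == "content_block_start":
            current_block = dict(event["content_block"])
            if current_block.get("type") == "thinking":
                # Step 1: give the block the {type, thinking, signature}
                # shape up front, with an empty signature.
                current_block.setdefault("thinking", "")
                current_block.setdefault("signature", "")
        elif etype == "content_block_delta" and current_block is not None:
            delta = event["delta"]
            dtype = delta.get("type")
            if dtype == "thinking_delta":
                current_block["thinking"] += delta.get("thinking", "")
            elif dtype == "signature_delta":
                # Step 2: append rather than assign, so a signature split
                # across several events is still reassembled correctly.
                current_block["signature"] += delta.get("signature", "")
        elif etype == "content_block_stop" and current_block is not None:
            blocks.append(current_block)
            current_block = None
    return blocks
```

Deltas of other types (for example `text_delta`) are ignored here to
keep the sketch short; a real handler would accumulate them as well.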
No behavior change for clients that disable extended thinking, or for
upstreams that don't emit `signature_delta`. For `tool_use` threads
that require valid thinking signatures to preserve reasoning context,
this fix is required: the previous behavior silently corrupted them.
Verification:
- Before fix: upstream returns 400 + retry; cache_creation_tokens
spike to ~25k on every 4th-5th request in a conversation
- After fix: upstream accepts the first attempt; cache_read_tokens
dominate, and cache_creation_tokens appear only on the first request
of a fresh 5-minute prompt-cache window
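One way to spot the "before fix" pattern in relay logs is sketched
below. Everything here is an assumption for illustration: the function
name, the per-request usage dicts keyed by `cache_creation_tokens`, and
the 10,000-token threshold are hypothetical and not taken from the
relay's actual log format.

```python
def unexpected_cache_rebuilds(usages, threshold=10_000):
    """Return indices of non-first requests with a large cache rebuild.

    `usages` is a list of per-request usage dicts from one conversation,
    in request order. The first request is skipped because it is
    expected to create the cache.
    """
    return [
        i
        for i, usage in enumerate(usages)
        if i > 0 and usage.get("cache_creation_tokens", 0) > threshold
    ]
```

A request that arrives after the cache window has expired will also be
flagged, so hits still need to be checked against request timestamps.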
- Replace 3 old QR codes (group 5/6/7) with 4 new ones (group 6/8/9/10)
- Update both English and Chinese sections in README.md
- Remove obsolete wechat_group5.jpg and wechat_group7.jpg
- Add wechat_group8.jpg, wechat_group9.jpg, wechat_group10.jpg
compress_session.py Phase 4 builds to_del from both processed files
and every skipped file. Phase 1 marks files younger than 2h as
'recent(<2h)' (line 175) so their still-active writer is not
interrupted, but Phase 4 then deletes them anyway.
Filter the skipped-loop by reason so 'recent' files are preserved.
Processed files and other skip reasons (dup, compression error)
still proceed through deletion as before.
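The filter can be sketched as below. This is an illustration under
assumptions, not the code in compress_session.py: the function name
`build_deletion_list` and the representation of skipped files as
`(path, reason)` pairs are hypothetical; only the `to_del` name and the
'recent(<2h)' reason string come from the description above.

```python
def build_deletion_list(processed, skipped):
    """Return the paths that are safe to delete.

    `processed` is a list of paths; `skipped` is a list of
    (path, reason) pairs (assumed shape).
    """
    to_del = list(processed)
    for path, reason in skipped:
        # Files skipped as recent may still have an active writer,
        # so they must survive the deletion phase.
        if reason.startswith("recent"):
            continue
        to_del.append(path)
    return to_del
```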