Paul ARGOUD

Forum Replies Created

  • May 23, 2026 at 10:19 pm #6366

    Follow-up with empirical data on Proposal 2 — Hook 1

    Quick update: I tested the manual workaround. Disabling streaming forces a code path that goes through the WordPress HTTP API instead of the direct cURL call, and unlike the cURL call, that path can be intercepted with standard WordPress hooks. With streaming off, a companion plugin injecting Anthropic prompt caching markers worked cleanly on the first try.
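For context on what "injecting prompt caching markers" means: Anthropic prompt caching works by tagging content blocks in a Messages API request with a cache_control breakpoint, so that the prefix up to that breakpoint can be reused on later requests. The sketch below illustrates that payload transformation in Python. It is a hypothetical stand-in, not the companion plugin's actual code (that code is in the Gist). The function name add_cache_markers and the choice of two breakpoints (end of the system prompt, end of the latest message) are assumptions made for illustration.

```python
import copy
from typing import Any

# Hypothetical helper: adds Anthropic prompt caching breakpoints to a
# Messages API payload (a dict) before it is sent. Not the plugin's real code.
CACHE_MARKER = {"type": "ephemeral"}


def _as_blocks(content: Any) -> list[dict]:
    """Normalise string content into a list of text content blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def add_cache_markers(payload: dict) -> dict:
    """Return a copy of the payload with cache_control set on the last
    system block and on the last block of the most recent message."""
    out = copy.deepcopy(payload)  # never mutate the caller's payload

    # Breakpoint 1: end of the system prompt, which is stable across turns.
    if out.get("system"):
        system = _as_blocks(out["system"])
        system[-1]["cache_control"] = dict(CACHE_MARKER)
        out["system"] = system

    # Breakpoint 2: end of the conversation so far, so the next turn can
    # reuse the whole prefix.
    messages = out.get("messages") or []
    if messages and messages[-1].get("content"):
        blocks = _as_blocks(messages[-1]["content"])
        blocks[-1]["cache_control"] = dict(CACHE_MARKER)
        messages[-1]["content"] = blocks

    return out
```

Moving the second breakpoint to the newest message on every turn is what lets a multi-turn chat read the previous turn's cached prefix and write the slightly longer one, which is roughly why hit rates climb as a conversation goes on.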

    After 14 requests across Haiku 4.5 and Sonnet 4.6 in real use, the cumulative cache hit rate sits at 65 percent (Haiku alone: roughly 60 percent; Sonnet: roughly 68 percent). Effective input token cost dropped to about 35 to 40 percent of baseline. At steady state, I project a 70 to 85 percent reduction in input cost on typical multi-turn chats, with the biggest impact on Sonnet and Opus, whose input pricing is several times Haiku's.
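To show how a cache hit rate translates into an input cost figure, here is a small worked calculation. It assumes Anthropic's published multipliers on the base input price: 0.1x for cache reads and 1.25x for 5-minute cache writes (worth re-checking against current pricing). The split into "read" and "write" shares of input tokens is a simplification of my own; it does not reproduce the measurements above, and the sample inputs are illustrative only.

```python
# Relative input cost under Anthropic prompt caching, versus an uncached
# baseline of 1.0. Multipliers are assumptions taken from published pricing.
CACHE_READ_MULT = 0.10   # tokens served from the cache
CACHE_WRITE_MULT = 1.25  # tokens written to a 5-minute cache


def relative_input_cost(read_share: float, write_share: float) -> float:
    """read_share / write_share: fractions of input tokens read from or
    written to the cache. The remainder is billed at the normal price."""
    if read_share < 0 or write_share < 0 or read_share + write_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    uncached = 1.0 - read_share - write_share
    return (read_share * CACHE_READ_MULT
            + write_share * CACHE_WRITE_MULT
            + uncached)


if __name__ == "__main__":
    # Illustrative inputs only, not the measured data.
    for reads, writes in [(0.0, 0.0), (0.50, 0.10), (0.90, 0.05)]:
        cost = relative_input_cost(reads, writes)
        print(f"reads={reads:.0%} writes={writes:.0%} -> {cost:.1%} of baseline")
```

For example, 90 percent reads and 5 percent writes give 0.09 + 0.0625 + 0.05 = 0.2025, about an 80 percent reduction. Because writes cost more than uncached tokens, the saving depends on how much of each request is a reused prefix, which is why long multi-turn chats benefit most.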

    The trade-off today is binary: streaming and caching cannot coexist. The proposed filter resolves that by letting companion plugins modify the request payload before dispatch while streaming stays on. It is pass-through by default, so behaviour is unchanged for users without a companion plugin.
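The shape of the proposal can be sketched by emulating WordPress-style filters in Python: both dispatch paths (streaming over direct cURL, non-streaming over the WordPress HTTP API) run the payload through the same filter before sending, and with no callbacks registered the payload passes through untouched. This is a hypothetical illustration, not the patch itself; the filter name, the dispatch function and the simplified add_filter/apply_filters registry are all assumptions, and the real diff against the integrator class is in the Gist.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical filter name; the real one is defined in the proposed patch.
PAYLOAD_FILTER = "chatbot_request_payload"

# Simplified WordPress-style filter registry: name -> [(priority, callback)].
_filters: dict[str, list[tuple[int, Callable]]] = defaultdict(list)


def add_filter(name: str, callback: Callable, priority: int = 10) -> None:
    _filters[name].append((priority, callback))


def apply_filters(name: str, value, *args):
    # Lower priority runs first; ties keep registration order (stable sort).
    for _, callback in sorted(_filters[name], key=lambda item: item[0]):
        value = callback(value, *args)
    return value  # unchanged when no callback is registered


def dispatch(payload: dict, stream: bool) -> dict:
    """Hypothetical integrator step: filter first, then pick the transport.
    Returns what would be sent instead of performing a real request."""
    payload = apply_filters(PAYLOAD_FILTER, payload, stream)
    transport = "direct cURL (streaming)" if stream else "WordPress HTTP API"
    return {"transport": transport, "payload": payload}
```

A companion plugin would then register one callback (for instance, one that adds cache_control breakpoints) and it would apply on both paths, so users no longer have to choose between streaming and caching.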

    Full technical proposal (unified diff against the integrator class, working consumer example, full numbers) lives in this Gist:

    https://gist.github.com/PaulArgoud/0f8cc1b455e27a679cc2b84445e8dc87

    Happy to send a pull request with the patch and a small test, against main or whichever branch you prefer.

    PS: The MxChat chatbot prompts visitors for a satisfaction rating even though the feature is disabled in the admin settings.