First production prompt system
Before
Built a chatbot using OpenAI's API for customer support.
After
Shipped a Claude Sonnet 4.6-powered customer support agent serving ~85K conversations/month; designed a 120-prompt golden dataset and LLM-as-judge eval gating each prompt revision (3 regressions caught pre-deploy over 4 months).
Why this works: Names the model and version, gives monthly conversation volume, surfaces eval discipline with specific dataset size, and gives a measurable outcome from the eval.

