Three AI models. One prompt. Every recipe baked by hand — then judged blind.
ChatGPT posted its highest composite score ever. Gemini won the room anyway. Someone put basil in a sugar cookie and two neighbors found it — and both chose it as their favorite.
Everyone is using AI, afraid to use it, or talking about it. I rely on it for work, and I grew more curious about its capabilities as I dabbled in using it to develop baking recipes. I wondered if it could help me capture the flavors and concepts I was trying to make shine.
What I learned is this: sometimes. Sometimes it could hit the mark and make something incredibly tasty. Sometimes it was a train wreck, and the more it tried to "help," the worse the recipe got.
I wondered if that was the status quo for all AIs, or just the platform I was using. "AI Made. Human Verified." is my attempt to find out which AI is the most reliable baking partner.
I give three models the same recipe brief, bake everything faithfully, and let a blind panel of real people decide who wins. The kitchen is honest and the tasters don't know which AI is which. The one that performs best gets the credit.
I'm a baker based in Washington state. Not a developer, not a researcher, not anything other than a curious baker with strong opinions about cookies and an itch to find out which AI is the best baking buddy.
"AI is writing recipes. I'm making sure someone actually bakes them." — The premise of the show
Every episode starts with a prompt. The same prompt goes to Claude, ChatGPT, and Gemini — with no additional coaching, no refinement, no asking it to "try again." Whatever comes back is what gets made.
The Human Verification Panel — a rotating group of real people — tastes the results without knowing which is which. They score on flavor, texture, appearance, and overall preference. The scores are tallied. The winner is announced. The AI doesn't get a participation trophy.
This is food content and AI content for people who are skeptical of both — and curious about the intersection.
The Human Verification Panel (HVP) is the heart of the show: a structured, blind tasting process designed to remove bias, enforce fairness, and produce a result that actually means something.
A single recipe brief is written — specific enough to produce a real recipe, open enough to reveal each AI's instincts. The same prompt, word for word, is submitted to Claude, ChatGPT, and Gemini in the same session. No iteration. No regeneration. First response only.
Each recipe is baked exactly as written. If the AI says two tablespoons, it's two tablespoons. If the instructions are unclear, that ambiguity is documented — not corrected. The goal is equal execution, not equal results.
Each bake is assigned a neutral label (A, B, C) with no identifying information. Panelists receive the samples simultaneously with no knowledge of which AI produced which recipe.
Each panelist completes a ballot scoring on four criteria: flavor, texture, appearance, and overall preference. Written comments are collected. Scores are tallied after all ballots are submitted.
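For anyone who wants to see the arithmetic, here is a minimal sketch of how a tally like this could work. The 1-to-10 scale, the equal weighting of the four criteria, and the names `CRITERIA`, `tally`, and `ballots` are all assumptions made for illustration, not the show's actual scoring sheet, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean

# The four ballot criteria named above. Equal weighting is an assumption.
CRITERIA = ("flavor", "texture", "appearance", "overall")


def tally(ballots):
    """Average each criterion per blind label, then average those into a composite."""
    by_label = defaultdict(lambda: defaultdict(list))
    for ballot in ballots:
        for label, scores in ballot.items():
            for criterion in CRITERIA:
                by_label[label][criterion].append(scores[criterion])

    results = {}
    for label, per_criterion in by_label.items():
        averages = {c: mean(per_criterion[c]) for c in CRITERIA}
        # Composite = plain mean of the four criterion averages (assumed).
        averages["composite"] = mean(averages[c] for c in CRITERIA)
        results[label] = averages
    return results


# Two made-up ballots scoring two blind samples on an assumed 1-10 scale.
ballots = [
    {"A": {"flavor": 8, "texture": 7, "appearance": 6, "overall": 7},
     "B": {"flavor": 9, "texture": 8, "appearance": 7, "overall": 8}},
    {"A": {"flavor": 6, "texture": 7, "appearance": 8, "overall": 7},
     "B": {"flavor": 7, "texture": 8, "appearance": 9, "overall": 8}},
]

results = tally(ballots)
winner = max(results, key=lambda label: results[label]["composite"])
print(winner, results[winner]["composite"])
```

Keeping the per-criterion averages next to the composite matters: two bakes can land on the same composite while one wins on flavor and the other on appearance.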
Results are revealed on camera. The winning AI is announced. The margin of victory, the score breakdown, and panelist commentary are all shared — wins, losses, and surprises included.
Every recipe featured on the show, including the AI-generated originals and occasional HVP-approved adaptations, is available to bake at home.
For the first time, a human recipe enters the competition alongside the AIs. New recipes drop with every episode.
Subscribe on YouTube ↗
I partner with brands whose products I actually use in my kitchen. If your product would genuinely make it into my pantry, let's talk. No sponsored opinions, no undisclosed placements.
Media inquiries, podcast appearances, and creative collaborations welcome. I'm happy to talk about the show, the methodology, or the surprisingly strong feelings people have about AI-generated recipes.
Want to eat cookies and tell me what you think? The panel is always looking for new tasters. Local to the Pacific Northwest preferred, but reach out regardless.
Full data breakdowns for every episode — scores, taster language, baker's percentages, and everything the composite averages are hiding.
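If baker's percentages are new to you, each ingredient is expressed as a percentage of the total flour weight, so flour is always 100%. A rough sketch of that calculation follows. The function name `bakers_percentages` and the sugar cookie weights are hypothetical and not taken from any episode.

```python
def bakers_percentages(weights_g, flour_keys=("flour",)):
    """Express each ingredient weight as a percentage of total flour weight."""
    flour_total = sum(weights_g[key] for key in flour_keys)
    if flour_total <= 0:
        raise ValueError("flour weight must be positive")
    return {
        name: round(100 * grams / flour_total, 1)
        for name, grams in weights_g.items()
    }


# Hypothetical sugar cookie weights in grams, for illustration only.
cookie = {"flour": 250, "sugar": 200, "butter": 170, "egg": 50}
percentages = bakers_percentages(cookie)
print(percentages)
```

Comparing recipes this way shows at a glance whether one AI went heavier on sugar or butter than another, regardless of batch size.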