
Two days before a family gift exchange, I decided to run a slightly unhinged experiment.
Could I ship a real, production-ready Secret Santa app in roughly a day and a half, in time for a family draw held over Zoom?
I added one hard rule to the challenge: I would not type or edit a single character of application code. So, is AI agentic coding good enough today to ship a full end-to-end app that real people will use to their satisfaction, and even pay for? The answer is a resounding YES. No “but”, no caveat, no tradeoff, no workaround.
I proudly present to you https://secretsantadraw.ai/
The SecretSanta repo is the result of that bet. It’s a complete full-stack Secret Santa draw experience: an animated funnel UI, draw logic that understands couples, a paywall where fees apply, email and SMS dispatch, Google reCAPTCHA protection, health checks, and continuous-deployment wiring. All of it was built from 37 prompts I wrote from scratch, each leading to a pull request authored by AI, with all code managed at all times by AI coding agents: primarily ChatGPT 5.1 running inside a Codex environment, with GitHub Copilot and a few prompts to Claude-Code Sonnet 4.5 as supporting players. Every code change went through an AI-generated PR. In other words, my roles were purposefully limited to product owner, reviewer, and merge clicker, not coder.
Not a single line of application code in this project was hand-edited by me.
The experiment: Codex as an AI-only software factory
Codex, in this context, is my AI development cockpit: a place where ChatGPT’s 5.1 model drives the edits, Copilot fills in the micro-patterns, and Claude-Code handles a few refactors for two features that had hit a dead end, plus some second opinions. The goal wasn’t just to “use AI a bit more”; it was to see if the stack was mature enough to:
- Design and implement a front-end funnel from scratch
- Stand up a Node/Express backend with non-trivial business logic
- Wire in email delivery, environment variables, and security controls like reCAPTCHA
- Produce something I would actually trust to use with real friends and family
All of that orchestration was done through web interfaces, the CLI, and Visual Studio Code, happened within 36 hours, and required every meaningful change to land via an AI-authored pull request, with me as the accountable human in the loop only for intent, feedback, and merge decisions.
A quick tour of the app we shipped
The finished app is a compact two-tier architecture. A React front-end built with Vite powers a playful, three-step guided funnel, and a Node.js/Express backend handles the Secret Santa draw and email dispatch. The repository is structured around frontend/ and backend/ folders, each with their own Node projects, with the root providing shared configuration.
On the front-end, the experience walks the organizer through naming the event, picking the date, and entering four couples, with validation and cheerful animations to make the setup feel like a mini product journey rather than a boring form. The draw itself is visual: animated cards spin and shuffle with snow and confetti-inspired effects as the system computes assignments.
On the back-end, a draw endpoint implements a derangement algorithm that guarantees no one draws themselves or their spouse, and then dispatches individualized emails informing each participant who they’re buying for. The email layer defaults to a JSON transport that logs payloads to the console for local testing, but can be switched to real SMTP just by configuring environment variables like SMTP_HOST, SMTP_PORT, SMTP_USER, and FROM_EMAIL in the backend .env.
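The repo’s actual draw code isn’t reproduced here, but the core constraint can be sketched as rejection sampling over unbiased shuffles: keep reshuffling the receiver list until nobody is assigned themselves or their partner. Function and field names below are illustrative assumptions, not the repo’s own:

```javascript
// Unbiased Fisher-Yates shuffle (returns a new array).
function shuffle(arr) {
  const a = [...arr];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// A draw is valid when no giver receives themselves or their own partner.
function isValidDraw(givers, receivers) {
  return givers.every(
    (giver, i) =>
      receivers[i].name !== giver.name && receivers[i].name !== giver.partner
  );
}

// Rejection sampling: reshuffle until the couple-aware derangement holds.
// With 4 couples a valid assignment is found after only a few attempts.
function drawAssignments(participants, maxAttempts = 10000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const receivers = shuffle(participants);
    if (isValidDraw(participants, receivers)) {
      return participants.map((giver, i) => ({
        giver: giver.name,
        receiver: receivers[i].name,
      }));
    }
  }
  throw new Error('No valid assignment found; check the participant list');
}
```

Rejection sampling is the simplest correct approach at this scale; a constructive algorithm (e.g. rotating within a cycle that alternates couples) would be needed only for much larger or more heavily constrained groups.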
The app is mobile-first and accessibility-aware: headings use a playful display font while the body keeps to a readable sans-serif, with high-contrast gradients and focus outlines designed to be usable on small screens and desktops alike. Animations combine CSS keyframes with Framer Motion for smooth transitions without overwhelming the page.
A Google reCAPTCHA integration protects the event setup form from automated submissions. The frontend expects a VITE_RECAPTCHA_SITE_KEY, and the backend verifies with a matching secret key. The repo ships with .env.example files in both tiers so the intended configuration is crystal clear.
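Server-side verification follows Google’s documented pattern: POST the secret key and the client-supplied token to the siteverify endpoint and check the `success` flag. The sketch below is illustrative; the secret-key env var name is an assumption, and it relies on the global fetch available in Node 18+, the project’s stated baseline:

```javascript
// Google's documented verification endpoint for reCAPTCHA.
const VERIFY_URL = 'https://www.google.com/recaptcha/api/siteverify';

// Pure helper: build the form-encoded body siteverify expects.
function buildVerifyBody(secretKey, clientToken) {
  return new URLSearchParams({ secret: secretKey, response: clientToken });
}

// RECAPTCHA_SECRET_KEY is a hypothetical env var name, not the repo's.
async function verifyRecaptcha(clientToken) {
  const res = await fetch(VERIFY_URL, {
    method: 'POST',
    body: buildVerifyBody(process.env.RECAPTCHA_SECRET_KEY, clientToken),
  });
  const data = await res.json();
  return data.success === true;
}
```

The backend would call `verifyRecaptcha` on the token submitted with the setup form and reject the request before running any draw logic if it returns false.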
Underneath all that, Node.js 18+ and npm 9+ form the runtime baseline, which kept the tooling consistent for the AI agents and ensured that generated code compiled and ran cleanly across the stack.
All of these technology and component choices were made by AI; I had nothing to do with the selections for the frontend and backend implementations. This is not a toy “Hello World” populated by AI. It is a real, multi-screen, stateful web app with a genuine business constraint (couples), non-trivial combinatorics (derangements), and a production-oriented backend (emails, health checks, secrets configuration).
Breaking down the components (without me touching the code)
The front-end funnel was the first major component the agents tackled. I started with a natural language brief in Codex: I described the idea of orchestrating a Secret Santa for individuals and couples, the need for a three-step flow, and the vibe I wanted: festive, modern, and lightweight. ChatGPT 5.1 responded by scaffolding a Vite+React application, defining routes and components for each step of the funnel, and wiring up basic state management.
Subsequent pull requests deepened the UX. I asked for form validation with friendly error messages and guardrails to ensure four couples were entered with names and emails that looked sane. Another PR wove in Framer Motion and CSS animations to make transitions between steps feel like the user was going further down a playful tunnel rather than clicking through static screens. When I requested accessibility considerations—focus states, contrast, and keyboard navigability—the agents generated another PR to refine styling and semantics instead of leaving those as “future work.”
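The validation described above can be sketched as a pure function that returns friendly error messages, with an empty list meaning the form is good to submit. The shapes and messages below are illustrative assumptions, not the repo’s actual code:

```javascript
// Loose sanity check for emails; real validation happens again server-side.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Each couple is assumed to be { personA: { name, email }, personB: { name, email } }.
function validateCouples(couples) {
  const errors = [];
  if (couples.length !== 4) {
    errors.push('Please enter exactly four couples.');
  }
  couples.forEach((couple, i) => {
    for (const person of [couple.personA, couple.personB]) {
      if (!person.name || !person.name.trim()) {
        errors.push(`Couple ${i + 1}: every participant needs a name.`);
      }
      if (!EMAIL_PATTERN.test(person.email || '')) {
        errors.push(`Couple ${i + 1}: "${person.email || ''}" doesn't look like a valid email.`);
      }
    }
  });
  return errors;
}
```

Keeping validation pure like this makes it trivial for the agents to unit-test, and lets the UI layer decide how to surface each message next to the offending field.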
Parallel to that, a second stream of work built the backend draw service. I described the constraint: this wasn’t just “shuffle a list.” It had to ensure that no one draws themselves or their spouse, and it needed to scale beyond a fixed number of participants. ChatGPT 5.1 synthesized a draw algorithm that produced valid derangements while understanding couple relationships, then exposed it over an Express API endpoint that accepted the exact payload shape used by the front-end. Another PR layered on a /health endpoint and better logging so the service could be monitored once deployed.
The email pipeline came next. I asked for a solution that would be safe in development but practical in production. The agents chose a pattern where the default transport is JSON logging—ideal for local testing—while environment variables unlock SMTP for real-world usage. They generated .env.example files to document the configuration and updated the README so future users could copy/paste their way to a running system. I didn’t write those instructions; I just kept asking, “Could a stranger understand how to run this in five minutes?” and pushed Codex and Claude-Code for clearer documentation until the answer felt like yes.
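That env-driven transport selection can be sketched as a small config builder whose output matches the shape Nodemailer’s `createTransport()` accepts: with no SMTP host configured it falls back to the JSON transport, which logs the message instead of sending it. The env var names follow the post’s .env keys, except SMTP_PASS, which is an assumption:

```javascript
// Build a Nodemailer-compatible transport config from environment variables.
// Illustrative sketch; SMTP_PASS is an assumed variable, not documented above.
function buildTransportConfig(env) {
  if (!env.SMTP_HOST) {
    // Safe default for local development: no email ever leaves the machine.
    return { jsonTransport: true };
  }
  return {
    host: env.SMTP_HOST,
    port: Number(env.SMTP_PORT || 587),
    auth: env.SMTP_USER
      ? { user: env.SMTP_USER, pass: env.SMTP_PASS }
      : undefined,
  };
}
```

A backend would pass the result straight to `nodemailer.createTransport(buildTransportConfig(process.env))`, so flipping from logging to real delivery is purely a configuration change.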
On top of that, I asked the agents to wire a Stripe-powered paywall in front of the results delivery experience, just to see if they could handle a real-world monetization layer end to end. From the backend integration that talks to Stripe using API keys stored in environment variables, to the small UI entry point that routes eligible users through the payment flow and back into the app, every piece of that paywall was also authored via AI-generated pull requests. I never hand-edited a line; I only described the behavior I wanted (“gate this behind a simple Stripe paywall so it’s ready for real traffic”) and let Codex, Copilot, and Claude-Code negotiate the implementation details. It even turned on the “Link” feature so users can pay directly from their mobile phone.
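A Stripe paywall like this typically hands the user to a hosted Checkout session; the sketch below builds the parameters that would be passed to `stripe.checkout.sessions.create()`. The price, URLs, and metadata key are all assumptions for illustration, not the repo’s actual values, and Stripe’s hosted page is what surfaces Link for one-tap mobile payment when it is enabled:

```javascript
// Illustrative Checkout parameters; price, URLs and metadata are assumptions.
function buildCheckoutParams(eventId, baseUrl) {
  return {
    mode: 'payment',
    line_items: [
      {
        price_data: {
          currency: 'usd',
          unit_amount: 299, // hypothetical $2.99 fee, expressed in cents
          product_data: { name: 'Secret Santa draw delivery' },
        },
        quantity: 1,
      },
    ],
    metadata: { eventId }, // lets a webhook tie the payment back to the draw
    success_url: `${baseUrl}/draw/${eventId}?paid=true`,
    cancel_url: `${baseUrl}/draw/${eventId}`,
  };
}
```

The backend would create the session with its secret key from environment variables, redirect the user to the session URL, and unlock result delivery once Stripe confirms payment.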
Finally, infrastructure and guardrails were wired in. A reCAPTCHA integration was added to the funnel form, with the frontend using the site key and the backend validating the token using the secret. The .do directory and supporting scripts were created to align with deployment to DigitalOcean’s App Platform, reflecting a real target environment rather than a purely local prototype. Even the curl example in the README for testing /api/draw was AI-generated, based on my request for “a copy-pasteable command I can run from a terminal to validate the draw logic.”
Every one of those changes—front-end scaffolding, backend services, style tweaks, payment, reCAPTCHA, documentation, and deployment spec—came into the repo as a pull request authored by an AI agent. My keyboard stayed out of the code.
The workflow: PR-driven development with three AI “engineers”
What made this feel different from ordinary “AI-assisted coding” was the insistence on a pull-request-only workflow.
Codex, backed by ChatGPT 5.1, played the lead engineer. I would open a ticket-like prompt—“Add an animated draw screen that reveals assignments with snow and confetti vibes”—and Codex would respond with a branch name, a plan, and the actual diff. Copilot sat closer to the metal, filling in low-level code patterns and inline suggestions once a direction existed. Claude-Code shined when I asked for broader refactors (“clean up this component tree for readability” or “extract the draw logic into a separate module with tests”), generating larger but coherent changesets that Codex then integrated.
Instead of editing files directly, I behaved like a tech lead in a busy open-source project. I read diffs, left comments in natural language, and asked the agents to update their branches. If a PR felt too big, I asked them to split it. If naming felt unclear, I told them so and got a follow-up PR focused only on naming and comments. The feedback loop was tight: prompt → PR → human review → revision prompt → revised PR → merge.
That constraint—that no human fingers touched the code—forced the AI to shoulder the entire burden of correctness, consistency, and style. It also surfaced the rough edges of modern AI coding tools in a very honest way.
What made this experiment genuinely challenging
The time constraint was the most obvious challenge. Building a full-stack app in 1.5 days is tight even with experienced human engineers, especially when you add requirements like email dispatch, reCAPTCHA, responsive UX, and deployment configuration. But in this case, the bigger constraint was the rule that I would not manually “fix” anything in code. If the app broke, I had to explain the problem in language and trust the agents to repair it.
The derangement algorithm and couples logic were a good example. It’s easy for an AI—or a human—to write a naive shuffle that accidentally assigns someone to themselves or their partner. When I spotted edge cases by reasoning through small test payloads, I had to describe the failure scenarios rather than just patch the function. That forced the AI to adjust data structures and algorithmic approach until the guarantees held.
Orchestration across three AI systems added another layer of difficulty. ChatGPT 5.1 inside Codex was superb at holding the architectural big picture, but GitHub Copilot occasionally introduced subtle deviations when auto-completing code in a branch. Claude-Code sometimes refactored aggressively enough to undo stylistic conventions established earlier. Keeping them all “aligned” required me to act as a strong product owner and code reviewer, continuously reinforcing guardrails: consistent component structure, predictable file layout, clear naming.
The non-functional requirements were just as demanding. Accessibility isn’t something you get for free by asking “make it accessible.” I needed to push for specific details: focus outlines, skip-to-content behavior, color contrast that works on mobile in bright environments. Security features like reCAPTCHA and environment-based email configuration required the agents to thread the needle between developer convenience and safe defaults, and to document those decisions clearly for future readers.
Perhaps the most surprising challenge was emotional: resisting the urge to “just fix it myself.” Years of engineering muscle memory kick in when you see a one-line bug, a mis-named variable, or an awkwardly structured component tree. In this experiment, every such itch became another prompt instead. Over time that constraint shifted my mental model from “I am writing code” to “I am running a small AI engineering team.”
What this says about the future of building software
The SecretSanta repo is a small project in the grand scheme of things, but it’s a sharp proof point.
ChatGPT 5.1, when given the right environment and paired with tools like Codex, Copilot, and Claude-Code, is already capable of owning an entire full-stack implementation: from UX and animation to backend logic, email flows, configuration, and documentation. It doesn’t remove the need for human judgment; if anything, it amplifies it. The role of the human shifts from typing to steering, from syntax to product thinking.
In about 36 hours, that effort produced a polished Secret Santa experience with real “bells and whistles,” not a demo slide. The repo stands as a snapshot of what AI-driven, PR-first development can look like today: humans setting direction and standards, AI agents doing the heavy lifting in code, and GitHub serving as the collaboration fabric between them.
If you’re curious what that future feels like, you don’t have to imagine it—you can clone it.