Building a Product with AI Only: What I Learned (and Where It Broke)
Late last year, with AI agent demos flooding the internet and every conversation seemingly touching on job displacement, I got curious: what does this mean for the way everyone in tech works? Generative AI agents were already an essential part of my daily work, so I decided to run an experiment a few folks on X had already tried:
To build and ship a product using only AI — no team, no full-stack engineer, just me and a few tools.
As a former solo founder, the idea of being a one-person product team was intriguing. And this time, I wanted to test how far AI could take me and where it might break.
First, what to build?
Around that time, I had just emerged from a long wellness recovery period. Naturally, my thoughts landed on self-care practices. After a quick exercise of breaking down user profiles and pain points, I framed a simple problem space and target users:
Target Users
Young professionals or students facing emotional fatigue, major life transitions, or overwhelming schedules — seeking low-effort ways to feel better without overhauling their routines.
Pain Points
Limited free time — wanting purposeful, no-prep activities
Repetitive routines — craving novelty without disruption
Mental health maintenance — needing consistent, lightweight support
The Solution
A simple wellness challenge app — CarpeDiem. Every day, users receive a small task — as simple as ‘take a long walk,’ ‘cook a new dish,’ or ‘meditate for 10 minutes’ — something fun that reminds you to be present. It’s designed to be quick and easy to complete within a day.
Besides my personal interest in wellness challenges, I picked the idea with a few other product and technical considerations in mind:
The UX flow could be very simple while still covering a few key components: content viewing → simple interaction → data summary
It required a backend, but nothing too far beyond my experience
The challenge refresh mechanism could be interesting to tweak: should challenges be time-based or completion-based? Personalized or shared? Should all users see the same challenge on a given day, or should each user see a sequence of challenges based on how long they’ve been on the platform? (A rough sketch of the shared, date-based approach follows this list.)
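For the shared approach, the logic can be as simple as deriving a stable index from the calendar date, so every user sees the same challenge on the same day. Here is a rough sketch in Swift, with a made-up challenge list and function name rather than the app's actual code:

```swift
import Foundation

// Hypothetical illustration: pick one shared challenge per calendar day.
let challenges = [
    "Take a long walk",
    "Cook a new dish",
    "Meditate for 10 minutes",
    "Write down three things you're grateful for"
]

func challenge(for date: Date = Date()) -> String {
    // Days since a fixed reference date give a stable index into the list.
    let calendar = Calendar.current
    let reference = calendar.startOfDay(for: Date(timeIntervalSince1970: 0))
    let today = calendar.startOfDay(for: date)
    let days = calendar.dateComponents([.day], from: reference, to: today).day ?? 0
    return challenges[days % challenges.count]
}
```

A personalized variant would key the index off each user's sign-up date instead of a fixed global reference, so everyone starts the sequence from their own day one.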
MVP
I then scoped the MVP down to three core features:
A new challenge refreshes daily at 12:00 PM
Users can mark a challenge complete (or unmark it)
Completed challenges are stored and shown in a profile view
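To make that scope concrete, this is roughly the data model those three features imply. It is only a sketch; the type and property names are mine, not the app's:

```swift
import Foundation

// Hypothetical model sketch for the MVP.
struct Challenge: Identifiable, Codable {
    let id: UUID
    let title: String
    let date: Date          // the day this challenge was shown
    var isCompleted: Bool   // toggled by the user
}

struct ChallengeHistory: Codable {
    var items: [Challenge] = []

    // Mark or unmark a challenge; completed items feed the profile view.
    mutating func toggle(_ id: UUID) {
        guard let index = items.firstIndex(where: { $0.id == id }) else { return }
        items[index].isCompleted.toggle()
    }

    var completed: [Challenge] { items.filter { $0.isCompleted } }
}
```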
How the app looks as a final result
The Stack and Process
I started the project like any other: scoping goals, listing features, and sketching a simple product plan. Then I moved into build mode:
Stage 1: Ideation & Framework (ChatGPT)
Polished high-level requirements with ChatGPT.
Asked ChatGPT to walk me through setting up a project step by step in Xcode.
Gave the requirements to ChatGPT — my new AI engineer — to build the basic framework, the card refresh mechanism, and user interactions (a minimal sketch of that card skeleton is below)
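For illustration, the kind of "card plus interaction" skeleton I was asking for looks something like this in SwiftUI. Names and styling are placeholders, not the generated code:

```swift
import SwiftUI

// Minimal sketch of a daily challenge card with a completion toggle.
struct ChallengeCardView: View {
    let title: String
    @State private var isCompleted = false

    var body: some View {
        VStack(spacing: 16) {
            Text("Today's Challenge")
                .font(.headline)
            Text(title)
                .font(.title2)
                .multilineTextAlignment(.center)
            Button(isCompleted ? "Completed ✓" : "Mark Complete") {
                isCompleted.toggle()
            }
            .buttonStyle(.borderedProminent)
        }
        .padding()
        .background(RoundedRectangle(cornerRadius: 16).fill(Color.gray.opacity(0.15)))
        .padding()
    }
}
```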
Stage 2: Backend & Auth Setup (ChatGPT)
Worked with ChatGPT to choose a backend solution and walk me through setting up a project in Firebase, step by step
Built user auth, database storage, and challenge history features
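For context, the building blocks look roughly like this with the Firebase iOS SDK. This is a sketch, not the app's actual code; the collection names, field names, and function are my own assumptions:

```swift
import FirebaseAuth
import FirebaseFirestore

// Hypothetical sketch: sign a user in, then record a completed challenge in Firestore.
func signInAndRecordCompletion(email: String, password: String, challengeTitle: String) {
    Auth.auth().signIn(withEmail: email, password: password) { result, error in
        guard let user = result?.user, error == nil else {
            print("Sign-in failed: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        // Store the completed challenge under the signed-in user's document.
        let db = Firestore.firestore()
        db.collection("users")
            .document(user.uid)
            .collection("completedChallenges")
            .addDocument(data: [
                "title": challengeTitle,
                "completedAt": Timestamp(date: Date())
            ]) { error in
                if let error = error {
                    print("Failed to save challenge: \(error.localizedDescription)")
                }
            }
    }
}
```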
Stage 3: Core Functionality Enhancements (Cursor)
Once the backend was set up, the project had grown too large for ChatGPT to handle comfortably, so I moved to Cursor for easier debugging and context retention
Addressed slow cloud loading by adding local caching and error handling
Added push notifications for daily refreshes
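For flavor, the simplest version of a daily-refresh reminder is a local, repeating notification at noon via Apple's UserNotifications framework. The app's real push setup may well differ; the identifier and copy here are made up:

```swift
import Foundation
import UserNotifications

// Hypothetical sketch: schedule a repeating reminder at 12:00 PM local time.
func scheduleDailyRefreshReminder() {
    let center = UNUserNotificationCenter.current()
    center.requestAuthorization(options: [.alert, .sound]) { granted, _ in
        guard granted else { return }

        let content = UNMutableNotificationContent()
        content.title = "Your new challenge is ready"
        content.body = "Open CarpeDiem to see today's challenge."

        // Fire every day when the clock hits 12:00.
        var components = DateComponents()
        components.hour = 12
        components.minute = 0
        let trigger = UNCalendarNotificationTrigger(dateMatching: components, repeats: true)

        let request = UNNotificationRequest(identifier: "daily-refresh",
                                            content: content,
                                            trigger: trigger)
        center.add(request)
    }
}
```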
Stage 4: UI Updates & Bug Fixes (Cursor)
Used Cursor’s inline suggestions to polish visuals and fix UI-level issues
Stage 5: TestFlight Beta (ChatGPT)
Deployed the MVP using TestFlight with ChatGPT’s help
Invited friends for feedback and identified critical UX bugs (including a dark mode fail)
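The dark mode fail is worth a note, because the usual culprit is hard-coded colors. A hypothetical before/after in SwiftUI, not the app's actual code:

```swift
import SwiftUI
import UIKit

// Hard-coded colors ignore the system appearance; semantic colors adapt to it.
struct ChallengeTitle: View {
    let title: String

    var body: some View {
        // Before (text disappears in dark mode): always-white card, always-black text.
        // Text(title).foregroundColor(.black).padding().background(Color.white)

        // After: system-managed colors that flip correctly in light and dark mode.
        Text(title)
            .foregroundColor(.primary)
            .padding()
            .background(Color(UIColor.secondarySystemBackground))
    }
}
```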
Stage 6: Post-Beta Improvements (Claude)
Switched to Claude to experiment with code preview and large-context editing
Added a new challenge feed feature with swipe navigation (rough sketch below)
Fixed bugs and redeployed — but started seeing limits in model memory and communication quality
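The swipe-navigation feed maps naturally onto SwiftUI's paging TabView. A rough sketch with placeholder data rather than the app's real challenge model:

```swift
import SwiftUI

// Hypothetical sketch of a swipeable challenge feed using a paging TabView.
struct ChallengeFeedView: View {
    let challenges: [String] = [
        "Take a long walk",
        "Cook a new dish",
        "Meditate for 10 minutes"
    ]

    var body: some View {
        TabView {
            ForEach(challenges, id: \.self) { title in
                Text(title)
                    .font(.title2)
                    .padding()
            }
        }
        .tabViewStyle(.page)  // horizontal swipe between cards
        .indexViewStyle(.page(backgroundDisplayMode: .always))
    }
}
```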
As the app framework became more complicated, the AI agents’ fixes noticeably slowed down and communication developed friction here and there. A common issue: after we spent a few hours fixing one particular problem, the agent might ‘forget’ other things we had built before and accidentally break other features. So after a full round of fixes, I deployed the updates again for testing and decided to pause here for now.
What Worked Well
Speed: I had a working MVP in a matter of days. As someone with a UX background, I’ve always been quick at prototyping — but AI tools accelerated that process dramatically.
Communication: The AI agents were surprisingly collaborative. They understood high-level product requirements, helped prioritize features, and responded iteratively.
Cursor became my favorite tool of all. It made debugging feel approachable. It explained errors, located code lines, and offered multiple fix paths — like pairing with an engineer who also teaches.
What Got Messy
As the project grew in complexity, so did the friction.
Performance issues, such as slow loading, tab-switching glitches, and disappearing profile data, were complex to resolve. Despite multiple fixes, AI agents struggled with performance tuning—a task that would have been second nature to a seasoned engineer.
Memory limitations meant that when I focused too long on one part of the app, the agent would “forget” other working components and accidentally break them.
Prompt sensitivity was a recurring theme. Even minor language differences (e.g. using “profile” vs. “user”) would cause misinterpretations. And unlike humans, AI rarely asked clarifying questions — it just did what it thought you meant.
Figma-to-code was clunky. I tried to hand-spec UI components in detail, but responsive behavior and layout precision weren’t consistent across screen sizes.
My Big Takeaways?
When building with an AI agent, you still need to do the fundamentals as if you were building with a full-stack engineering team: define scope, a feature roadmap, version tracking, clear Q&A on requirements, and so on. It’s not as simple as ‘detailed prompts → a working product’.
It’s a game changer for founders, product managers, and UX designers — it fundamentally shortens the prototyping process and makes user testing much easier and faster. I can’t wait to see where the industry is in another 12 months.
Final Thoughts
The app I built is fully functional and fun to test, but not up to my standards for shipping.
Overall, I was pleasantly impressed by this experiment, and it got me thinking about the future of work:
What kind of skills do I need to develop to stay ahead?
If I ever build a company again, how would I structure my process and team?
What would the future look like for product managers, UX designers and software engineers?
AI radically accelerated product development. It let me design, code, and test a working product on my own. But it didn’t eliminate the need for thoughtful planning — at least not yet!
More details about this project here.