Introduction
Mytravelwallet AI is a Generative AI-powered travel decision engine built to solve a problem every traveller knows but no tool has solved: is this fare actually a good deal, right now?
By combining real-time pricing data, a RAG-grounded conversational assistant, and a hallucination-resistant architecture, FareWise delivers a specific, trustworthy verdict in under 3 seconds; compressing 30 minutes or more of research into a single, confident answer.
Overview
The modern travel booking experience has a fundamental design flaw: search tools show hundreds of options but never tell you which one to pick. Users spend 30 to 45 minutes cross-referencing tabs, never quite certain whether the fare they are looking at is actually good or whether waiting a day might save them $80.
FareWise AI closes this gap with two connected surfaces built on a single shared RAG infrastructure. The semantic search engine lets users find flights in natural language โ€” "cheap morning flights to warm weather next month" โ€” without structured forms or filters.
Zoe, the GenAI assistant, answers direct pricing questions with a specific, grounded verdict: "28% below the 90-day average โ€” book now. Demand rising."
The most important architectural insight was that these two surfaces could share one data foundation of live pricing APIs, pgvector embeddings, and fare history while requiring entirely different trust architectures at the output layer. Getting that boundary right was the hardest product decision of the build.
"The LLM never calculates โ€” it only explains. Every number Zoe(AI Assistant) quotes is computed in code and handed to the model as verified context. This is what makes a GenAI financial recommendation trustworthy."
AI Hypothesis
If Mytravelwallet AI replaces multi-option search with a deterministic, AI-explained single verdict grounded in real-time data,
users will shift from exploration to action, trusting the system to make financially optimal decisions on their behalf,
which reduces cognitive load, increases booking conversion, and unlocks measurable economic value from idle reward points,
allowing Mytravelwallet AI to monetize via high-intent traffic, partner commissions, and subscription-based decision intelligence.

The product is not the search. The product is the decision.
Customer Research
Together with the team, we conducted moderated usability testing and qualitative interviews to understand how users make decisions when booking travel with points vs. cash.
We interviewed 3 prototype users and supplemented findings with broader behavioral insights from early-stage testing.
The focus was on:
  • Decision-making behavior
  • Trust in AI recommendations
  • Usability of the prototype
  • Mental model of rewards systems
Users donโ€™t struggle with finding options ; they struggle with trusting a decision.
Key Pain Points Identified
1
Decision paralysis due to too many options
Users are overwhelmed by multiple redemption paths and lack a clear recommendation.
2
Lack of trust in recommendation logic
Users want to understand why a verdict is given before acting on it.
3
Confusion around points systems
Users struggle to differentiate between airline miles vs. credit card points.
4
Poor mobile usability breaks the experience
Key actions like date selection and navigation failed on mobile devices.
5
Missing critical booking context
Users need flight details (timing, stops, duration) to make a decision.
Behavioral Insights
Users prefer one clear answer over multiple options
Transparency increases trust more than accuracy alone
Users are willing to act if confidence is high
Predictive insights (when to book) drive long-term engagement

Product Implication
Mytravelwallet AI should not be a search tool. It should be a decision engine that delivers one trustworthy verdict.
Customer Insights
Trust Gap
โ€œI get that itโ€™s telling me to use cash or points, but I donโ€™t know why. Without seeing how itโ€™s calculated, it feels like Iโ€™m just guessing again.โ€
Intermediate Rewards User, Frequent Traveler, 30โ€“40
Decision Overload
โ€œThere are too many options out there already. I donโ€™t want another tool that shows more โ€” I just want to know what I should do.โ€
Busy Professional, Multi-card Holder, 30โ€“50
Points Confusion
โ€œI have points, but Iโ€™ve never used them. I donโ€™t really understand how they work or which ones to use โ€” it just feels too complicated.โ€
Casual Traveler, Low Rewards Familiarity, 25โ€“45
Personas and Customer Journey
Based on user interviews with 40 people and prototype testing, we identified key customer segments within the travel rewards ecosystem. Our primary focus is on Busy Professionals โ€” high-value users with limited time, holding multiple travel credit cards and substantial points.
These users travel frequently (3-6 times/year) but spend 30-45 minutes comparing options, often defaulting to cash due to decision paralysis. FareWise aims to transform these complex travel choices into a single, trustworthy decision, reducing cognitive load and maximizing their point value.
How customer interviews and research shaped the MVP.
What we heard from customer research ? (~40 interviews)
Our initial customer interviews revealed common pain points and desires that directly informed our product strategy.
64%
Don't know if redemption is worth it
Primary confidence gap. This is the product thesis.
69%
Want the absolute best value
Perfectionism creates decision paralysis and inaction.
49%
Double-check multiple sources
Deep distrust of existing tools and platforms.
44%
Would book with a single verdict
High conversion intent if trust is established.
38%
Finding award availability is frustrating
Real-time availability is the technical moat.
71%
Points sitting idle for 6+ months
The idle points problem is the core pain.

Opportunity Areas Identified
From our research, four key areas emerged where we could provide significant value to users.
Decision Confidence
Users need one answer, not 47 options. The verdict replaces research.
Real-time Availability
Competitors show stale data. Live seats.aero integration is the moat.
Points Value Clarity
CPP is invisible to most users. Making it legible builds trust instantly.
Speed to Decision
30 minutes of research compressed to 60 seconds. Time is the UX metric.
The Design Principle That Tied It All Together
"The product is the verdict. Every feature either supports delivering one confident answer, or it doesn't ship."
Feature Prioritization Framework
How we decided what to build first
"Basic features were never scored. If a feature's absence makes the product untrustworthy or unusable, no RICE score changes that outcome."
Why Kano + RICE; not either alone
RICE alone fails here
At pre-launch, Reach and Confidence inputs are estimates. The output feels rigorous but is built on guesswork. Worse, RICE can deprioritise a table-stakes feature if it scores high on Effort โ€” a dangerous outcome for a trust-based product.
Kano alone lacks sequence
Kano is brilliant at understanding user emotion but it does not produce a build order. Two Performance features can both be "more is better" without telling you which ships first.
The combination works
Our 40 customer interviews gave us enough signal to classify features into Kano buckets with confidence. Classification does the heavy lifting. RICE only runs on the contested middle bucket where trade-offs actually live.
Based on customer interviews, features were categorized to understand their impact on user satisfaction. Basic features, though not explicitly scored, are foundational to user trust and product viability.
The three-step process
1
Kano classification first
Use interview data to classify every feature as Basic (table stakes), Performance (more is better), or Delighter (unexpected love). Basic features are automatically P0. No score overrides this.
2
RICE only on Performance features
Run RICE scoring exclusively on the contested middle tier. Reach ร— Impact ร— Confidence รท Effort. This is where the real trade-offs between equally important features are resolved.
3
Name the cuts explicitly
Every not-in-scope decision is a product decision. Automated balance sync and native mobile app are not lazy omissions. They are deliberate calls based on risk and proof-of-concept sequencing.
Prioritization Applied
Based on 40 customer interviews ยท Kano + RICE hybrid framework
40 customer interviews revealed four clear patterns: 64% lacked redemption confidence, 71% had points sitting idle for 6+ months, 38% found availability the biggest frustration, and 44% said they would book immediately with a single trusted verdict. These signals directly shaped each prioritization decision below.
Prioritization - RICE Score

RICE formula: Reach ร— Impact ร— Confidence รท Effort.
Reach = estimated users in 6 months. Impact rated 0.25 to 3. Confidence = percentage certainty from interview data. Effort = person-weeks.
AI Model Overview
To power Zoe, Mytravelwallet AI's conversational layer, we rigorously evaluated multiple leading Large Language Models (LLMs). The goal was to select the optimal fit for a high-trust, decision-support system that would seamlessly integrate with our existing architecture.
The role of AI is critical, but specific. It is designed to:
Interpret natural language queries efficiently
Explain deterministic outputs with clarity
Handle ambiguity and edge cases gracefully
This demanded a model that prioritized accuracy, reasoning clarity, and reliability above all else, ensuring our users receive trustworthy financial recommendations.
Large Language Model (LLM): OpenAI GPT-4
  • Purpose:
    To parse user intent from natural language queries (e.g., โ€œcheap business class to Europe next monthโ€) and generate structured inputs for the FareWise decision engine.
  • Model Characteristics:
    Strong reasoning and structured output capabilities, making it effective for translating ambiguous user queries into actionable parameters (routes, dates, preferences). Performs well in multi-step reasoning but requires guardrails to avoid hallucinated numerical outputs.
Large Language Model (LLM): Anthropic Claude 3.5 Sonnet
  • Purpose:
    Acts as the primary AI explanation layer (Zoe) translating deterministic outputs (CPP, savings, verdict) into clear, human-understandable reasoning.
  • Model Characteristics:
    Highly reliable for controlled generation, strong at following instructions and maintaining factual grounding. Performs well when constrained (no calculations), making it ideal for trust-critical explanation workflows in FareWise.
Large Language Model (LLM): Google Gemini 1.5 Pro
  • Purpose:
    Evaluated for handling long-context reasoning, including potential expansion into multi-leg trips, complex itineraries, and historical fare analysis.
  • Model Characteristics:
    Excels at large context windows and multi-modal inputs. Suitable for future roadmap features like predictive pricing and itinerary optimization, though requires additional tuning for structured decision pipelines.
Large Language Model (LLM): xAI Grok-1
  • Purpose:
    Explored for real-time conversational responses and integration with dynamic data streams (e.g., trends, demand signals).
  • Model Characteristics:
    Optimized for real-time interactions and fast responses. Strong in conversational tone but less suitable for precision-critical financial decisioning, where deterministic outputs and strict guardrails are required.
Validating before committing the architecture.
Before committing to full RAG architecture, we ran a structured 4-week proof of concept. The POC confirmed our hypothesis โ€” and revealed a critical finding that changed the entire design: the LLM hallucinated numbers 34% of the time without the deterministic computation layer. That single finding justified every additional week of safety architecture work.
01
Week 1 โ€” Baseline: LLM Alone
Tested Claude 3.5 Sonnet with zero retrieval on 100 pricing questions. Hallucination on numerical claims: 34%. Recommendation accuracy: 61%. Finding: Foundation model alone is not viable for financial recommendations. The model confidently invents averages, percentages, and competitor prices it cannot possibly know.
02
Week 2 โ€” RAG Layer Added
Connected pgvector with 10,000 fare embeddings. Re-ran same 100 questions. Hallucination dropped to 4.2%. Accuracy: 78%. Finding: RAG grounding works. When given correct data, the model largely stops inventing. But 4.2% was still too high for a financial product.
03
Week 3 โ€” Deterministic Computation Layer
Moved all math to Python code. LLM now receives verified outputs only. It no longer calculates โ€” only explains. Hallucination: 0.3%. Accuracy: 94.2%. Finding: The narrator-not-calculator principle eliminated the remaining error. This was the breakthrough that changed the architecture permanently.
04
Week 4 โ€” Prompt Evaluation Loops
Used Claude Code to run 200+ prompt variants against 100 pricing scenarios. Scored each on accuracy, tone, confidence calibration, and appropriate length. Result: Production prompt template selected at composite score 91/100. Latency optimised to 2.1s P95.
"The POC didn't just validate the architecture โ€” it revealed that the narrator-not-calculator principle was the single most important product constraint we would set. Everything in the safety layer reinforces this one decision."
AI Proof of Concept
The AI validation proof of concept (POC) for Mytravelwallet AI successfully demonstrated the feasibility of delivering accurate, real-time travel booking decisions using a hybrid architecture combining deterministic logic and AI explanation.
Key Validation Outcomes
Decision Accuracy
โ‰ฅ95% alignment between system recommendations and expected optimal value based on CPP benchmarks.
User Trust Signal
Users showed significantly higher confidence when recommendations included clear reasoning and a savings breakdown.
Latency Performance
End-to-end verdict delivery achieved <3 seconds, meeting the "instant decision" expectation for users.
Failure Handling
The system gracefully handled missing availability, API timeouts, and incomplete data without breaking the user experience.
What We Proved
AI should not perform financial calculations โ€” a deterministic engine is more reliable for precision.
AI is most valuable for explanation, translating complex data into user-friendly insights, not making raw decisions.
Trust in AI systems stems from transparency and consistency, reinforcing confidence beyond mere intelligence.
Outcome
This validation confirmed the viability of Mytravelwallet AIโ€™s core thesis: a system that clearly separates calculation (code) from communication (AI) can deliver high-trust, high-accuracy decision-making at scale.
It also established a strong foundation for future development, including scaling to more routes and programs, adding predictive intelligence, and increasing conversion through trust-driven UX.
AI Input and AI Output
  • User Search Query (Natural Language):
    User-provided travel intent such as โ€œcheap business class to Europe next monthโ€ or โ€œbest way to use 80K points to NYC,โ€ which is parsed into structured parameters (origin, destination, dates, preferences).
  • Real-Time Flight Pricing Data:
    Live cash fares retrieved from flight APIs to determine baseline ticket prices.
  • Award Availability Data:
    Points-based redemption options pulled from sources like Seats.aero, including airline programs, seat availability, and required miles.
  • User Financial Context:
    Credit card portfolio, available points/miles balances, and transfer partners (e.g., Amex, Chase, Capital One).
  • Historical Pricing & Benchmark Data:
    Precomputed CPP (cents-per-point) benchmarks and fare trends used to evaluate whether a deal is โ€œgood,โ€ โ€œaverage,โ€ or โ€œexpensive.โ€
  • Verdict Recommendation (Core Output):
    A single, actionable decision โ€” Use Points / Pay Cash / Wait based on deterministic evaluation of value.
  • Savings & Value Score:
    Clear quantification of value including:
  • Estimated savings ($)
  • CPP (cents per point)
  • Value rating (Excellent / Good / Poor)
  • Explanation Layer (AI-Generated):
    Natural language explanation generated by the LLM (Zoe), translating calculations into user-friendly reasoning (e.g., โ€œThis redemption gives 1.8 CPP, which is above your typical value good time to use points.โ€)
  • Next-Step Actions:
โ€œBook via airline portalโ€
Data Pipeline
The data pipeline transforms complex travel intent and vast data into clear, actionable financial recommendations, ensuring users always make the smartest travel decisions.
1. User Input & Context Capture
Users provide natural language queries, which are parsed alongside their personal financial context (credit cards, points balances) to create structured input. This converts messy intent into actionable data.
2. Real-Time Data Collection
Cash fares from flight APIs and award availability from sources like Seats.aero are fetched. This creates a unified dataset of all possible travel options, including airline programs and required miles.
3. Data Normalization & Enrichment
Raw data is cleaned and standardized across routes, times, and pricing. Historical benchmarks, CPP baselines, and transfer ratios enrich the data, making all options comparable on a common scale.
4. Deterministic Decision Engine
The core layer computes points value (CPP), savings, and opportunity cost without AI. It applies strict rules to produce a precise, debuggable verdict: Use Points, Pay Cash, or Wait.
5. Confidence & Signal Layer
This layer analyzes price trends, demand, and seat availability changes. It generates a confidence score and urgency indicators (e.g., "Book now") to reduce user hesitation and guide timely decisions.
6. AI Explanation Layer (Zoe)
Zoe, our LLM, takes deterministic outputs (verdict, CPP, savings) and generates clear, human-friendly explanations. Importantly, the AI only explains precomputed results, never performs calculations.
7. Output Generation
The system delivers a single verdict (Use Points / Pay Cash / Wait), quantified savings, a value rating, and Zoe's explanation. It then suggests direct actions like "Book flight" or "Transfer points," converting decision to action instantly.
8. Feedback Loop & Learning
User actions and feedback are captured to continuously improve threshold tuning, UX clarity, and personalization. This iterative process ensures increasing trust and accuracy over time.
FareWise AI is not a prediction system โ€” it is a decision system built on deterministic logic and AI-powered explanation.
Tech Stack
The product operates on a modern, robust tech stack designed for speed, accuracy, and scalability. This architecture ensures every travel decision is informed by deterministic logic and enhanced by intelligent, user-friendly explanations.
UI
Frontend Layer
React ยท Next.js ยท TypeScript ยท Tailwind CSS ยท Vercel
Fast, responsive interface for real-time travel decision-making.
BACKEND
Backend & Orchestration
Node.js ยท API Routes ยท Business Logic Engine
Handles user requests, orchestrates API calls, and manages core decision processing. Orchestrates agent workflows and system-level decision flow.
AI
AI Layer
GPT-4 ยท LLM Explanation Engine
Generates natural language explanations. The AI explains, it does not calculate. Works as part of an agent-driven pipeline, not standalone.
DATA
Data & Storage Layer
PostgreSQL ยท S3-compatible Storage ยท pgvector
Stores user data, search history, verdict logs, CPP benchmarks, and transfer rules.
APIS
External Data APIs
Seats.aero ยท Google Flights
Real-time award availability and cash pricing data inputs.
AGENTS
AI Agent Orchestration Layer
LangGraph ยท Agent Workflow System
Intent Parser ยท Context Builder ยท Prompt Engine ยท Validator ยท Confidence Scorer
Coordinates multi-step AI workflow across retrieval, decision engine, and explanation layer. Ensures structured, reliable, and stateful execution of user queries.
Deterministic logic ensures accuracy. Agentic AI orchestrates workflow. LLM enhances clarity.
System Architecture
User Stories & Acceptance Criteria
Category
User Story
Acceptance Criteria
Onboarding
As a new user, I want to sign up using my Google account so that I can get started without creating a new password.
Google OAuth sign-in must complete in under 3 taps. User is taken directly to portfolio setup on first login. Session token persists for 30 days.
Onboarding
As a new user, I want to run my first fare search without creating an account so that I can see a verdict before committing to sign up.
Anonymous first search allowed. Full verdict displayed. CTA to save or run another search is gated behind sign-up prompt. No personal data stored during anonymous session.
Portfolio Management
As a user, I want to add my credit cards point balances so that Mytravelwaller.ai can give me personalised verdicts based on what I actually hold.
User can add up to 10 cards. Card-to-programme mapping is automatic, no manual programme selection. Balance field accepts integers only. Changes reflected in next search immediately.
Portfolio Management
As a user, I want to edit my point balances after each trip so that my verdicts remain accurate over time.
Balance update saves immediately on confirmation. Updated balance reflected in next verdict without requiring a re-login or page refresh.
Search
As a user, I want to search for flights using a structured form so that I can specify my origin, destination, dates, cabin class, and number of travellers
All five fields required before search can submit. Airport autocomplete returns results within 300ms. Cabin options: Economy, Business, First. Minimum 1 traveller, maximum 9.
Search
As a user, I want to type my trip in natural language to Zoe so that I do not have to fill out a form.
Zoe correctly extracts origin, destination, cabin, dates, and programme from natural language input with โ‰ฅ90% accuracy on well-formed queries. If parsing fails, Zoe falls back to structured form with fields pre-populated where possible.
Verdict Engine
As a user, I want to receive a single clear recommendation โ€” Use Points, Pay Cash, or Wait , so that I do not have to evaluate multiple options myself.
Every search returns exactly one recommendation. No lists of options. Recommendation includes: value rating (Excellent/Good/Wait), points required, programme name, transfer path, cash price, savings amount, and CPP value.
Verdict Engine
As a user, I want the booking CTA to take me directly to the airline's award booking page so that I can complete my booking without having to search again
CTA deep-links to the specific airline booking page for the recommended programme. Link opens in new tab. Link tested weekly against each supported airline's URL structure. CTA not shown if no award seats are available.
AI Assistant Zoe
As a user, I want Zoe to explain my verdict in plain language so that I understand why the recommendation was made without needing to know what CPP means.
Zoe's explanation must reference the specific programme, the points cost, the savings amount, and the transfer path using only data provided by the verdict engine. Zoe must not generate or calculate any number independently.
AI Assistant Zoe
As a user, I want the product to still work and give me a verdict even if Zoe is unavailable so that I am not blocked from getting an answer.
If ChatGPT-4 API is unavailable or times out, the verdict engine result is displayed in template format. User still sees: rating, points cost, savings amount, transfer path, and CTA. Zoe's conversational explanation is replaced with a structured display, not an error message.
Feedback
As a user, I want to give a thumbs up or down on every verdict so that Mytravelwallet.ai can improve its recommendations over time.
Feedback prompt appears after every verdict โ€” non-blocking, dismissable. Thumbs-up and thumbs-down both record verdict ID, timestamp, and user ID. Feedback stored in analytics backend.
Feedback
As a user, I want to explain why I gave a thumbs-down so that the team can understand what went wrong.
Thumbs-down expands an optional free-text field (max 280 characters). Submission of free text is optional โ€” thumbs-down can be submitted without it. Free text stored with the feedback record.
Analytics
As a product team, we want to track every step of the user funnel from landing to CTA click so that we can measure the success criteria at 90 days.
All 7 funnel events instrumented at launch: signup, portfolio setup complete, first search submitted, first verdict received, CTA clicked, feedback submitted, return visit. Events include timestamp, user ID, session ID, and relevant payload.
Trust & Safety
As a user, I want to see a clear disclaimer on every verdict reminding me to verify availability before transferring points so that I understand the risk of irreversible transfers.
Universal disclaimer displayed on every verdict card regardless of rating. Disclaimer text: 'Always verify availability and pricing directly on the airline's website before transferring points or booking. Award seats can be claimed by other travellers at any time.' Cannot be hidden, minimized, or skipped.
Trust & Safety
As a user, I want to know the source of every number in my verdict so that I can trust it was calculated from real data and not invented by the AI.
Every numerical claim in the verdict like CPP, savings, points cost is traceable to a deterministic engine calculation on live API data. Zoe's explanation must not introduce any number not present in the engine's output object. Output validator checks this on every response.
AI Guardrails
As a product team, we want the LLM to be structurally prevented from calculating numbers so that hallucination on financial data is architecturally impossible.
System prompt hard constraint present on every ChatGPT-4 API call: 'All numeric data comes from the engine context provided. Never generate, estimate, or calculate numbers yourself.' Output validator confirms every number in LLM response matches source context before delivery.
AI Guardrails
As a product team, we want every Zoe response to pass a confidence threshold before being shown to the user so that low-quality AI outputs never reach the user.
Confidence scorer evaluates every ChatGPT-4 response. Threshold: โ‰ฅ0.92. Responses scoring below threshold trigger one regeneration attempt. If second attempt also fails threshold: template-based verdict displayed instead. Confidence score logged for every response.
AI Guardrails
As a product team, we want the Award Knowledge Base to be updatable within 24 hours of any programme devaluation so that our recommendations are never based on outdated award chart data.
KB update process requires no code deploy. Ops team can update transfer partner rules, award charts, floor CPP values, and programme T&Cs via config update. Changes reflected in all Zoe calls within 15 minutes of update. Update audit log maintained.
Potential Risks and Mitigations
Identifying and proactively addressing potential risks is crucial for the successful development and long-term sustainability of the product.
By anticipating these challenges and implementing robust mitigation strategies, we aim to build a resilient and successful product.
AI Product Scalability
MyTravelWallet is built with AI scalability at its core, ensuring that Zoe and the verdict engine continue to perform accurately as our user base, loyalty programme coverage, and search volume grow.
We have architected a clean separation between the deterministic verdict engine and the Zoe LLM layer, so AI performance scales independently of product complexity. As users generate more searches and feedback signals, we fine-tune Zoe on real verdict interactions improving natural language accuracy without touching the core engine logic that guarantees hallucination-free numbers.
To prevent AI drift and maintain recommendation integrity, we employ Evidently AI for continuous model monitoring, a confidence gate of โ‰ฅ0.92 on every Zoe response, and a human-in-the-loop feedback loop that flags any Good or Excellent verdict receiving a thumbs-down within one hour. This architecture ensures. It Mdelivers fast, trusted, and financially accurate verdicts whether serving 1000 users at launch or 500,000 users at scale.
Making Zoe AI Assistant trustworthy at scale.
Hallucination in a financial recommendation product is not a UX problem โ€” it is a trust crisis. A single fabricated price can cause a wrong booking decision, destroy user confidence permanently, and spread publicly. We designed seven independent guardrail layers so no single failure can reach the user with an invented number.
01
RAG Grounding
The LLM never reasons from memory alone. Before every response, the system retrieves current, verified fare data from pgvector and passes it as explicit context. The primary and most powerful layer.
02
Deterministic Computation
All percentages, savings amounts, and comparisons are computed in Python code. The LLM is the narrator โ€” it explains numbers, it never derives them. The narrator-not-calculator principle.
03
System Prompt Hard Constraint
"Use ONLY the data provided below. Never calculate or generate numbers yourself." Tested against 200+ adversarial prompt variants to verify it cannot be easily circumvented.
04
Output Validation Gate
Before any response reaches the user, the output validator cross-references every numerical claim against the source data context. Mismatch โ†’ reject and regenerate. Maximum 3 regeneration attempts.
05
Confidence Threshold โ‰ฅ 0.92
A confidence scorer using perplexity and entailment scoring evaluates each output. Below 0.92 โ†’ regenerate. After 3 attempts โ†’ rule-based fallback delivered transparently.
06
Source Transparency
Every Zoe response displays: "Based on 90-day historical data ยท Updated [N] min ago." Users can verify the currency of the underlying data. An accountability promise.
07
Continuous Evaluation Loop
User accept/reject signals feed a nightly evaluation pipeline. Prompt variants are A/B tested bi-weekly. Hallucination audits run on weekly production samples. Guardrails improve over time โ€” not just at launch.
Hallucination is not a bug you fix , it is a risk you architect around. Defense in depth, not a single guardrail. The LLM is the narrator. It never derives. It only explains what it was given.
MVP
Our Minimum Viable Product (MVP) focuses on delivering core value: instantly providing users with accurate flight redemption verdicts. This initial release prioritizes critical features to validate market demand and gather essential user feedback.
Home Page
Zoe AI Assistant
Error uploading image.
Final Verdict with explanation and CTA
Profile Management
Wallet Confirguration
The MVP is designed for rapid iteration, allowing us to quickly adapt based on real-world usage data and user insights. Future enhancements will build upon this validated foundation.
Smital Kamdi
AI Product Manager ยท San Francisco, CA
Send an Email