Building a Computer Vision POC in Hours with Vercel AI SDK and Figma MCP

Esteban Silva

January 22, 2026

"What if we could build an app in days instead of weeks, using AI not just as a feature, but as a full-stack development partner?"

That's the question that sparked Super Ultra Kiosco, a proof of concept (POC) that digitized our company's beloved manual snack corner using the latest AI development tools.

We sat down to document exactly how we used tools like Figma MCP (Model Context Protocol) and Vercel AI SDK to build a working product detection system.

Let's start with context: What exactly is "Super Ultra Kiosco"?

In our office, "El Kiosco" is a small corner with snacks, drinks, and other goodies. The system was manual: grab an item, find the shared spreadsheet, and manually log your debt. It was error-prone, tedious, and ripe for modernization.

The Solution: A mobile-first web app where you simply point your phone at the snacks. AI identifies the products via camera, calculates the total, and confirms your order with one tap. No spreadsheets, no typing.

The tech stack behind this AI-native app

To achieve this, we moved beyond standard web development into an AI-native workflow. Here is the core technology stack:

  • Framework: Next.js with TypeScript.
  • AI Vision & Logic: Vercel AI SDK (powered by Google Gemini Vision) for the product detection feature itself.
  • Design-to-Code: Figma MCP for translating designs into code.

This combination allowed us to treat the AI as a collaborator that could "see" our designs and "understand" our business logic.

How does product detection with Vercel AI SDK work?

The "magic" of the app is its ability to recognize a bag of chips or a soda can instantly. For this, we used the Vercel AI SDK coupled with Google’s Gemini Vision model.

Traditional AI integrations often struggle with unstructured text responses. We solved this using the SDK's generateObject function. This allows you to define a schema (using Zod) to force the AI to return structured, typed JSON data instead of a conversational string.

Here is the actual implementation code:

JavaScript

import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';

// `prompt` holds the detection instructions, `image` is the photo captured on the phone,
// and AnalysisResultSchema is the Zod schema describing the expected result (see below).
const result = await generateObject({
  model: google('gemini-2.5-flash'),
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        { type: 'image', image }
      ]
    }
  ],
  schema: AnalysisResultSchema,
  temperature: 0.1 // low temperature keeps detections consistent rather than creative
});
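
The post doesn't reproduce AnalysisResultSchema itself, but here is a minimal Zod sketch of what such a schema could look like, assuming the response is a list of detected products with quantities and confidence scores (the field names are hypothetical):

TypeScript

import { z } from 'zod';

// Hypothetical shape: the real schema isn't shown in this post.
// generateObject validates the model's JSON against this schema,
// so the rest of the app works with typed data, not free-form text.
const AnalysisResultSchema = z.object({
  products: z.array(
    z.object({
      name: z.string().describe('Product name as seen on the packaging'),
      quantity: z.number().int().min(1),
      confidence: z.number().min(0).max(1)
    })
  )
});

type AnalysisResult = z.infer<typeof AnalysisResultSchema>;

With a schema in place, result.object comes back fully typed, so the cart logic never has to parse raw model text.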

The Trade-off: Using the AI SDK gives you the flexibility to switch model providers (e.g., swapping Gemini for GPT-4o) without rewriting code. However, if you need bleeding-edge features specific to one provider, like a brand-new beta model or a very particular configuration, you might find it limiting. You gain broad provider coverage, but the models and options exposed for each provider can be a smaller set than what that provider's native SDK offers.
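
As a quick sketch of that flexibility (assuming the @ai-sdk/openai provider package is installed alongside @ai-sdk/google), swapping providers comes down to changing the model factory; the generateObject call above stays the same:

TypeScript

import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';

// Only the model factory changes; the messages, schema, and temperature
// in the generateObject call are untouched.
const gemini = google('gemini-2.5-flash');
const gpt4o = openai('gpt-4o'); // requires an OpenAI API key in the environment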

What about Figma MCP? How did that fit into the workflow?

Figma MCP (Model Context Protocol) creates a bridge between your Figma designs and your AI coding assistant (like Windsurf or Cursor).

In technical terms, MCP is an open standard for connecting AI assistants directly to external data sources, and the Figma MCP server exposes your actual design files (component properties, layout tokens, and visual hierarchy) as a resource the AI can "read."

Instead of manually coding CSS from mockups, the workflow looked like this:

  1. We designed the UI in Figma.
  2. Our IDE (via MCP) "connected" to the design file.
  3. We prompted the AI: "Build this product card component based on the 'Mobile Card' frame in Figma."
  4. The AI generated React components that matched our design specs (almost) perfectly.

Key Learning: While Figma MCP helped us move fast, it wasn't magic. We found that we still had to iterate a few times to match the designs 100%. The AI captures the general structure well, but it can fail on specifics like exact paddings, button states, or color nuances. It gets you very close, but the final polish to ensure the implementation matches the design perfectly still requires a developer's eye.

What was the most surprising part of building with these AI tools?

Building Super Ultra Kiosco validated four core hypotheses about the state of AI development:

  • Speed: We moved from concept to working prototype in just 6 hours, a fraction of the time it would normally take. The structured output from Vercel’s SDK removed the need for complex parsing logic.
  • Context is everything: When using AI assistants for development, providing good context (whether through MCP or clear instructions) dramatically improves results. With AI SDK giving us structured outputs and Figma MCP ensuring design fidelity, we spent less time on boilerplate and debugging, and more time on the actual product experience.
  • Prompt engineering is still an art: Even with structured outputs, crafting the right prompt for product detection took iteration (see the sketch after this list).
  • Image quality matters: The AI detection works best with good lighting and clear photos. We added image compression to balance quality vs. API costs.
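
To give a flavor of that prompt iteration, here is the kind of prompt this converges toward. The KNOWN_PRODUCTS catalog and the wording below are illustrative only; the actual production prompt isn't reproduced in this post:

TypeScript

// Illustrative only: the real catalog and prompt text are not shown here.
// Constraining detection to a known catalog and asking for per-item confidence
// helps keep the model from inventing products that aren't on the shelf.
const KNOWN_PRODUCTS = ['bag of chips', 'soda can', 'chocolate bar', 'bottled water'];

const prompt = `You are the checkout camera for an office snack kiosk.
List every product you can see in the photo.
Only report items from this catalog: ${KNOWN_PRODUCTS.join(', ')}.
For each item, return its name, the quantity you can count,
and a confidence score between 0 and 1.
If you are not sure about an item, lower its confidence instead of guessing.`;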

Q&A: Common questions on AI-assisted development

Is Figma MCP ready for production apps? It is excellent for rapid prototyping and setting up component libraries. For complex, custom animations or highly specific accessibility requirements, human oversight is still mandatory.

Why use Vercel AI SDK instead of the official Google SDK? For a POC or a multi-model application, Vercel AI SDK offers a unified API that saves significant time. If your app relies entirely on deep, provider-specific features of a single model (like Gemini-specific context caching across its 1M-token context window), the native SDK might be the better fit.

What is the cost implication of using Vision models for this? Vision models are more expensive than text models. We implemented client-side image compression before sending requests to the API to balance performance and cost.
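
As a sketch of that approach (assuming a browser canvas pipeline; the actual helper isn't shown in this post), downscaling and re-encoding the photo before upload looks roughly like this:

TypeScript

// Hypothetical helper: shrink the photo in the browser before sending it
// to the vision endpoint, trading resolution for a smaller (cheaper) payload.
async function compressImage(file: File, maxWidth = 1024, quality = 0.7): Promise<Blob> {
  const bitmap = await createImageBitmap(file);
  const scale = Math.min(1, maxWidth / bitmap.width);

  const canvas = document.createElement('canvas');
  canvas.width = Math.round(bitmap.width * scale);
  canvas.height = Math.round(bitmap.height * scale);

  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('Canvas 2D context unavailable');
  ctx.drawImage(bitmap, 0, 0, canvas.width, canvas.height);

  return new Promise((resolve, reject) =>
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error('Compression failed'))),
      'image/jpeg',
      quality // lower quality = smaller request = cheaper vision call
    )
  );
}

In practice, maxWidth and quality need tuning against detection accuracy, since, as noted above, image quality directly affects how well the model identifies products.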

Building something similar? 

We'd love to hear about your experience with AI development tools.


Esteban Silva

Full-Stack Developer