Why Pasting Figma Screenshots into Claude Fails

Here's the workflow that's become default in every design-to-code team right now: export a frame from Figma, paste the PNG into Claude or Cursor, type "build this", and iterate from the hallucinated output. It works just well enough to feel productive. It doesn't work well enough to ship from.

This isn't a model capability problem. It's an input problem. The screenshot is the worst possible representation of a Figma design for an LLM to reason about — and it's almost universally what teams reach for first. The figmascope context bundle is the structured alternative.

The hierarchy is gone

A Figma file is a tree. Frames contain auto-layout groups, which contain component instances, which contain text and fill layers. That tree encodes the layout intent: this row is a flex container, this card is a padded box, these three items are siblings with 16px gaps between them.
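To make the tree concrete, here is a minimal TypeScript sketch of a design node tree and a function that prints its outline with the layout intent attached. The `DesignNode` shape is a simplified assumption loosely modeled on Figma's node fields (`type`, `name`, `children`, and the auto-layout fields `layoutMode` and `itemSpacing`). It is not the exact Figma or figmascope schema.

```typescript
// Simplified, hypothetical node shape; real Figma nodes carry many more fields.
type LayoutMode = "HORIZONTAL" | "VERTICAL" | "NONE";

interface DesignNode {
  type: string;            // e.g. "FRAME", "INSTANCE", "TEXT"
  name: string;
  layoutMode?: LayoutMode; // auto-layout direction, if any
  itemSpacing?: number;    // gap between children in px
  children?: DesignNode[];
}

// Walks the tree depth-first and returns one indented line per node,
// annotating auto-layout containers with their direction and gap.
function outline(node: DesignNode, depth = 0): string[] {
  let label = `${"  ".repeat(depth)}${node.type} ${node.name}`;
  if (node.layoutMode && node.layoutMode !== "NONE") {
    label += ` [${node.layoutMode}, gap ${node.itemSpacing ?? 0}]`;
  }
  const lines = [label];
  for (const child of node.children ?? []) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// A row of three sibling items with a 16px gap, as described above.
const row: DesignNode = {
  type: "FRAME",
  name: "ItemRow",
  layoutMode: "HORIZONTAL",
  itemSpacing: 16,
  children: [
    { type: "INSTANCE", name: "Item" },
    { type: "INSTANCE", name: "Item" },
    { type: "INSTANCE", name: "Item" },
  ],
};

console.log(outline(row).join("\n"));
```

Every fact in that outline (three siblings, horizontal direction, 16px gap) is read from data rather than inferred from pixels.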

A screenshot flattens that tree to a grid of pixels. The LLM sees shapes and colors. It does not see the layout structure; it infers it. And inference is lossy in both directions: the model may reconstruct structure that looks right visually but is wrong semantically (a fixed-width div instead of a flex child, absolute positioning instead of auto-layout), or it may notice a structural ambiguity and resolve it arbitrarily.

You can't tell from a PNG whether a horizontal row of items is implemented with display: flex, CSS Grid, a custom HStack, or three absolutely-positioned divs. They're visually identical. The LLM picks one. The pick changes between runs.
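When the auto-layout properties are available, the choice stops being a guess: one input maps to one output. A minimal sketch, assuming node fields named after Figma's auto-layout properties (`layoutMode`, `itemSpacing`, `paddingTop` and friends) and an illustrative mapping onto CSS flexbox. A real code generator would also need to handle alignment, sizing modes, and wrapping.

```typescript
// Subset of auto-layout properties, named after Figma's fields (simplified).
interface AutoLayout {
  layoutMode: "HORIZONTAL" | "VERTICAL";
  itemSpacing: number;
  paddingTop: number;
  paddingRight: number;
  paddingBottom: number;
  paddingLeft: number;
}

// Pure function: the same auto-layout input always yields the same CSS.
function toFlexCss(layout: AutoLayout): Record<string, string> {
  return {
    display: "flex",
    "flex-direction": layout.layoutMode === "HORIZONTAL" ? "row" : "column",
    gap: `${layout.itemSpacing}px`,
    padding: `${layout.paddingTop}px ${layout.paddingRight}px ${layout.paddingBottom}px ${layout.paddingLeft}px`,
  };
}

const css = toFlexCss({
  layoutMode: "HORIZONTAL",
  itemSpacing: 16,
  paddingTop: 12,
  paddingRight: 16,
  paddingBottom: 12,
  paddingLeft: 16,
});
console.log(css);
```

The point is not this particular mapping but that it is a function: run it twice and you get the same flex container, not flex on one run and Grid on the next.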

Semantics don't survive rasterization

The LLM can see that a rectangle with rounded corners contains some text and an icon. What it can't see:

Is this a Button component or a custom card?
If it's a button, what variant is it — primary, secondary, ghost?
Is the icon decorative or meaningful?
Does this element have interactive states in the design system, or is it one-off?

Semantics in Figma live in the layer tree: component names, variant properties, node kinds. A Button/Primary/Large component is explicitly typed. In a screenshot, it's a rounded rectangle with a shadow and a label. The model guesses "this is probably a button" correctly most of the time — and then guesses "this is probably the primary variant" based on color, which may or may not match your design system's actual naming.
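Because the name is data, recovering the component and its variant is a parse, not a color-based guess. A minimal sketch, assuming a hypothetical `Component/Variant/Size` naming convention like the `Button/Primary/Large` example above; real design systems differ, and Figma component sets can also expose variants as explicit properties instead.

```typescript
interface ComponentRef {
  component: string;
  variant?: string;
  size?: string;
}

// Assumes a hypothetical "Component/Variant/Size" naming convention.
// Missing segments come back as undefined rather than being guessed.
function parseComponentName(name: string): ComponentRef {
  const [component, variant, size] = name.split("/").map((s) => s.trim());
  return { component, variant, size };
}

const ref = parseComponentName("Button/Primary/Large");
console.log(ref.component, ref.variant, ref.size);
```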

Small mismatches compound. A ghost button rendered as an outlined button. A tooltip rendered as a modal trigger. A disabled state rendered as active. Each of these is one screenshot inference step away from the source of truth.

Spacing systems don't resolve to numbers

Look at a screenshot of a card with padding. What's the padding? You can't tell without measuring pixels, knowing the canvas scale, knowing the export resolution, and doing the math. The LLM does the math badly — it estimates, it rounds, and it has no way to know if your spacing system uses an 8px base grid or a 4px one or something custom.

So it guesses. It generates padding: 12px when the design says 16. It generates gap: 8px when the design says 12. These numbers look plausible in isolation but they're wrong — and if your design system uses spacing tokens like spacing.md or Spacing/400, the LLM doesn't know about them at all. It hardcodes literals that will drift from your system the moment anything changes.
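With exact numbers, resolving a value to a token is a lookup rather than an estimate. A minimal sketch, assuming a hypothetical spacing scale on a 4px base grid whose token names follow the `spacing.md` style mentioned above. Values that match no token are flagged instead of silently rounded.

```typescript
// Hypothetical spacing scale on a 4px base grid.
const spacingTokens: Record<string, number> = {
  "spacing.xs": 4,
  "spacing.sm": 8,
  "spacing.md": 12,
  "spacing.lg": 16,
  "spacing.xl": 24,
};

// Returns the token name for an exact px value, or null if the value is off-scale.
function resolveSpacing(px: number): string | null {
  for (const [name, value] of Object.entries(spacingTokens)) {
    if (value === px) return name;
  }
  return null;
}

console.log(resolveSpacing(16)); // spacing.lg
console.log(resolveSpacing(10)); // null: off-scale, worth flagging for review
```

An estimate from pixels has nothing to look up; an exact value from the design file either hits a token or visibly misses one.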

The LLM isn't hallucinating. It's doing exactly what you'd do with only a screenshot: guessing. You're just surprised when the guesses are wrong because you could see the right answer in the Figma file all along.

Token relationships vanish

Your designer set that background to #7F5CFE. In Figma, that hex is bound to a variable: color/brand/primary. That binding is meaningful — it means the color participates in theming, it means dark mode swaps it, it means if the brand color changes you update one variable and every instance updates.

In the screenshot: it's purple. The LLM generates background-color: #7F5CFE. The token relationship is gone. Your codebase now has a hardcoded hex that will never track your design system. Multiply this by every component in the screen.
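When the input carries the variable binding, a generator can emit a reference instead of the resolved hex. A minimal sketch, assuming a simplified fill shape with an optional bound `variable` name (Figma's own API represents bindings as variable aliases keyed by ID, so this flattening is an assumption) and a hypothetical convention that turns `color/brand/primary` into the CSS custom property `--color-brand-primary`.

```typescript
// Simplified fill: the resolved color plus an optional bound variable name.
interface Fill {
  hex: string;       // resolved color value
  variable?: string; // bound variable name, e.g. "color/brand/primary"
}

// Hypothetical convention: slash-delimited variable name -> CSS custom property.
function toCssVar(variable: string): string {
  return `--${variable.replace(/\//g, "-")}`;
}

// Prefer the token reference; fall back to the literal only when unbound.
function fillToCss(fill: Fill): string {
  return fill.variable ? `var(${toCssVar(fill.variable)})` : fill.hex;
}

console.log(fillToCss({ hex: "#7F5CFE", variable: "color/brand/primary" })); // var(--color-brand-primary)
console.log(fillToCss({ hex: "#7F5CFE" })); // #7F5CFE
```

With the reference in place, a theme or dark-mode swap changes the variable once instead of chasing hex literals through the codebase.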

The same applies to typography scales, border radii, and shadow definitions. Every value in a well-maintained Figma file is potentially a named token. Every value in a screenshot is just a number.

Component reuse is invisible

A well-composed screen reuses components. The four product cards are four instances of the same ProductCard component. The avatar in the nav and the avatar in the comment thread are both instances of Avatar/Medium. This matters for code: you want one React component, not four hand-rolled variations that will diverge.

From a screenshot, the LLM sees four visually similar rectangles. It may generate one reusable component — or it may generate four nearly-identical blocks of JSX because it didn't notice they were the same. There's no signal in the image to tell it which is correct.

The IR that figmascope exports carries componentId on every instance node. The agent knows: these four nodes are all ProductCard. Generate it once, render it four times with different props. That's the output you want. That's the output you can't get from pixels.
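Detecting reuse from structured input is a grouping operation. A minimal sketch, assuming each instance node exposes `id`, `name`, and `componentId` (the `componentId` field is the one described above; the rest of the shape and the ID values are illustrative).

```typescript
interface InstanceNode {
  id: string;
  name: string;
  componentId: string;
}

// Groups instance node IDs by the component they instantiate.
function groupByComponent(nodes: InstanceNode[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const node of nodes) {
    const ids = groups.get(node.componentId) ?? [];
    ids.push(node.id);
    groups.set(node.componentId, ids);
  }
  return groups;
}

// Hypothetical screen: four ProductCard instances and two Avatar/Medium instances.
const nodes: InstanceNode[] = [
  { id: "10:1", name: "ProductCard", componentId: "1:2" },
  { id: "10:2", name: "ProductCard", componentId: "1:2" },
  { id: "10:3", name: "ProductCard", componentId: "1:2" },
  { id: "10:4", name: "ProductCard", componentId: "1:2" },
  { id: "11:1", name: "Avatar/Medium", componentId: "1:7" },
  { id: "12:1", name: "Avatar/Medium", componentId: "1:7" },
];

const groups = groupByComponent(nodes);
groups.forEach((ids, componentId) => {
  console.log(`${componentId}: ${ids.length} instances`);
});
// 1:2: 4 instances
// 1:7: 2 instances
```

Each key in the result is one component to generate; each entry in its list is one place to render it with different props.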

String identity is lost

You have a "Continue" button on three different screens. Are those three instances the same string, or did a designer write them independently? In a well-structured Figma file, they reference the same string key. That means one i18n entry, one change propagates everywhere.

In three screenshots: three times the LLM generates a hardcoded string. If you're building an internationalized app, you now have three strings to find and replace instead of one to look up. Small thing. Compounds across a real codebase.
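With a shared key on each text node, building the i18n table is deduplication rather than search-and-replace. A minimal sketch, assuming text nodes carry a `stringRef.key` like the one figmascope's IR uses; the `screen` and `text` fields and the `action.continue` key are illustrative.

```typescript
interface TextNode {
  screen: string;
  text: string;
  stringRef?: { key: string };
}

interface StringEntry {
  text: string;
  screens: string[];
}

// Builds one i18n entry per string key and records every screen that uses it.
function buildStringsManifest(nodes: TextNode[]): Map<string, StringEntry> {
  const manifest = new Map<string, StringEntry>();
  for (const node of nodes) {
    if (!node.stringRef) continue; // unkeyed strings need manual review
    const entry = manifest.get(node.stringRef.key);
    if (entry) {
      entry.screens.push(node.screen);
    } else {
      manifest.set(node.stringRef.key, { text: node.text, screens: [node.screen] });
    }
  }
  return manifest;
}

// Three "Continue" buttons on three screens, all sharing one key.
const manifest = buildStringsManifest([
  { screen: "checkout", text: "Continue", stringRef: { key: "action.continue" } },
  { screen: "signup", text: "Continue", stringRef: { key: "action.continue" } },
  { screen: "onboarding", text: "Continue", stringRef: { key: "action.continue" } },
]);
console.log(manifest.size); // 1
```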

Why the LLM hallucinates: it re-derives structure every time

The model has no memory of previous runs. Every time you paste the same screenshot, it reconstructs the structure from scratch. The reconstruction is probabilistic — which means the same screenshot + same prompt + same model can produce measurably different outputs on different runs. Same design, different code. Different component names, different className patterns, different layout choices.

This is not a model bug. It's the expected behavior of a probabilistic model given insufficient constraints. The screenshot provides insufficient constraints. The model fills the gaps. The gaps are filled differently each time.

You can partially work around this with longer, more detailed prompts — "use Tailwind, use 8px grid, use these component names..." — but then you've manually specified the structure that should have been in the design file all along. You're doing the extraction work the tool should do.

The reproducibility problem

Teams that use screenshots for design-to-code handoff run into the same problem: the output is not reproducible. Two developers, same Figma screenshot, independently prompt Claude — they get different component structures, different className patterns, different nesting decisions. Now you have two codebases that look the same visually but are architecturally inconsistent.

This makes code review harder. It makes refactoring harder. It makes design system compliance auditing impossible. You can't diff "what did the agent generate from this design" if the answer changes every run.

Structured context fixes reproducibility because it fixes the inputs. A deterministic input bundle — the same JSON with the same node IDs, component names, token values, and spatial relationships — will produce much more consistent output across runs, agents, and developers. Not perfectly deterministic: the model is still probabilistic. But the variance drops dramatically when the structure is specified rather than inferred.
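One practical consequence of a fixed input is that you can verify it really is fixed. A minimal sketch of canonical JSON serialization (object keys sorted recursively) so that two exports of the same design compare byte-for-byte. This is a generic technique, not a description of how figmascope writes its files.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes JSON with object keys sorted at every level, so key order
// in the source object does not affect the output string.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Same node, keys written in a different order.
const a = canonicalJson({ name: "ProductCard", id: "1:2" });
const b = canonicalJson({ id: "1:2", name: "ProductCard" });
console.log(a === b); // true
```

Canonical output also keeps the bundle diffable in version control, which turns "what did the agent see for this design?" into a question with one answer.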

What a screenshot gives you vs. what the IR gives you

Take a product card: image, title, subtitle, price, and an "Add to cart" button. Here's what each input gives the agent:

Screenshot input: A rectangle with an image at the top, two lines of text, a number, and a button. Colors are inferred. Padding is estimated. Whether this is a component or one-off is unknown. The button variant is inferred from color. The spacing system is unknown.

IR input: Node kind FRAME, name ProductCard, component ID linking to the component definition. Auto-layout with vertical direction, 16px gap, 16px horizontal padding, 12px vertical padding. Child nodes: IMAGE (fills width, fixed height), TEXT with stringRef.key: "product.title" and style typography/heading.sm, TEXT with stringRef.key: "product.subtitle" and style typography/body.md, TEXT with fill color/price, INSTANCE of Button/Primary/Medium. Background fill color/surface.card. Border radius radius/card.
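Written out as data, the IR description above might look roughly like the following. This is an illustrative TypeScript rendering that reuses the names from the description (`kind`, `componentId`, `stringRef.key`, and the token names); the exact figmascope schema, the `layout`, `style`, `fill`, and `radius` property names, and the component ID value are assumptions.

```typescript
// Illustrative IR node shape; property names beyond those in the prose are assumptions.
interface IrNode {
  kind: "FRAME" | "INSTANCE" | "IMAGE" | "TEXT";
  name: string;
  componentId?: string;
  stringRef?: { key: string };
  style?: string;  // typography token
  fill?: string;   // color token
  radius?: string; // radius token
  layout?: { direction: "VERTICAL" | "HORIZONTAL"; gap: number; paddingX: number; paddingY: number };
  children?: IrNode[];
}

const productCard: IrNode = {
  kind: "FRAME",
  name: "ProductCard",
  componentId: "1:2", // hypothetical ID
  layout: { direction: "VERTICAL", gap: 16, paddingX: 16, paddingY: 12 },
  fill: "color/surface.card",
  radius: "radius/card",
  children: [
    { kind: "IMAGE", name: "Image" },
    { kind: "TEXT", name: "Title", stringRef: { key: "product.title" }, style: "typography/heading.sm" },
    { kind: "TEXT", name: "Subtitle", stringRef: { key: "product.subtitle" }, style: "typography/body.md" },
    { kind: "TEXT", name: "Price", fill: "color/price" },
    { kind: "INSTANCE", name: "Button/Primary/Medium" },
  ],
};

// Collects every token name referenced anywhere in the subtree, sorted.
function collectTokens(root: IrNode): string[] {
  const found = new Set<string>();
  const visit = (node: IrNode): void => {
    for (const token of [node.style, node.fill, node.radius]) {
      if (token) found.add(token);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return Array.from(found).sort();
}

console.log(collectTokens(productCard));
```

Nothing in that list has to be guessed: every token the card depends on is enumerable before any code is written.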

The IR gives the agent a spec. The screenshot gives it a suggestion.

The frame: this is the documentation problem

We solved this exact problem for source code decades ago. You don't give an agent a screenshot of your codebase and ask it to reason about architecture. You give it the code — the structured, parseable, semantically-meaningful representation. The abstract syntax tree, not a picture of the editor.

Figma designs are structured data. They have a well-defined tree structure with typed nodes and named values. The Figma API exposes this structure completely. The only reason the screenshot workflow persists is that extracting the structure and formatting it as context has friction.
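For a sense of how direct that access is, here is a minimal sketch against Figma's REST API: `GET https://api.figma.com/v1/files/:key` with a personal access token in the `X-Figma-Token` header returns the document tree. The `FigmaNode` interface keeps only a small subset of the returned fields, error handling is minimal, and the placeholders in the usage comment are yours to fill in.

```typescript
// Minimal subset of the node fields returned by Figma's REST API.
interface FigmaNode {
  id: string;
  name: string;
  type: string;
  children?: FigmaNode[];
}

// Fetches a file's document tree using a personal access token.
async function fetchFigmaDocument(fileKey: string, token: string): Promise<FigmaNode> {
  const res = await fetch(`https://api.figma.com/v1/files/${fileKey}`, {
    headers: { "X-Figma-Token": token },
  });
  if (!res.ok) throw new Error(`Figma API error: ${res.status}`);
  const body = (await res.json()) as { document: FigmaNode };
  return body.document;
}

// Counts nodes by type across the whole tree (e.g. how many INSTANCE vs TEXT nodes).
function countNodeTypes(root: FigmaNode): Record<string, number> {
  const counts: Record<string, number> = {};
  const stack: FigmaNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    counts[node.type] = (counts[node.type] ?? 0) + 1;
    stack.push(...(node.children ?? []));
  }
  return counts;
}

// Usage (FILE_KEY and FIGMA_TOKEN are placeholders):
// fetchFigmaDocument("FILE_KEY", "FIGMA_TOKEN").then((doc) => console.log(countNodeTypes(doc)));
```

The friction is not in getting the tree; it is in turning thousands of nodes into context an agent can use.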

Reducing that friction is what figmascope does. You paste the Figma URL, the export runs in your browser, and you get a ZIP with structured context: CONTEXT.md, tokens.json, per-screen IR, component inventory, strings manifest. Everything the agent needs, none of it inferred from pixels.

Keep the screenshots for visual confirmation — the bundle includes 2x PNGs for exactly that. Use the structure for everything else. See it in practice: Cursor workflow, Claude Code workflow, or Aider workflow.