Why Pasting Figma Screenshots into Claude Fails

Here is the workflow that has become the default on nearly every design-to-code team: export a frame from Figma, paste the PNG into Claude or Cursor, type "build this", and iterate on the hallucinated output. It works just well enough to feel productive. It does not work well enough to ship.

This is not a model capability problem. It is an input problem. A screenshot is the worst possible representation of a Figma design for an LLM, and it is almost universally the one teams reach for first. The figmascope context bundle is the structured alternative.

The hierarchy is gone

A Figma file is a tree. Frames contain auto-layout groups, which contain component instances, which contain text and fill layers. That tree encodes layout intent: this row is a flex container, this card is a padded box, these three items are siblings with 16px gaps.

A screenshot flattens that tree onto a grid of pixels. The LLM sees shapes and colors. It does not see the layout structure; it infers it. That inference fails in two ways: the model can reconstruct a structure that looks right visually but is semantically wrong (a fixed-width div instead of a flex child, absolute positioning instead of auto-layout), or it can recognize that the structure is ambiguous and pick one option arbitrarily.

You cannot tell from a PNG whether a horizontal row of items is implemented with display: flex, CSS Grid, a custom HStack, or three absolutely positioned divs. They are visually identical. The LLM picks one. The pick changes between runs.
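To make the contrast concrete, here is a minimal sketch of how explicit layout data removes the guess. The field names layoutMode and itemSpacing are borrowed from Figma's auto-layout properties as an assumption; figmascope's IR may name them differently, and layout_css is a hypothetical helper.

```python
# Hypothetical node shape: field names are assumptions for illustration,
# not figmascope's actual schema.
def layout_css(node: dict) -> dict:
    """Map explicit auto-layout data to CSS declarations, with no guessing."""
    mode = node.get("layoutMode")
    if mode not in ("HORIZONTAL", "VERTICAL"):
        # No auto-layout recorded: return nothing rather than invent a layout.
        return {}
    return {
        "display": "flex",
        "flex-direction": "row" if mode == "HORIZONTAL" else "column",
        "gap": f"{node.get('itemSpacing', 0)}px",
    }


row = {"name": "ItemRow", "layoutMode": "HORIZONTAL", "itemSpacing": 16}
print(layout_css(row))
```

With the tree available, the choice between flex and absolute positioning is read from the data instead of being inferred from pixels, so it cannot change between runs.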

Semantics do not survive rasterization

The LLM can see that a rectangle with rounded corners contains some text and an icon. What it cannot see:

Is this a Button component or a custom card?
If it is a button, what is the variant: primary, secondary, or ghost?
Is the icon decorative or meaningful?
Does this element have interactive states in the design system, or is it a one-off?

In Figma, semantics live in the layer tree: component names, variant properties, node kinds. A Button/Primary/Large component is explicitly typed. In a screenshot, it is a rounded rectangle with a shadow and a label. Most of the time the model correctly guesses "this is probably a button", then guesses "this is probably the primary variant" based on color, which may or may not match your design system's actual naming.

Small mismatches compound. A ghost button rendered as an outlined button. A tooltip rendered as a modal trigger. A disabled state rendered as active. Each of these is one screenshot inference step away from the source of truth.

Spacing systems do not resolve to numbers

Look at a screenshot of a card with padding. What is the padding? You cannot tell without measuring pixels, knowing the canvas scale, knowing the export resolution, and doing the math. LLMs are poor at that kind of arithmetic: the model estimates, rounds, and has no way to know whether your spacing system uses an 8px base grid, a 4px grid, or something custom.

So it guesses. It generates padding: 12px when the design says 16. It generates gap: 8px when the design says 12. These numbers look plausible in isolation but are wrong. And if your design system uses spacing tokens such as spacing.md or Spacing/400, the LLM knows nothing about them. It hardcodes literals that will drift from your system as soon as anything changes.
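The arithmetic a screenshot forces, and the lookup that structured data allows, can be sketched as follows. The spacing scale, its token names, and both helper functions are hypothetical; substitute your own design system's values.

```python
# Hypothetical spacing scale: token names and pixel values are assumptions.
SPACING_TOKENS = {
    4: "spacing.xs",
    8: "spacing.sm",
    12: "spacing.md",
    16: "spacing.lg",
    24: "spacing.xl",
}


def estimate_from_pixels(measured_px: float, export_scale: float) -> float:
    """What a screenshot forces: measured pixels divided by the export scale."""
    return measured_px / export_scale


def spacing_token(px: float) -> str:
    """Return the token for an exact value, or flag the value as off-scale."""
    token = SPACING_TOKENS.get(px)
    if token is None:
        raise ValueError(f"{px}px is not on the spacing scale")
    return token


# An edge measured at 27 pixels in a 2x export yields a value that is not
# on the scale, so the reader (or the model) has to round in some direction.
print(estimate_from_pixels(27, 2))
# With the exact value from the design file, the lookup is unambiguous.
print(spacing_token(16))
```

The point of the sketch is the failure mode: an estimate that lands between two scale steps has no correct rounding, while an exact value either maps to a token or is visibly off-scale.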

Strictly speaking, the LLM is not hallucinating. It is doing exactly what you would do with only a screenshot: guessing. You are only surprised when the guesses are wrong because you could have seen the right answer in the Figma file.

Token relationships vanish

Your designer set that background to #7F5CFE. In Figma, that hex is bound to a variable: color/brand/primary. The binding is meaningful: it means the color participates in theming, dark mode swaps it, and if the brand color changes you update one variable and every instance updates.

In a screenshot, it is purple. The LLM generates background-color: #7F5CFE. The token relationship is gone. Your codebase now has a hardcoded hex that will never track your design system. Multiply that by every component on the screen.

The same applies to typography scales, border radii, and shadow definitions. In a well-maintained Figma file, every value is potentially a named token. In a screenshot, every value is just a number.
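A small sketch of what preserving the binding looks like in generated CSS. The fill shape (hex, boundVariable) and the convention of turning a token path into a CSS custom property are assumptions for illustration, not figmascope's documented output.

```python
def css_var(token_name: str) -> str:
    """Turn 'color/brand/primary' into 'var(--color-brand-primary)'."""
    return "var(--" + token_name.replace("/", "-") + ")"


def background_declaration(fill: dict) -> str:
    """Prefer the bound variable; fall back to the literal only when unbound."""
    bound = fill.get("boundVariable")  # hypothetical field name
    value = css_var(bound) if bound else fill["hex"]
    return f"background-color: {value};"


bound_fill = {"hex": "#7F5CFE", "boundVariable": "color/brand/primary"}
raw_fill = {"hex": "#7F5CFE"}

print(background_declaration(bound_fill))
print(background_declaration(raw_fill))
```

Both fills render the same purple today. Only the first one keeps following the design system when the brand color or the theme changes.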

Component reuse is invisible

A well-composed screen reuses components. Four product cards are four instances of the same ProductCard component. The avatar in the nav and the avatar in the comment thread are both instances of Avatar/Medium. This matters for code: you want one React component, not four hand-rolled variations that will diverge.

From a screenshot, the LLM sees four visually similar rectangles. It may generate one reusable component, or it may generate four nearly identical blocks of JSX because it did not notice they were the same. Nothing in the image signals which is correct.

The IR that figmascope exports carries a componentId on every instance node. The agent knows that these four nodes are all ProductCard: generate it once, render it four times with different props. That is the output you want, and it is output you cannot get reliably from pixels.
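As a rough illustration, grouping instance nodes by componentId takes only a few lines. Only componentId comes from the description above; the id and kind fields, the ID values, and the helper name are assumptions.

```python
from collections import defaultdict


def instances_by_component(nodes: list) -> dict:
    """Group instance node IDs under the component they instantiate."""
    groups = defaultdict(list)
    for node in nodes:
        if node.get("kind") == "INSTANCE":
            groups[node["componentId"]].append(node["id"])
    return dict(groups)


# Hypothetical screen: four product cards, one avatar, one plain text node.
screen_nodes = [
    {"id": "n1", "kind": "INSTANCE", "componentId": "ProductCard"},
    {"id": "n2", "kind": "INSTANCE", "componentId": "ProductCard"},
    {"id": "n3", "kind": "INSTANCE", "componentId": "ProductCard"},
    {"id": "n4", "kind": "INSTANCE", "componentId": "ProductCard"},
    {"id": "n5", "kind": "INSTANCE", "componentId": "Avatar/Medium"},
    {"id": "n6", "kind": "TEXT"},
]

print(instances_by_component(screen_nodes))
```

Each key in the result corresponds to one component to generate, and each list is the set of places to render it.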

String identity is lost

Your "Continue" button appears on three different screens. Are those three instances the same string, or did the designer write them independently? In a well-structured Figma file, they reference the same string key. That means one i18n entry, and one change propagates everywhere.

From three screenshots, the LLM generates a hardcoded string three times. If you are building an internationalized app, you now have three strings to find and replace instead of one to look up. It is a small thing, and it compounds in a real codebase.
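The sketch below shows how shared string keys collapse into a single i18n entry. The node shape, the key action.continue, and the screen names are hypothetical; the only idea taken from the article is that text nodes can carry a string key.

```python
def string_manifest(text_nodes: list) -> dict:
    """Collapse text nodes that share a string key into one entry per key."""
    manifest = {}
    for node in text_nodes:
        key = node["stringRef"]["key"]
        existing = manifest.setdefault(key, node["text"])
        if existing != node["text"]:
            # Same key with different copy is a design inconsistency worth surfacing.
            raise ValueError(f"conflicting text for key {key!r}")
    return manifest


# Hypothetical text nodes from three different screens.
continue_buttons = [
    {"screen": "signup", "text": "Continue", "stringRef": {"key": "action.continue"}},
    {"screen": "checkout", "text": "Continue", "stringRef": {"key": "action.continue"}},
    {"screen": "onboarding", "text": "Continue", "stringRef": {"key": "action.continue"}},
]

print(string_manifest(continue_buttons))
```

Three nodes produce one manifest entry, which is the single lookup an internationalized codebase needs.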

Why the LLM hallucinates: it re-derives structure every time

The model has no memory of previous runs. Every time you paste the same screenshot, it reconstructs the structure from scratch. That reconstruction is probabilistic, which means the same screenshot, the same prompt, and the same model can produce noticeably different outputs on different runs. Same design, different code: different component names, different className patterns, different layout choices.

This is not a model bug. It is the expected behavior of a probabilistic model given insufficient constraints. The screenshot provides insufficient constraints. The model fills the gaps. The gaps get filled differently every time.

You can partially work around this with longer, more detailed prompts ("use Tailwind, use an 8px grid, use these component names...") but then you have manually specified the structure that should have come from the design file. You are doing the extraction work the tool should have done.

The reproducibility problem

Teams that use screenshots for design-to-code handoff run into the same problem: the output is not reproducible. Two developers independently prompt Claude with the same Figma screenshot and get different component structures, different className patterns, and different nesting decisions. Now you have two implementations that look the same visually but are architecturally inconsistent.

This makes code review harder. It makes refactoring harder. It makes auditing design system compliance close to impossible. You cannot diff "what the agent generated from this design" if the answer changes on every run.

Structured context improves reproducibility because it fixes the inputs. A deterministic input bundle (the same JSON with the same node IDs, component names, token values, and spatial relationships) should produce far more consistent output across runs, agents, and developers. Not perfectly deterministic: the model is still probabilistic. But variance drops substantially when structure is specified rather than inferred.
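One way to check that two developers really are feeding the agent the same input is to fingerprint the bundle. This is a sketch of that idea and not something figmascope is known to do: bundle_fingerprint is a hypothetical helper, and it verifies input stability only, not the model's output.

```python
import hashlib
import json


def bundle_fingerprint(bundle: dict) -> str:
    """Hash a canonical JSON serialization so key order cannot change the result."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same data written in a different key order is still the same input.
export_a = {"name": "ProductCard", "layout": {"gap": 16, "direction": "VERTICAL"}}
export_b = {"layout": {"direction": "VERTICAL", "gap": 16}, "name": "ProductCard"}

# A real design change produces a different fingerprint.
export_c = {"name": "ProductCard", "layout": {"gap": 12, "direction": "VERTICAL"}}

print(bundle_fingerprint(export_a) == bundle_fingerprint(export_b))
print(bundle_fingerprint(export_a) == bundle_fingerprint(export_c))
```

Two screenshots of the same frame can differ byte for byte because of scale or compression. A canonical serialization of structured data differs only when the design does.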

What a screenshot gives you vs. what the IR gives you

Take a product card: image, title, subtitle, price, and an "Add to cart" button. Here is what each input gives the agent:

Screenshot input: A rectangle with an image at the top, two lines of text, a number, and a button. Colors are inferred. Padding is estimated. Whether it is a component or a one-off is unknown. The button variant is inferred from color. The spacing system is unknown.

IR input: Node kind FRAME, name ProductCard, and a component ID linking to the component definition. Auto-layout with vertical direction, a 16px gap, 16px horizontal padding, and 12px vertical padding. Child nodes: an IMAGE (fills the width, fixed height), a TEXT with stringRef.key: "product.title" and style typography/heading.sm, a TEXT with stringRef.key: "product.subtitle" and style typography/body.md, a TEXT with fill color/price, and an INSTANCE of Button/Primary/Medium. Background fill color/surface.card. Border radius radius/card.
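Written out as data, the IR description above might look roughly like this. The values come from the description, but the exact key names, the nesting, the child node names, and the placeholder component ID are assumptions, since the actual figmascope schema is not shown here.

```python
# Hypothetical rendering of the product card IR as a Python dict.
product_card = {
    "kind": "FRAME",
    "name": "ProductCard",
    "componentId": "hypothetical-component-id",
    "layout": {"direction": "VERTICAL", "gap": 16, "paddingX": 16, "paddingY": 12},
    "fill": "color/surface.card",
    "radius": "radius/card",
    "children": [
        {"kind": "IMAGE", "name": "Image"},
        {"kind": "TEXT", "name": "Title",
         "stringRef": {"key": "product.title"}, "style": "typography/heading.sm"},
        {"kind": "TEXT", "name": "Subtitle",
         "stringRef": {"key": "product.subtitle"}, "style": "typography/body.md"},
        {"kind": "TEXT", "name": "Price", "fill": "color/price"},
        {"kind": "INSTANCE", "name": "Button/Primary/Medium"},
    ],
}


def outline(node: dict, depth: int = 0) -> list:
    """Return an indented outline of the node tree, one line per node."""
    lines = ["  " * depth + f"{node['kind']} {node['name']}"]
    for child in node.get("children", []):
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(product_card)))
```

Every question the screenshot leaves open (component or one-off, which variant, which spacing) is answered by a field an agent can read directly.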

The IR gives the agent a spec. The screenshot gives it a suggestion.

The framing: this is a documentation problem

We solved this exact problem for source code decades ago. You do not hand an agent a screenshot of your codebase and ask it to reason about the architecture. You give it the code: a structured, parseable, semantically meaningful representation. The abstract syntax tree, not a picture of the editor.

Figma designs are structured data. They have a well-defined tree structure with typed nodes and named values. The Figma API exposes this structure completely. The screenshot workflow persists because extracting that structure and formatting it as context involves friction.

Reducing that friction is what figmascope does. You paste a Figma URL, the export runs in your browser, and you get a ZIP with structured context: CONTEXT.md, tokens.json, per-screen IR, a component inventory, and a strings manifest. Everything the agent needs, nothing inferred from pixels.
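Consuming such a bundle programmatically could look like the sketch below. The file names CONTEXT.md and tokens.json come from the list above; their location at the root of the ZIP, the token file's contents, and the load_bundle helper are assumptions.

```python
import io
import json
import zipfile


def load_bundle(zip_bytes: bytes) -> dict:
    """Read the context document and token file from an exported ZIP.

    Assumes both files sit at the archive root, which may not match
    the real bundle layout.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bundle:
        return {
            "context": bundle.read("CONTEXT.md").decode("utf-8"),
            "tokens": json.loads(bundle.read("tokens.json")),
        }


def make_demo_zip() -> bytes:
    """Build a tiny stand-in archive so the sketch runs without a real export."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("CONTEXT.md", "# Context\n")
        archive.writestr("tokens.json", json.dumps({"color/brand/primary": "#7F5CFE"}))
    return buffer.getvalue()


loaded = load_bundle(make_demo_zip())
print(sorted(loaded["tokens"]))
```

The loaded dict can then be placed in an agent's context alongside the per-screen IR, in whatever form your tooling expects.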

Keep screenshots for visual confirmation; the bundle includes 2x PNGs for exactly that. Use structure for everything else. See it in practice: the Cursor workflow, the Claude Code workflow, or the Aider workflow.