Anthropic to introduce AI Fluency scorecard in Claude

Anthropic appears to be turning its February research project into a consumer-facing product. References to a new AI Fluency surface have been spotted inside Claude’s settings, where users will be able to open a dedicated screen and ask Claude to generate a personal AI fluency scorecard. The system is designed to scan a user’s activity across Chat, Cowork, and Claude Code sessions, score each session against a defined set of behavioral indicators, and produce a structured report once the analysis completes. The report can then be viewed and managed directly from the settings panel.

New research: The AI Fluency Index.

We tracked 11 behaviors across thousands of https://t.co/RxKnLNNcNR conversations—for example, how often people iterate and refine their work with Claude—to measure how well people collaborate with AI.

Read more: https://t.co/g65nGQFmjG
— Anthropic (@AnthropicAI) February 23, 2026

The scorecard evaluates eleven observable behaviors grouped around competencies that map closely to the 4D AI Fluency Framework Anthropic built with academics Rick Dakan and Joseph Feller. The themes covered include setting the goal and approach, framing the conversation, and applying quality control, which correspond broadly to the delegation, description, and discernment pillars of that framework. Early signals suggest the result is presented as a fraction, such as 7.5 out of 11, alongside guidance on which areas a user might strengthen. That gives newcomers a concrete sense of where their habits with Claude are paying off and where they aren’t.
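To make the fraction concrete, here is a minimal sketch of how a result such as 7.5 out of 11 could arise from the three statuses the prompt defines (demonstrated, partial, not-observed). The weights of 1, 0.5, and 0 are an assumption made for illustration, as are the `STATUS_WEIGHTS` table and the `fluency_score` helper; nothing in the leaked material confirms how the product actually converts statuses into a number.

```python
# Hypothetical weights, not confirmed by Anthropic: they are chosen only
# because half credit for "partial" would explain a score like 7.5.
STATUS_WEIGHTS = {"+": 1.0, "~": 0.5, "-": 0.0}


def fluency_score(statuses):
    """Sum the assumed weight of each status symbol, one symbol per indicator."""
    return sum(STATUS_WEIGHTS[symbol] for symbol in statuses)


# Invented example: six demonstrated, three partial, two not-observed.
statuses = ["+", "+", "~", "+", "-", "+", "~", "+", "-", "+", "~"]
score = fluency_score(statuses)
print(f"{score:g} out of {len(statuses)}")
```

Under these assumed weights, any score ending in .5 implies an odd number of partially demonstrated indicators.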

This is the logical next step after the AI Fluency Index Anthropic published in February 2026, which analyzed 9,830 anonymized Claude conversations to establish a baseline for how people collaborate with AI today. That study found iteration and refinement to be the strongest predictor of good AI use, while polished outputs like artifacts and code tended to lower critical checking. Bringing the same scoring system into the product turns a research finding into a personal feedback loop, one that nudges users toward the behaviors Anthropic believes lead to safer outcomes.

AI Fluency system prompt

Please generate a structured AI Fluency scorecard that evaluates how effectively I interact with AI across 11 behavioral indicators, based on the user messages provided below.

These messages are drawn from 45 conversations across 42 chat, 2 CoWork, and 1 Claude Code sessions. Each message is tagged with its surface — [chat], [cowork], or [cc].

Analyze the user messages to determine each indicator's status:
- Use "demonstrated" ([+]) for indicators where the user clearly and consistently demonstrates the skill.
- Use "partial" ([~]) for indicators where the user sometimes demonstrates the skill or does so imperfectly.
- Use "not-observed" ([-]) for indicators where there is no evidence of the skill in the provided messages.

For every indicator marked [+] or [~], include 1-2 evidence quotes taken VERBATIM from the provided messages. Keep quotes under 150 characters each. Do NOT fabricate or invent quotes — every quote must appear exactly as written in the provided messages. If a quote must be shortened to fit the limit, truncate naturally at a word boundary.

For every indicator (regardless of status), output a Surfaces line listing which surfaces ([chat], [cowork], [cc]) the supporting evidence came from. If status is [-], output "Surfaces: none". Fluency looks different across surfaces: coding surfaces ([cowork], [cc]) favor concise delegation; [chat] favors rich description. Weight Description indicators primarily against [chat] messages.

Base your assessment solely on the provided messages. Do not assume skills that are not evidenced.

>>> User chat transcripts are injected here

## The 11 Indicators

A single terse message can genuinely demonstrate multiple indicators at once. "ELI5" specifies both an audience (#2: a beginner) and a format (#3: simplified explanation). "less corporate" is both tone (#4) and implicit audience (#2). When a message packs multiple signals, credit each indicator it demonstrates — do not force it into only the single most-obvious row. The bar for each is still "clearly demonstrated", not "plausibly related".

### Delegation
- 0: Clarifies goals — Does the user state what they want to accomplish before requesting help?
- 1: Consults on approach — Does the user ASK which approach to take before requesting execution? Interrogative: "what's the best way to approach this?", "how should I structure this?". The user is seeking a recommendation, not yet committed to a direction. Distinguish from #7: #1 asks which approach, #7 directs how Claude behaves.

### Description
- 2: Defines audience — Does the user specify who the output is for?
- 3: Specifies format — Does the user indicate the desired output format (table, list, email, etc.)?
- 4: Communicates tone — Does the user indicate the voice, tone, or style they want?
- 5: Builds iteratively — Does the user refine outputs through follow-up rather than accepting the first result?
- 6: Provides examples — Does the user share examples or references to demonstrate quality expectations?
- 7: Sets interaction — Does the user TELL Claude how to behave, what role to adopt, or what interaction style to use? Imperative: "no preamble", "devil's advocate this", "steelman the other side first", "be direct", "ask me questions before writing". The user already knows what they want from Claude's behavior and is directing it — including when the direction is phrased as a terse request ("devil's advocate this" is role-setting, not approach-asking).

### Discernment
- 8: Checks facts — Does the user question or verify factual claims in AI output?
- 9: Notices reasoning — Does the user push back when the AI's logic seems off? Must name a specific flaw, gap, or contradiction: "that doesn't follow", "you're assuming X", "that feels circular", "you skipped a step". Acknowledging or praising the reasoning ("good reasoning", "makes sense", "I follow your logic") does NOT count — that's acceptance, not scrutiny.
- 10: Recognizes context — Does the user proactively share context the AI could not know?

## Product Feature Usage (deterministic counts from the last 30 days)

projects: 30 conversations (frequent)
artifacts: 3 conversations (sometimes)
web-search: 27 conversations (frequent)
research: 3 conversations (sometimes)
connectors: 4 conversations (sometimes)
skills: 1 conversation (sometimes)
memory: 0 conversations (never used)
sports: 0 conversations (never used)
weather: 0 conversations (never used)
maps: 0 conversations (never used)
recipes: 0 conversations (never used)
subagents: 0 conversations (never used)
mcp-tools: 1 conversation (sometimes)
computer-use: 0 conversations (never used)

## Required Output Format

Output EXACTLY the text below — three marker-delimited sections with nothing before, after, or between them. Do NOT wrap in a code block. Do NOT add any introductory or closing text.

--- AI Fluency Summary ---
[A tight 80-110 word summary addressed directly to the user, covering BOTH collaboration behaviors and product-feature usage as one coherent paragraph. Use short, scannable sentences — no dense prose. Lead with the strongest demonstrated behavior, weave in one evidence quote, note which Claude features they rely on most, then close with one behavior and one feature to try next, framed as opportunities. Encouraging and specific, not generic.]
--- End Summary ---

--- AI Fluency Scorecard ---
Name: User
Role: General
Conversations: 45

[All 11 indicators in order 0 through 10. Rules:]
[Indicator line format: <id> [<symbol>] <label>]
[Status symbols: [+] = demonstrated, [~] = partial, [-] = not-observed]
[After the indicator line, output one line: Surfaces: <comma-separated list of chat,cowork,cc> or Surfaces: none]
[For [+] or [~] indicators: follow with 1-2 evidence quotes, each on its own line indented with two spaces and wrapped in double quotes]
[For [-] indicators: no quote lines]
--- End Scorecard ---

--- Insights ---
Strength-Title: [4-6 word headline naming the user's strongest demonstrated behavior]
Strength-Body: [One sentence, under 110 chars, explaining why this behavior works well for them. Address the user as "you".]
TryNext-Title: [4-6 word headline for one skill to build next, framed as an action]
TryNext-Body: [One sentence, under 110 chars, with a concrete starting move. Can include a short example prompt in quotes.]
Feature-Id: [One id from this list, picked from features the user has NOT used yet, that would complement how they already work: projects, artifacts, web-search, research, connectors, skills, memory, sports, weather, maps, recipes, subagents, mcp-tools, computer-use. If every feature is already used, write the word none.]
--- End Insights ---
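The marker-delimited output format in the prompt suggests the response is meant to be parsed by code rather than shown raw. As a rough sketch of what that parsing could look like, the Python below reads the scorecard section into a list of records. The `parse_scorecard` function, the `INDICATOR_RE` pattern, and the `SAMPLE` text (including its evidence quote) are all hypothetical and written for illustration; only the markers, the indicator line format, the `Surfaces:` line, and the two-space quote indentation come from the prompt itself.

```python
import re

# Indicator line format from the prompt: <id> [<symbol>] <label>
INDICATOR_RE = re.compile(r"^(\d+) \[([+~-])\] (.+)$")


def parse_scorecard(text):
    """Return one dict per indicator found between the scorecard markers."""
    start = text.index("--- AI Fluency Scorecard ---")
    end = text.index("--- End Scorecard ---")
    lines = text[start:end].splitlines()[1:]  # drop the opening marker line
    indicators = []
    for line in lines:
        match = INDICATOR_RE.match(line)
        if match:
            indicators.append({
                "id": int(match.group(1)),
                "status": match.group(2),
                "label": match.group(3),
                "surfaces": [],
                "quotes": [],
            })
        elif line.startswith("Surfaces:") and indicators:
            value = line.split(":", 1)[1].strip()
            if value != "none":
                indicators[-1]["surfaces"] = [s.strip() for s in value.split(",")]
        elif line.startswith('  "') and indicators:
            # Evidence quotes are indented two spaces and wrapped in double quotes.
            indicators[-1]["quotes"].append(line.strip().strip('"'))
    return indicators


# Invented sample response, shortened to two indicators for readability.
SAMPLE = """--- AI Fluency Scorecard ---
Name: User
Role: General
Conversations: 45

0 [+] Clarifies goals
Surfaces: chat,cc
  "I need a one-page summary of this report for my manager"
1 [-] Consults on approach
Surfaces: none
--- End Scorecard ---"""

rows = parse_scorecard(SAMPLE)
for row in rows:
    print(row["id"], row["status"], row["surfaces"], len(row["quotes"]))
```

A real consumer of this format would presumably also enforce the prompt's own rules, for instance that every quote is under 150 characters and appears verbatim in the user's messages, since the model is only instructed, not guaranteed, to comply.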

The feature fits a broader push to position Claude not just as a tool but as a skill people can develop, anchored by the Anthropic Academy, the AI Fluency course series, and partnerships with PayPal, GivingTuesday, and university programs. A timeline for the rollout has not surfaced, and it remains unclear whether the scorecard will launch for all tiers or start with onboarded and enterprise audiences. Either way, it would mark one of the first attempts by a major lab to grade the human side of the conversation rather than the model.