
csv-data-analysis

Verified

by eosphoros-ai · Updated Apr 16, 2026

opus · 127 in / 2.4K out · $0.23 · 1m 4s

This skill should be used when users need to analyze CSV or Excel files, understand data patterns, generate statistical summaries, or create data visualizations. Trigger keywords include "analyze CSV", "analyze Excel", "data analysis", "CSV analysis", "Excel analysis", "data statistics", "generate charts", "data visualization", "分析CSV", "分析Excel", "数据分析", "CSV分析", "Excel分析", "数据统计", "生成图表", "数据可视化".

csv · excel · data-analysis · statistics · visualization · reporting · insights

See It In Action

1 turn · claude
The project codebase is in your current working directory. Explore the files to understand the code before making changes. uploaded the Q2 leads export just now — need an English report on why conversions dropped by region before the 4pm CRO meeting

Let me find the uploaded file and explore the data.

Found the file — q2_inbound_leads.tsv. Let me read the data and the analysis skill.

Let me run the analyzer to get the full statistical breakdown.

Here's the report for your 4pm CRO meeting:


Q2 Inbound Leads: Conversion Drop Analysis by Region

Dataset: 40 leads | Apr 2 -- Jun 27, 2026 | Overall conversion rate: 35.0%


1. The headline: Conversions are almost entirely an NA story

| Region | Leads | Converted | Rate | Avg Lead Score | Avg Days in Pipeline |
| --- | --- | --- | --- | --- | --- |
| NA | 14 | 13 | 92.9% | 83.6 | 16.3 |
| EMEA | 13 | 1 | 7.7% | 49.9 | 37.3 |
| APAC | 13 | 0 | 0.0% | 40.2 | 51.2 |

NA is carrying the number. EMEA converted exactly one deal (a single Organic Search enterprise lead on Jun 1). APAC converted zero out of 13 leads across the entire quarter.


2. Root causes

A. Lead quality entering the pipeline is far lower outside NA.

  • NA leads arrive with an average score of 83.6 (range 73--92, tight distribution).
  • EMEA leads average 49.9, APAC leads average 40.2.
  • Correlation between lead score and conversion is r = 0.85 -- the single strongest predictor in this dataset. Leads scoring below ~70 have essentially no chance of converting.
  • This suggests the paid/content programs feeding EMEA and APAC are generating volume but not qualified demand.

B. Pipeline velocity is 2--3x slower in EMEA/APAC.

  • NA deals close in ~16 days. EMEA takes ~37 days. APAC takes ~51 days.
  • Correlation between days-in-pipeline and conversion is r = -0.86. Deals that linger die.
  • APAC's longest deal sat for 60 days and still did not convert.

C. Channel mix is underperforming in international regions.

  • Across all regions, Organic Search is the best-converting source (50%), followed by LinkedIn Ads and Content Download (37.5% each).
  • Webinar and Partner Referral are the weakest (25% each) -- and these are the channels that dominate the EMEA/APAC lead mix.
  • EMEA's sole conversion came through Organic Search, not the paid channels.

D. The trend is worsening.

  • First-half conversion rate: 40%. Second-half: 30%.
  • Average lead score is declining (62.7 -> 54.4) and pipeline days are increasing (31.9 -> 37.0) over the quarter, suggesting the quality problem is getting worse, not better.

3. Rep-level view

| Rep | Region | Leads | Conversions | Notes |
| --- | --- | --- | --- | --- |
| Sarah Chen | NA | 8 | 7 | Top performer, fast close cycles |
| James Wright | NA | 6 | 6 | 100% conversion on assigned leads |
| Lena Mueller | EMEA | 6 | 1 | Only EMEA conversion (Enterprise, Organic Search) |
| Marco Rossi | EMEA | 7 | 0 | Zero conversions despite 7 leads |
| Aiko Tanaka | APAC | 7 | 0 | Zero conversions |
| Raj Patel | APAC | 6 | 0 | Zero conversions, longest avg pipeline |

Given that lead scores arriving to EMEA/APAC reps are uniformly low (all below 68), this is more of an upstream lead-quality problem than a rep-performance problem.


4. Recommendations for the CRO meeting

  1. Audit EMEA/APAC demand-gen sources. LinkedIn Ads and Webinars are producing low-score leads in these regions. Either retarget the campaigns or reallocate spend toward Organic Search / Content Download which convert at higher rates.
  2. Set a minimum lead-score threshold (~70) for routing to sales. Leads below this score have a near-zero conversion probability. Route them to nurture sequences instead of burning rep time.
  3. Investigate pipeline stalls in APAC. 51-day average cycle with 0% conversion suggests either a qualification gap or a missing local sales motion (pricing, language, product-market fit).
  4. Prioritize deal-value upside. NA's avg deal value ($67K) is nearly double APAC's ($35K). If budget is constrained, doubling down on NA Organic Search / Enterprise deals gives the fastest return.

40 leads, 10 fields, Q2 2026. Analysis from q2_inbound_leads.tsv.

Claude Sonnet 4.6

Permissions

| Scope | Description |
| --- | --- |
| filesystem:read | |

SKILL.md

Intelligent Deep Data Analysis Tool

The Data Analysis Tool is an AI-powered deep automated data exploration tool built on frontend visualization technologies (ECharts + Tailwind CSS). It rapidly extracts statistical features, data quality metrics, numerical distributions, outlier detection, categorical information, correlations, rankings, and time series trends. The latter half of the report supplements these with anomaly overviews, attribution clues, and summary recommendations, producing highly polished and interactive web-based analysis reports. Supported formats include CSV, Excel (.xlsx/.xls), and TSV.
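The kind of summary the analyzer extracts can be illustrated with a short, self-contained sketch. This is an illustration only, not the actual csv_analyzer.py implementation (the real script also handles Excel/TSV parsing, outlier detection, correlations, and marker output); the `region` and `score` column names are hypothetical:

```python
# Illustrative sketch of the kind of summary the analyzer extracts.
# NOT the actual csv_analyzer.py implementation; "region"/"score" are
# hypothetical column names.
import statistics

def summarize(rows: list[dict]) -> dict:
    """Tiny statistical summary: row count, numeric stats, category counts."""
    scores = [r["score"] for r in rows]
    regions = [r["region"] for r in rows]
    return {
        "rows": len(rows),
        "mean_score": statistics.mean(scores),
        "stdev_score": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "region_counts": {reg: regions.count(reg) for reg in set(regions)},
    }

rows = [{"region": "NA", "score": 84}, {"region": "EMEA", "score": 50},
        {"region": "NA", "score": 80}]
print(summarize(rows)["rows"])  # 3
```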

The report follows a structure of "foundational data analysis in the first half, anomaly detection and attribution enhancement in the second half." Core sections include: Executive Summary, Data Overview & Quality Check, Numerical Distribution Features, Feature Analysis & Structural Analysis, Relationship Analysis & Anomaly Identification, Data Anomaly Overview, Attribution Analysis Module, Analysis Results & Statistical Details, Root Cause Inference / Conclusions / Recommendations.

Core Workflow (Required Reading for LLMs)

As an AI assistant, when a user uploads a CSV or Excel file and requests analysis, you must strictly follow these two steps:

Step 1: Extract Data Features (Execute Script)

Use the execute_skill_script_file tool to run csv_analyzer.py, passing in the data file path (supports .csv, .xlsx, .xls, .tsv formats).

Tool call parameter example:

{
  "skill_name": "csv-data-analysis",
  "script_file_name": "csv_analyzer.py",
  "args": {"input_file": "/path/to/data.csv or /path/to/data.xlsx"}
}

Script return explanation: The script returns a large block of text content containing two parts:

  1. [Statistical Summary]: For you to read and understand the dataset's basic characteristics, distributions, correlations, and categorical composition.
  2. [Marker-wrapped data blocks]: The script output contains marker data blocks in the format ###KEY_START###...###KEY_END###. The backend automatically captures and injects these into the template — you do not need to handle or pass this content.
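The marker convention can be illustrated with a short regex sketch. In this skill the extraction is performed by the backend, not by you, so this code is purely explanatory:

```python
import re

# Illustration of the ###KEY_START###...###KEY_END### marker convention.
# In this skill, the backend performs this extraction automatically.
script_output = """
[Statistical Summary] 40 rows, 10 columns ...
###CHART_DATA_JSON_START###
{"regions": ["NA", "EMEA", "APAC"]}
###CHART_DATA_JSON_END###
"""

def extract_blocks(text: str) -> dict[str, str]:
    """Capture every KEY and its wrapped body from marker-delimited output."""
    pattern = r"###(\w+)_START###(.*?)###\1_END###"
    return {key: body.strip() for key, body in re.findall(pattern, text, re.DOTALL)}

blocks = extract_blocks(script_output)
print(list(blocks))  # ['CHART_DATA_JSON']
```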

Step 2: Generate Insights & Display Report (Inject into Template)

Read the "Statistical Summary" obtained in Step 1, and reason about the business significance or patterns behind the data. Then use the html_interpreter tool to load the template and inject data.

Critical Rules (Must Follow):

  1. You must set template_path to csv-data-analysis/templates/report_template.html. The template has built-in complete ECharts rendering JavaScript code and all section titles and footer text. You only need to fill in 9 content placeholders via the data parameter. Never write or modify any JavaScript chart rendering code yourself.

  2. Marker data blocks are automatically injected by the backend — you must not pass them in data. The backend automatically extracts content from ###KEY_START###...###KEY_END### markers in the script output and injects it into the template; in this skill, this is primarily CHART_DATA_JSON.

  3. *_INSIGHTS, EXEC_SUMMARY, and CONCLUSIONS must use HTML formatting (e.g., <p>, <ul>, <li>, <strong>, <ol>) to ensure proper layout. These are deep business insights you write based on the statistical summary.

  4. The output language must match the user's input language. You must also pass the LANG placeholder ("en" or "zh") so that the template's hardcoded section titles, labels, and footer text are displayed in the matching language. Detect language from the user's query: if the user writes in English, set LANG to "en"; if the user writes in Chinese, set LANG to "zh". Default to "zh" when uncertain.

  5. Pass exactly 9 placeholders — no more, no less. Auto-injected marker fields like CHART_DATA_JSON are handled by the backend and should not be passed by you. The template already hardcodes all section titles (Distribution Analysis, Correlation Analysis, etc.), insight box titles ("Insights"), and footer text — you do not need to pass these (the template will automatically translate them based on the LANG placeholder).

  6. Insight content must be substantive. Each insight module should cover 4 layers of information: observation, possible causes, business impact, and action recommendations. Do not merely restate statistical values or write only a few vague conclusions.

  7. Foundational analysis first, attribution as an enhancement module. The first half of the report must focus on analyzing the data features of the CSV itself, including numerical distributions, categorical structures, outliers, correlations, ranking patterns, etc., and should incorporate chart interpretations wherever possible. "Data Anomaly Overview," "Attribution Analysis," and "Root Cause Inference" should appear in the second half as enhancement modules — the entire report must not consist solely of attribution content.
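The language detection in rule 4 can be approximated with a simple heuristic; a minimal sketch (the exact detection logic is not prescribed by the skill):

```python
def detect_lang(query: str) -> str:
    """Heuristic LANG detection per rule 4: 'zh' if the query contains
    CJK characters, 'en' if it is clearly English, default 'zh'."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in query):
        return "zh"
    if any(ch.isascii() and ch.isalpha() for ch in query):
        return "en"
    return "zh"  # default when uncertain, per the rule

print(detect_lang("分析这个CSV"))        # zh
print(detect_lang("analyze this CSV"))  # en
```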

html_interpreter call example:

{
  "template_path": "csv-data-analysis/templates/report_template.html",
  "data": {
    "LANG": "en",
    "REPORT_TITLE": "Sales Dataset Deep Analysis Report",
    "REPORT_SUBTITLE": "Multi-dimensional Data Feature & Business Insight Mining",
    "EXEC_SUMMARY": "<p>This dataset contains 1,000 rows and 5 columns with good data completeness. Key findings include:</p><ul><li><strong>Audience Distribution:</strong> Primarily concentrated in the 25-35 age group...</li></ul>",
    "DISTRIBUTION_INSIGHTS": "<p>The numerical distribution chart reveals that Metric A exhibits a pronounced right-skewed distribution, suggesting...</p>",
    "CORRELATION_INSIGHTS": "<p>The heatmap between variables reveals strong positive correlations, particularly between..., which implies...</p>",
    "CATEGORICAL_INSIGHTS": "<p>Category proportions show that Beijing and Shanghai account for over 50% of the 'City' field.</p>",
    "TIME_SERIES_INSIGHTS": "<p>The time series trend indicates a significant seasonal uptick toward year-end.</p>",
    "CONCLUSIONS": "<p>Based on the comprehensive multi-dimensional analysis, the data exhibits clear structural features and patterns.</p><h3>Recommendations</h3><ul><li>Regularly monitor missing value ratios...</li><li>Focus on high-growth market segments...</li></ul>"
  }
}
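Conceptually, the injection above amounts to substituting placeholder values into the template's HTML. The sketch below assumes a `{{NAME}}` placeholder syntax purely for illustration; the actual syntax used by report_template.html and html_interpreter is not specified here:

```python
# Conceptual sketch of placeholder injection.
# ASSUMPTION: a {{NAME}} placeholder syntax; the real html_interpreter
# tool may substitute placeholders differently.
template = "<h1>{{REPORT_TITLE}}</h1><section>{{EXEC_SUMMARY}}</section>"
data = {
    "REPORT_TITLE": "Sales Dataset Deep Analysis Report",
    "EXEC_SUMMARY": "<p>1,000 rows, 5 columns ...</p>",
}

html = template
for key, value in data.items():
    html = html.replace("{{" + key + "}}", value)

print(html.startswith("<h1>Sales"))  # True
```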

Strictly Prohibited:

  • Do NOT pass CHART_DATA_JSON or any auto-injected marker fields in data (handled automatically by the backend)
  • Do NOT add any JavaScript code in data
  • Do NOT omit the template_path parameter (omitting template_path will prevent charts from rendering!)
  • Do NOT return static PNG images — this tool has been fully upgraded to ECharts dynamic frontend rendering
  • Do NOT pass non-existent placeholders (the template only has the following 9 text placeholders + 1 auto-injected CHART_DATA_JSON; other names will be ignored)

Placeholder Reference (9 total, passed by LLM via data)

The placeholders you need to fill in the template are as follows:

| Placeholder | Type | Required | Description |
| --- | --- | --- | --- |
| LANG | Text | Yes | Report language: "en" for English, "zh" for Chinese. Determines all section titles, labels, and footer text language. Detect from user's input language; default "zh" |
| REPORT_TITLE | Text | Yes | Report title, e.g., "Sales Dataset Deep Analysis Report" |
| REPORT_SUBTITLE | Text | Yes | Report subtitle, e.g., "Multi-dimensional Data Feature & Business Insight Mining" |
| EXEC_SUMMARY | HTML | Yes | Executive summary: overview of data scale, key findings, and conclusion preview |
| DISTRIBUTION_INSIGHTS | HTML | Yes | Numerical distribution feature interpretation: skewness, volatility, quantile ranges, dispersion |
| CORRELATION_INSIGHTS | HTML | Yes | Relationship analysis & anomaly identification interpretation: correlations, linkages, outliers, structural relationships |
| CATEGORICAL_INSIGHTS | HTML | Yes | Feature analysis & structural analysis interpretation: categorical structure, concentration, rankings, and group characteristics |
| TIME_SERIES_INSIGHTS | HTML | Yes | Supplementary interpretation for the data anomaly overview section: discuss trends if time columns exist; discuss stratification differences and anomaly patterns if no time columns |
| CONCLUSIONS | HTML | Yes | Root cause inference, conclusions & recommendations body; must distinguish between "data evidence" and "reasonable speculation" |

Note: csv_analyzer.py includes ###CHART_DATA_JSON_START###...###CHART_DATA_JSON_END### marker data blocks in its output. The backend automatically extracts and injects these into the template — they should not be passed in data. All section titles in the template (e.g., "Distribution Analysis", "Correlation Analysis", "Conclusions & Recommendations"), insight box titles ("Insights"), and footer text are hardcoded in the HTML and are automatically translated based on the LANG placeholder — they do not need to be passed via placeholders.

Why Choose This Tool?

  1. Fast & Lightweight: No more slow Python plotting and bulk PNG generation — only core JSON data is transmitted.
  2. Modern Interactive Layout: Fully integrated with Tailwind CSS responsive layouts and Apache ECharts smooth animated interactions.
  3. Deep Business Insights: By separating machine-driven data extraction from LLM-driven logical reasoning, this tool produces highly valuable data analysis reports.

File Structure

csv-data-analysis/
├── SKILL.md                        # The skill guide you are currently reading
├── scripts/
│   └── csv_analyzer.py             # Python analysis engine (supports CSV/Excel/TSV, lightweight, no graphics dependencies)
└── templates/
    └── report_template.html        # Responsive ECharts report template (with built-in rendering logic and hardcoded titles)

FAQ

What does csv-data-analysis do?

This skill should be used when users need to analyze CSV or Excel files, understand data patterns, generate statistical summaries, or create data visualizations. Trigger keywords include "analyze CSV", "analyze Excel", "data analysis", "CSV analysis", "Excel analysis", "data statistics", "generate charts", "data visualization", "分析CSV", "分析Excel", "数据分析", "CSV分析", "Excel分析", "数据统计", "生成图表", "数据可视化".

When should I use csv-data-analysis?

Use it when you need a repeatable workflow that produces a text response.

What does csv-data-analysis output?

In the evaluated run it produced a text response.

How do I install or invoke csv-data-analysis?

Ask the agent to use this skill when the task matches its documented workflow.

Which agents does csv-data-analysis support?

Agent support is inferred from the source but is not explicitly declared.

What tools, channels, or permissions does csv-data-analysis need?

It uses no extra tools; its only output channel is text; it requires the filesystem:read permission.

Is csv-data-analysis safe to install?

Static analysis marked this skill as low risk; review side effects and permissions before enabling it.

How is csv-data-analysis different from an MCP or plugin?

A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.

Does csv-data-analysis outperform not using a skill?

About csv-data-analysis

When to use csv-data-analysis

When a user uploads a CSV, TSV, or Excel file and wants statistical analysis or pattern discovery. When you need an interactive report summarizing distributions, correlations, categories, anomalies, and trends. When you want business-oriented insights derived from structured tabular data.

When csv-data-analysis is not the right choice

When the task is simple file conversion or spreadsheet editing rather than analytical reporting. When the data source is a live database or external SaaS system instead of uploaded local tabular files.

What it produces

Produces a text response.