peer-review
Verified by davila7 · Updated Apr 16, 2026
Systematic peer review toolkit. Evaluate methodology, statistics, design, reproducibility, ethics, figure integrity, and reporting standards for manuscript and grant review across disciplines.
See It In Action
I'll start by exploring the project files to find the manuscript, then run a comprehensive peer review.
Now let me run the systematic peer review toolkit on this manuscript.
Let me check the skill's reference materials for reporting standards relevant to this manuscript type.
Now I have the full picture: the manuscript as submitted, the supplementary materials, the R1 reviewer comments, and what the authors have/haven't addressed per their cover letter. Let me produce the review.
Peer Review Report: SepsisNet Manuscript (CCM-2026-0342, R1 Revision)
Manuscript: Early Prediction of Sepsis Onset in Emergency Department Patients Using a Deep Learning Model Trained on Electronic Health Records
Journal: Critical Care Medicine
Review type: Pre-resubmission audit (comprehensive)
Summary Statement
This single-center retrospective study develops SepsisNet, a GRU-based deep learning model for 6-hour sepsis prediction in ED patients using 142 EHR features, reporting AUROC 0.891 vs. qSOFA (0.721) and SIRS (0.684). The study addresses a clinically important problem and the temporal validation design is appropriate. However, the manuscript in its current state has critical unresolved issues from R1 that will likely result in rejection if resubmitted tonight without being addressed. Below I distinguish between (A) unfinished R1 items the reviewers will immediately flag, (B) issues in the current text that the revision introduced or failed to fix, and (C) new concerns not raised in R1 that a careful re-review will catch.
Key strengths:
- Clinically relevant question with a well-defined ED-specific cohort (most sepsis ML work is ICU-focused)
- Appropriate chronological train/test split avoiding temporal leakage in the split itself
- Honest reporting of calibration limitations (H-L p=0.02)
Key weaknesses (blocking publication):
- 6 of 12 R1 action items are incomplete per the cover letter -- reviewers will notice immediately
- Informative missingness handled with naive median imputation (procalcitonin 68% missing) without any sensitivity analysis
- Race included as a model feature without justification or fairness analysis
- No decision curve analysis, NRI, or clinical utility assessment beyond AUROC/PPV
- Overclaimed clinical significance throughout the Discussion
Recommendation: Major revision. The manuscript cannot go out tonight in its current state. Below I prioritize what is fixable in hours vs. what needs more time.
SECTION I: R1 Items Still Unresolved (Highest Priority)
These are items the cover letter itself flags as incomplete. Reviewers will check these first.
1. Missing data sensitivity analysis (Reviewer 2, Major #2) -- STATUS: "in progress"
Why this is blocking: Median imputation with 68% missingness on procalcitonin and 54% on D-dimer is the single biggest methodological vulnerability. The missingness is informative -- procalcitonin is ordered when clinicians suspect sepsis, so its absence signals low clinical suspicion. Median imputation destroys this signal and introduces bias toward the null for non-ordered tests.
What to do tonight (minimum viable fix):
- Add a missingness indicator for each lab feature (binary: was it ordered?) and re-run the model. Report whether AUROCs change. This is computationally fast and directly addresses the informative missingness concern (a minimal sketch follows this list).
- Add a paragraph in Methods 2.4 stating: "To assess sensitivity to imputation strategy, we repeated the primary analysis (a) including binary missingness indicators as additional features, and (b) restricting to the subset of patients with complete data on the top 5 laboratory features. Results were qualitatively similar [or: differed as follows...]."
- If you have time, run multiple imputation (MICE) as a third arm. If not, acknowledge this as a limitation and commit to it in the response letter.
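A minimal sketch of the missingness-indicator approach, assuming a pandas DataFrame `X` of raw features with NaN for unordered labs; `lab_cols` and the column names are hypothetical placeholders for your actual feature list:

```python
# Sketch: add binary "was this lab ordered?" indicators before imputation.
import pandas as pd

def add_missingness_indicators(X, lab_cols):
    X = X.copy()
    for col in lab_cols:
        # 1 = clinician ordered the test, 0 = value is about to be imputed
        X[f"{col}_ordered"] = X[col].notna().astype(int)
        # original median-imputation step; in the real pipeline, compute the
        # median on the training split only to avoid leakage into the test set
        X[col] = X[col].fillna(X[col].median())
    return X

X_aug = add_missingness_indicators(X, lab_cols=["procalcitonin", "d_dimer", "lactate"])
# Re-train on X_aug and compare test AUROC against the original run.
```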
2. Decision curve analysis (Reviewer 1, Major #5) -- STATUS: "not yet started"
Why this is blocking: The editor specifically listed this in their summary. AUROC alone does not demonstrate clinical utility -- DCA shows the net benefit across threshold probabilities and is the standard for prediction model papers in CCM.
What to do tonight:
- Use the `dcurves` Python package or `rmda` in R. Run DCA for SepsisNet vs. XGBoost vs. qSOFA vs. treat-all vs. treat-none. This is ~30 lines of code and takes minutes to run once you have predicted probabilities (a minimal net-benefit sketch follows this list).
- Add as Figure 3 (or Supplementary Figure). Report the threshold range where SepsisNet provides net benefit.
- Also report NRI (net reclassification improvement) at the 85% sensitivity operating point: how many patients are correctly reclassified by SepsisNet vs. qSOFA?
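If you'd rather not add a dependency tonight, net benefit is simple enough to compute directly from its definition. A minimal sketch, assuming numpy arrays `y_true` (0/1 test-set labels) and `p_hat` (SepsisNet predicted probabilities) are already in hand:

```python
# Sketch: decision curve analysis from first principles.
import numpy as np

def net_benefit(y_true, p_hat, threshold):
    n = len(y_true)
    pred_pos = p_hat >= threshold
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    # Net benefit: true-positive gain minus harm-weighted false positives
    return tp / n - (fp / n) * (threshold / (1 - threshold))

prevalence = y_true.mean()
for t in np.arange(0.05, 0.50, 0.05):
    nb_model = net_benefit(y_true, p_hat, t)
    nb_all = prevalence - (1 - prevalence) * t / (1 - t)  # treat-all strategy
    print(f"t={t:.2f}  model={nb_model:+.4f}  treat-all={nb_all:+.4f}  treat-none=+0.0000")
```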
3. TRIPOD checklist (Reviewer 2, Major #4) -- STATUS: "partially completed"
What to do tonight:
- Download the TRIPOD+AI checklist (the 2024 update is appropriate for ML prediction models). Complete every item, with page/line references. Submit as Supplementary Table S5.
- Key items you're likely missing: flow diagram (see below), full model specification (architecture diagram), calibration metrics beyond H-L, sample size justification.
4. Participant flow diagram -- STATUS: "not yet created"
What to do tonight:
- Create a TRIPOD-compliant flow diagram: starting N -> exclusions at each step (transferred, AMA/left before triage, LOS <2h, missing triage data, etc.) -> final analytic cohort -> train/validation/test split with counts and sepsis rates in each.
- This is a figure you can create in 30 minutes (a graphviz sketch follows this list). Add as Figure 1 (shift current figures).
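A sketch of the diagram skeleton using the graphviz Python package (it also requires the Graphviz system binary); every count below is a placeholder to substitute with the real cohort numbers:

```python
# Sketch: TRIPOD-style participant flow diagram.
import graphviz

dot = graphviz.Digraph(format="png")
dot.attr("node", shape="box")
dot.node("screened", "ED visits screened\nN = [total]")
dot.node("excluded", "Excluded: transfers, AMA/left before triage,\nLOS <2h, missing triage data\nN = [excluded]")
dot.node("cohort", "Final analytic cohort\nN = [cohort]")
dot.node("train", "Train/validation (earlier period)\nN = [n], sepsis rate [x]%")
dot.node("test", "Temporal test set (later period)\nN = [n], sepsis rate [y]%")
dot.edges([("screened", "excluded"), ("screened", "cohort"),
           ("cohort", "train"), ("cohort", "test")])
dot.render("figures/flow_diagram")  # writes figures/flow_diagram.png
```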
5. Feature importance method (Reviewer 3, Major #4) -- STATUS: "not yet addressed"
What to do tonight:
- State explicitly in Methods what method generated the "relative importance" values in Section 3.4. If it was attention weights: say so, but note the well-documented limitations of attention as explanation (Jain & Wallace, 2019). If permutation importance: state this. If SHAP: state this and specify KernelSHAP vs. DeepSHAP.
- If you used attention weights, add a supplementary analysis with SHAP or permutation importance as a robustness check. The "relative importance" values summing to 0.73 suggest a normalized attention-based approach, which reviewers will flag as unreliable for causal interpretation.
6. Software versions (Reviewer 3, Minor) -- STATUS: "partially updated"
What to do tonight:
- Add to Methods or Supplementary: Python 3.9.x, PyTorch x.x.x, scikit-learn x.x.x, SciPy x.x.x, pandas x.x.x, numpy x.x.x. Add GPU type (e.g., NVIDIA A100) and training wall time (a version-logging sketch follows this list).
- Add random seed(s) used.
- Name BioClinicalBERT explicitly in the main text (currently only in supplement) and cite Alsentzer et al. 2019.
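A short snippet that logs the key versions and pins the seeds; the seed value shown is a placeholder for whatever was actually used:

```python
# Sketch: record the software environment and fix seeds for reproducibility.
import random
import numpy as np
import sklearn
import torch

print(f"PyTorch {torch.__version__}, scikit-learn {sklearn.__version__}, "
      f"numpy {np.__version__}")

SEED = 42  # placeholder: report the seed(s) actually used
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
```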
SECTION II: Issues in the Current Manuscript Text
These are problems in the text as it stands that reviewers will catch on re-review.
Major Issue 7: Race as a model feature without justification or fairness analysis
manuscript.md Methods 2.4 lists "demographics" as input features; supplementary_materials.md Table S1 specifies "race (one-hot encoded into 5 categories)." This is a significant concern:
- Including race as a predictor in clinical AI is scientifically and ethically contested (Vyas et al., NEJM 2020; Obermeyer et al., Science 2019). Race is a social construct and a proxy for structural inequities, not a biological variable.
- The subgroup analysis (Table S3) shows AUROC variation by race (White 0.896 vs. Black 0.884 vs. Other 0.878). You need to demonstrate this isn't the model using race to achieve performance.
- Fix: (a) Re-run the model without race and report the AUROC delta. If performance is unchanged, remove race and state you did so. If it drops, discuss why and whether the model is capturing structural healthcare disparities rather than biology. (b) At minimum, add a paragraph in the Discussion addressing algorithmic fairness and the decision to include race.
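A sketch of the ablation in (a), assuming `X_train`/`X_test` DataFrames with one-hot race columns prefixed `race_`, labels `y_train`/`y_test`, and a hypothetical `train_and_predict()` wrapper around the existing pipeline:

```python
# Sketch: drop the race features, retrain, and compare discrimination.
from sklearn.metrics import roc_auc_score

race_cols = [c for c in X_train.columns if c.startswith("race_")]

p_full = train_and_predict(X_train, y_train, X_test)
p_norace = train_and_predict(X_train.drop(columns=race_cols), y_train,
                             X_test.drop(columns=race_cols))

auc_full = roc_auc_score(y_test, p_full)
auc_norace = roc_auc_score(y_test, p_norace)
print(f"AUROC with race: {auc_full:.3f}  without: {auc_norace:.3f}  "
      f"delta: {auc_full - auc_norace:+.4f}")
```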
Major Issue 8: Overclaimed clinical impact in the Discussion
The Discussion contains language that will draw an immediate rejection signal from Reviewer 1 (who already flagged this):
- "SepsisNet could fundamentally transform how EDs approach sepsis screening" (line ~145) -- This claim is not supported by a single-center retrospective study with no prospective or external validation. Replace with measured language: "SepsisNet shows promise for improving early sepsis detection, pending prospective validation."
- "represents a major advance that could save lives" -- Downgrade to: "represents a potentially clinically meaningful lead time."
- "If deployed clinically, SepsisNet could fundamentally transform..." -- Remove entirely or replace with a concrete next-step statement.
Reviewer 1 specifically said "should be toned down" (Major #3). If the reviewers see the same language in the revision, it signals the authors aren't responsive to feedback.
Major Issue 9: Circularity in the sepsis onset / clinical recognition definitions
The sepsis onset time is defined as "the earlier of the time of antibiotic administration or blood culture order" (Methods 2.3). The cover letter says you've now defined clinical recognition as "the earliest of: first antibiotic administration, first blood culture order, or first lactate order." These definitions substantially overlap -- the outcome definition and the comparator for lead time share the same anchor events.
This means the "3.2 hours before clinical recognition" claim may partially reflect the lag between the first component of the Sepsis-3 definition being met and the full definition being met, rather than truly early detection.
Fix: Clearly separate these two constructs. Define clinical recognition independently of the Sepsis-3 label -- e.g., time of first clinical note documenting sepsis concern, or time of first sepsis bundle order set activation. Alternatively, acknowledge this overlap as a limitation and temper the lead-time claims.
Major Issue 10: Alert burden not quantified
The reported PPV is 27.4% at the 85% sensitivity operating point. For an ED with 85,000 annual visits (~233/day), roughly 8.3% have sepsis. At 85% sensitivity and 78% specificity:
- True positives per day: ~16
- False positives per day: ~47
- Total daily alerts: ~63
That is 63 alerts per day in an ED. This is a critical clinical deployment consideration. Add this calculation to the Discussion and discuss alert fatigue mitigation strategies (e.g., tiered alerting, integration into existing workflows, requiring two consecutive high-probability predictions).
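For transparency, the arithmetic behind those numbers, with all inputs taken from the manuscript's reported figures:

```python
# Sketch: daily alert-burden calculation from the reported operating point.
visits_per_day = 85_000 / 365                        # ~233 ED visits/day
prevalence = 0.083
sensitivity, specificity = 0.85, 0.78

septic = visits_per_day * prevalence                 # ~19.3 true sepsis cases/day
tp = sensitivity * septic                            # ~16 caught
fp = (1 - specificity) * (visits_per_day - septic)   # ~47 false alarms
print(f"TP/day: {tp:.0f}, FP/day: {fp:.0f}, total alerts/day: {tp + fp:.0f}")
```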
Major Issue 11: Calibration issue acknowledged but not fixed
H-L p=0.02 with overestimation in the 0.3-0.5 range. Reviewer 2 asked for recalibration, Brier scores, and CITL. None of these appear in the manuscript.
Fix tonight: Apply Platt scaling or isotonic regression on the validation set, report pre- and post-calibration Brier scores, CITL, and calibration slope. Update Figure 2 (calibration plot) to show both raw and recalibrated curves.
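A sketch of both recalibration options with scikit-learn, assuming validation/test probabilities (`p_val`, `p_test`) and labels (`y_val`, `y_test`) as numpy arrays; CITL and calibration slope would be computed analogously from the logit-transformed probabilities:

```python
# Sketch: recalibrate on the validation set, evaluate on the test set.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against exact 0/1 probabilities
    return np.log(p / (1 - p))

# Option 1: isotonic regression (non-parametric)
iso = IsotonicRegression(out_of_bounds="clip").fit(p_val, y_val)
p_test_iso = iso.predict(p_test)

# Option 2: Platt scaling (logistic regression on the logit)
platt = LogisticRegression().fit(logit(p_val).reshape(-1, 1), y_val)
p_test_platt = platt.predict_proba(logit(p_test).reshape(-1, 1))[:, 1]

print(f"Brier raw: {brier_score_loss(y_test, p_test):.4f}, "
      f"isotonic: {brier_score_loss(y_test, p_test_iso):.4f}, "
      f"Platt: {brier_score_loss(y_test, p_test_platt):.4f}")
```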
SECTION III: New Concerns Not Raised in R1
Major Issue 12: Potential data leakage through informative lab ordering
Lactate is the #2 most important feature (importance 0.14) and has 31% missingness. A lactate order in the first 2 hours signals that the clinician already suspects a serious illness. When you median-impute missing lactate values, you're creating a binary signal: "clinician ordered lactate" (real value) vs. "clinician didn't" (median ~1.4). This is a proxy for clinician gestalt, not an independent physiological predictor.
Fix: Report the model's performance on the subset of patients who had lactate drawn vs. not. If the model performs much better on the lactate-drawn subset, the "early detection" may partially be detecting clinician suspicion, not occult sepsis.
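A minimal sketch of that subset comparison, assuming a boolean mask `lactate_drawn` aligned with the test-set arrays:

```python
# Sketch: stratify test-set AUROC by whether lactate was actually drawn.
from sklearn.metrics import roc_auc_score

auc_drawn = roc_auc_score(y_test[lactate_drawn], p_test[lactate_drawn])
auc_not = roc_auc_score(y_test[~lactate_drawn], p_test[~lactate_drawn])
print(f"AUROC lactate drawn: {auc_drawn:.3f}  not drawn: {auc_not:.3f}")
# A large gap suggests the model is partly detecting clinician suspicion.
```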
Major Issue 13: No early stopping is a red flag
Supplementary Figure S1 description states: "Early stopping was not implemented; all models were trained for the full 50 epochs" despite convergence at epoch ~30-35. Training 15-20 extra epochs past convergence risks overfitting. This is unusual and will concern methodological reviewers.
Fix: Either (a) implement early stopping with patience=10 and re-report results, or (b) justify the choice (e.g., "validation AUROC showed no degradation through epoch 50, confirming no overfitting") with the actual training curves as evidence. You must include Supplementary Figure S1, which is currently listed as "[Figure not included in this draft -- to be added before resubmission]".
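A sketch of option (a), where `train_one_epoch()` and `eval_auroc()` are hypothetical wrappers around the existing training and validation passes:

```python
# Sketch: early stopping with patience=10 on validation AUROC.
best_auroc, best_state, patience, stale = 0.0, None, 10, 0

for epoch in range(50):
    train_one_epoch(model, train_loader, optimizer)
    auroc = eval_auroc(model, val_loader)
    if auroc > best_auroc:
        best_auroc, stale = auroc, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        stale += 1
        if stale >= patience:
            break  # no validation improvement for 10 consecutive epochs

model.load_state_dict(best_state)  # restore the best checkpoint
```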
Minor Issue 14: Reference [2] (Kumar 2006) is misapplied
The Introduction cites "each hour of delayed antibiotic administration increases mortality by 7.6%." Kumar 2006 studied septic shock patients specifically, not all sepsis. This overestimates the benefit of early detection for the broader sepsis population your model targets.
Fix: Either qualify the citation ("in septic shock patients") or cite a more recent, broader estimate (e.g., Seymour et al., NEJM 2017, which found a more modest effect in broader sepsis).
Minor Issue 15: Thin reference list
10 references is insufficient for a paper positioning itself at the intersection of deep learning, sepsis, and emergency medicine. Missing critical citations:
- Vyas et al. 2020 (NEJM) -- race-based clinical algorithms
- Sendak et al. 2020 -- real-world deployment challenges for sepsis ML
- Wong et al. 2021 -- external validation failures for sepsis models
- Jain & Wallace 2019 -- attention is not explanation
- Alsentzer et al. 2019 -- BioClinicalBERT
- Van Calster et al. 2019 -- calibration for clinical prediction models
- Seymour et al. 2017 -- time to treatment in sepsis (broader population)
Minor Issue 16: No funding or COI statement
CCM requires both. Add them.
Minor Issue 17: Table 1 should include SMDs
Reviewer 1 specifically requested standardized mean differences instead of p-values. All p-values in Table 1 are <0.001, which is uninformative for n=34,218. Add an SMD column.
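A sketch of the SMD calculation for a continuous variable, assuming numpy arrays `a` and `b` for the two groups; categorical variables need the proportion-based variant:

```python
# Sketch: standardized mean difference for a continuous Table 1 variable.
import numpy as np

def smd(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Convention: |SMD| > 0.1 flags a meaningful between-group imbalance.
```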
SECTION IV: Priority Triage for Tonight's Resubmission
Given the deadline, here's what I'd prioritize:
Must-do (rejection likely without these)
| # | Item | Time estimate | Difficulty |
|---|---|---|---|
| 1 | Tone down all overclaimed language in Discussion (Issue 8) | 20 min | Text edits only |
| 2 | Decision curve analysis + NRI (Issue 2) | 1-2 hr | Need pred. probabilities |
| 3 | Missingness indicators sensitivity analysis (Issue 1) | 1-2 hr | Re-run model |
| 4 | Participant flow diagram (Issue 4) | 30 min | Diagram creation |
| 5 | Feature importance method -- state it explicitly (Issue 5) | 15 min | Text edit |
| 6 | Software versions, random seeds, BioClinicalBERT citation (Issue 6) | 15 min | Text edits |
| 7 | TRIPOD checklist completion (Issue 3) | 1 hr | Checklist form |
| 8 | Funding/COI statement (Issue 16) | 5 min | Text addition |
| 9 | SMDs in Table 1 (Issue 17) | 20 min | Recalculate |
Should-do (significantly strengthens the revision)
| # | Item | Time estimate |
|---|---|---|
| 10 | Race feature analysis or removal + fairness discussion (Issue 7) | 1-2 hr |
| 11 | Calibration fix: Platt scaling + Brier scores + CITL (Issue 11) | 1 hr |
| 12 | Alert burden calculation in Discussion (Issue 10) | 20 min |
| 13 | Include Supplementary Figure S1 (training curves) (Issue 13) | 15 min |
| 14 | Clarify sepsis onset vs. clinical recognition overlap (Issue 9) | 30 min |
Can mention in response letter as limitations / future work
| # | Item |
|---|---|
| 15 | Multiple imputation (MICE) as additional sensitivity analysis |
| 16 | External multi-center validation |
| 17 | Lactate ordering as proxy for clinician gestalt (Issue 12) |
| 18 | Early stopping implementation |
Response Letter Guidance
Your current cover letter lists unfinished items openly, which is honest but signals an incomplete revision. For tonight:
- Remove the "Items still being addressed" section. Everything should be addressed or explicitly deferred with rationale.
- Structure as a point-by-point response to every reviewer comment (including minors). Use the format:
- Reviewer comment (quoted)
- Response: What you did, where in the manuscript (page/line), and why
- For items you truly cannot complete: frame as "We have added X as a limitation and committed to Y in ongoing work" rather than "not yet addressed."
- Thank reviewers specifically for comments that improved the paper -- R2's missing data concern and R1's alert fatigue point genuinely strengthen the manuscript.
This review covers methodology, statistics, reproducibility, ethics, reporting standards, and clinical translatability. The manuscript has a solid foundation and a publishable study underneath -- the gap is between the current draft and what CCM's reviewers will demand. The must-do items above are the critical path to acceptance.
Permissions
| Scope |
|---|
| filesystem:read |
| filesystem:write |
SKILL.md
Scientific Critical Evaluation and Peer Review
Overview
Peer review is a systematic process for evaluating scientific manuscripts. Assess methodology, statistics, design, reproducibility, ethics, and reporting standards. Apply this skill for manuscript and grant review across disciplines with constructive, rigorous evaluation.
When to Use This Skill
This skill should be used when:
- Conducting peer review of scientific manuscripts for journals
- Evaluating grant proposals and research applications
- Assessing methodology and experimental design rigor
- Reviewing statistical analyses and reporting standards
- Evaluating reproducibility and data availability
- Checking compliance with reporting guidelines (CONSORT, STROBE, PRISMA)
- Providing constructive feedback on scientific writing
Visual Enhancement with Scientific Schematics
When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.
If your document does not already contain schematics or diagrams:
- Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
- Simply describe your desired diagram in natural language
- Nano Banana Pro will automatically generate, review, and refine the schematic
For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.
How to generate schematics:
```
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
```
The AI will automatically:
- Create publication-quality images with proper formatting
- Review and refine through multiple iterations
- Ensure accessibility (colorblind-friendly, high contrast)
- Save outputs in the figures/ directory
When to add schematics:
- Peer review workflow diagrams
- Evaluation criteria decision trees
- Review process flowcharts
- Methodology assessment frameworks
- Quality assessment visualizations
- Reporting guidelines compliance diagrams
- Any complex concept that benefits from visualization
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
Peer Review Workflow
Conduct peer review systematically through the following stages, adapting depth and focus based on the manuscript type and discipline.
Stage 1: Initial Assessment
Begin with a high-level evaluation to determine the manuscript's scope, novelty, and overall quality.
Key Questions:
- What is the central research question or hypothesis?
- What are the main findings and conclusions?
- Is the work scientifically sound and significant?
- Is the work appropriate for the intended venue?
- Are there any immediate major flaws that would preclude publication?
Output: Brief summary (2-3 sentences) capturing the manuscript's essence and initial impression.
Stage 2: Detailed Section-by-Section Review
Conduct a thorough evaluation of each manuscript section, documenting specific concerns and strengths.
Abstract and Title
- Accuracy: Does the abstract accurately reflect the study's content and conclusions?
- Clarity: Is the title specific, accurate, and informative?
- Completeness: Are key findings and methods summarized appropriately?
- Accessibility: Is the abstract comprehensible to a broad scientific audience?
Introduction
- Context: Is the background information adequate and current?
- Rationale: Is the research question clearly motivated and justified?
- Novelty: Is the work's originality and significance clearly articulated?
- Literature: Are relevant prior studies appropriately cited?
- Objectives: Are research aims/hypotheses clearly stated?
Methods
- Reproducibility: Can another researcher replicate the study from the description provided?
- Rigor: Are the methods appropriate for addressing the research questions?
- Detail: Are protocols, reagents, equipment, and parameters sufficiently described?
- Ethics: Are ethical approvals, consent, and data handling properly documented?
- Statistics: Are statistical methods appropriate, clearly described, and justified?
- Validation: Are controls, replicates, and validation approaches adequate?
Critical elements to verify:
- Sample sizes and power calculations
- Randomization and blinding procedures
- Inclusion/exclusion criteria
- Data collection protocols
- Computational methods and software versions
- Statistical tests and correction for multiple comparisons
Results
- Presentation: Are results presented logically and clearly?
- Figures/Tables: Are visualizations appropriate, clear, and properly labeled?
- Statistics: Are statistical results properly reported (effect sizes, confidence intervals, p-values)?
- Objectivity: Are results presented without over-interpretation?
- Completeness: Are all relevant results included, including negative results?
- Reproducibility: Are raw data or summary statistics provided?
Common issues to identify:
- Selective reporting of results
- Inappropriate statistical tests
- Missing error bars or measures of variability
- Over-fitting or circular analysis
- Batch effects or confounding variables
- Missing controls or validation experiments
Discussion
- Interpretation: Are conclusions supported by the data?
- Limitations: Are study limitations acknowledged and discussed?
- Context: Are findings placed appropriately within existing literature?
- Speculation: Is speculation clearly distinguished from data-supported conclusions?
- Significance: Are implications and importance clearly articulated?
- Future directions: Are next steps or unanswered questions discussed?
Red flags:
- Overstated conclusions
- Ignoring contradictory evidence
- Causal claims from correlational data
- Inadequate discussion of limitations
- Mechanistic claims without mechanistic evidence
References
- Completeness: Are key relevant papers cited?
- Currency: Are recent important studies included?
- Balance: Are contrary viewpoints appropriately cited?
- Accuracy: Are citations accurate and appropriate?
- Self-citation: Is there excessive or inappropriate self-citation?
Stage 3: Methodological and Statistical Rigor
Evaluate the technical quality and rigor of the research with particular attention to common pitfalls.
Statistical Assessment:
- Are statistical assumptions met (normality, independence, homoscedasticity)?
- Are effect sizes reported alongside p-values?
- Is multiple testing correction applied appropriately?
- Are confidence intervals provided?
- Is sample size justified with power analysis?
- Are parametric vs. non-parametric tests chosen appropriately?
- Are missing data handled properly?
- Are exploratory vs. confirmatory analyses distinguished?
Experimental Design:
- Are controls appropriate and adequate?
- Is replication sufficient (biological and technical)?
- Are potential confounders identified and controlled?
- Is randomization properly implemented?
- Are blinding procedures adequate?
- Is the experimental design optimal for the research question?
Computational/Bioinformatics:
- Are computational methods clearly described and justified?
- Are software versions and parameters documented?
- Is code made available for reproducibility?
- Are algorithms and models validated appropriately?
- Are assumptions of computational methods met?
- Is batch correction applied appropriately?
Stage 4: Reproducibility and Transparency
Assess whether the research meets modern standards for reproducibility and open science.
Data Availability:
- Are raw data deposited in appropriate repositories?
- Are accession numbers provided for public databases?
- Are data sharing restrictions justified (e.g., patient privacy)?
- Are data formats standard and accessible?
Code and Materials:
- Is analysis code made available (GitHub, Zenodo, etc.)?
- Are unique materials available or described sufficiently for recreation?
- Are protocols detailed in sufficient depth?
Reporting Standards:
- Does the manuscript follow discipline-specific reporting guidelines (CONSORT, PRISMA, ARRIVE, MIAME, MINSEQE, etc.)?
- See `references/reporting_standards.md` for common guidelines
- Are all elements of the appropriate checklist addressed?
Stage 5: Figure and Data Presentation
Evaluate the quality, clarity, and integrity of data visualization.
Quality Checks:
- Are figures high resolution and clearly labeled?
- Are axes properly labeled with units?
- Are error bars defined (SD, SEM, CI)?
- Are statistical significance indicators explained?
- Are color schemes appropriate and accessible (colorblind-friendly)?
- Are scale bars included for images?
- Is data visualization appropriate for the data type?
Integrity Checks:
- Are there signs of image manipulation (duplications, splicing)?
- Are Western blots and gels appropriately presented?
- Are representative images truly representative?
- Are all conditions shown (no selective presentation)?
Clarity:
- Can figures stand alone with their legends?
- Is the message of each figure immediately clear?
- Are there redundant figures or panels?
- Would data be better presented as tables or figures?
Stage 6: Ethical Considerations
Verify that the research meets ethical standards and guidelines.
Human Subjects:
- Is IRB/ethics approval documented?
- Is informed consent described?
- Are vulnerable populations appropriately protected?
- Is patient privacy adequately protected?
- Are potential conflicts of interest disclosed?
Animal Research:
- Is IACUC or equivalent approval documented?
- Are procedures humane and justified?
- Are the 3Rs (replacement, reduction, refinement) considered?
- Are euthanasia methods appropriate?
Research Integrity:
- Are there concerns about data fabrication or falsification?
- Is authorship appropriate and justified?
- Are competing interests disclosed?
- Is funding source disclosed?
- Are there concerns about plagiarism or duplicate publication?
Stage 7: Writing Quality and Clarity
Assess the manuscript's clarity, organization, and accessibility.
Structure and Organization:
- Is the manuscript logically organized?
- Do sections flow coherently?
- Are transitions between ideas clear?
- Is the narrative compelling and clear?
Writing Quality:
- Is the language clear, precise, and concise?
- Are jargon and acronyms minimized and defined?
- Are grammar and spelling correct?
- Are sentences unnecessarily complex?
- Is the passive voice overused?
Accessibility:
- Can a non-specialist understand the main findings?
- Are technical terms explained?
- Is the significance clear to a broad audience?
Structuring Peer Review Reports
Organize feedback in a hierarchical structure that prioritizes issues and provides actionable guidance.
Summary Statement
Provide a concise overall assessment (1-2 paragraphs):
- Brief synopsis of the research
- Overall recommendation (accept, minor revisions, major revisions, reject)
- Key strengths (2-3 bullet points)
- Key weaknesses (2-3 bullet points)
- Bottom-line assessment of significance and soundness
Major Comments
List critical issues that significantly impact the manuscript's validity, interpretability, or significance. Number these sequentially for easy reference.
Major comments typically include:
- Fundamental methodological flaws
- Inappropriate statistical analyses
- Unsupported or overstated conclusions
- Missing critical controls or experiments
- Serious reproducibility concerns
- Major gaps in literature coverage
- Ethical concerns
For each major comment:
- Clearly state the issue
- Explain why it's problematic
- Suggest specific solutions or additional experiments
- Indicate if addressing it is essential for publication
Minor Comments
List less critical issues that would improve clarity, completeness, or presentation. Number these sequentially.
Minor comments typically include:
- Unclear figure labels or legends
- Missing methodological details
- Typographical or grammatical errors
- Suggestions for improved data presentation
- Minor statistical reporting issues
- Supplementary analyses that would strengthen conclusions
- Requests for clarification
For each minor comment:
- Identify the specific location (section, paragraph, figure)
- State the issue clearly
- Suggest how to address it
Specific Line-by-Line Comments (Optional)
For manuscripts requiring detailed feedback, provide section-specific or line-by-line comments:
- Reference specific page/line numbers or sections
- Note factual errors, unclear statements, or missing citations
- Suggest specific edits for clarity
Questions for Authors
List specific questions that need clarification:
- Methodological details that are unclear
- Seemingly contradictory results
- Missing information needed to evaluate the work
- Requests for additional data or analyses
Tone and Approach
Maintain a constructive, professional, and collegial tone throughout the review.
Best Practices:
- Be constructive: Frame criticism as opportunities for improvement
- Be specific: Provide concrete examples and actionable suggestions
- Be balanced: Acknowledge strengths as well as weaknesses
- Be respectful: Remember that authors have invested significant effort
- Be objective: Focus on the science, not the scientists
- Be thorough: Don't overlook issues, but prioritize appropriately
- Be clear: Avoid ambiguous or vague criticism
Avoid:
- Personal attacks or dismissive language
- Sarcasm or condescension
- Vague criticism without specific examples
- Requesting unnecessary experiments beyond the scope
- Demanding adherence to personal preferences vs. best practices
- Revealing your identity if reviewing is double-blind
Special Considerations by Manuscript Type
Original Research Articles
- Emphasize rigor, reproducibility, and novelty
- Assess significance and impact
- Verify that conclusions are data-driven
- Check for complete methods and appropriate controls
Reviews and Meta-Analyses
- Evaluate comprehensiveness of literature coverage
- Assess search strategy and inclusion/exclusion criteria
- Verify systematic approach and lack of bias
- Check for critical analysis vs. mere summarization
- For meta-analyses, evaluate statistical approach and heterogeneity
Methods Papers
- Emphasize validation and comparison to existing methods
- Assess reproducibility and availability of protocols/code
- Evaluate improvements over existing approaches
- Check for sufficient detail for implementation
Short Reports/Letters
- Adapt expectations for brevity
- Ensure core findings are still rigorous and significant
- Verify that format is appropriate for findings
Preprints
- Recognize that these have not undergone formal peer review
- May be less polished than journal submissions
- Still apply rigorous standards for scientific validity
- Consider providing constructive feedback to help authors improve before journal submission
Presentations and Slide Decks
⚠️ CRITICAL: For presentations, NEVER read the PDF directly. ALWAYS convert to images first.
When reviewing scientific presentations (PowerPoint, Beamer, slide decks):
Mandatory Image-Based Review Workflow
NEVER attempt to read presentation PDFs directly - this causes buffer overflow errors and doesn't show visual formatting issues.
Required Process:
- Convert PDF to images using Python:

```
python skills/scientific-slides/scripts/pdf_to_images.py presentation.pdf review/slide --dpi 150
# Creates: review/slide-001.jpg, review/slide-002.jpg, etc.
```

- Read and inspect EACH slide image file sequentially
- Document issues with specific slide numbers
- Provide feedback on visual formatting and content
Print when starting review:
[HH:MM:SS] PEER REVIEW: Presentation detected - converting to images for review
[HH:MM:SS] PDF REVIEW: NEVER reading PDF directly - using image-based inspection
Presentation-Specific Evaluation Criteria
Visual Design and Readability:
- Text is large enough (minimum 18pt, ideally 24pt+ for body text)
- High contrast between text and background (4.5:1 minimum, 7:1 preferred)
- Color scheme is professional and colorblind-accessible
- Consistent visual design across all slides
- White space is adequate (not cramped)
- Fonts are clear and professional
Layout and Formatting (Check EVERY Slide Image):
- No text overflow or truncation at slide edges
- No element overlaps (text over images, overlapping shapes)
- Titles are consistently positioned
- Content is properly aligned
- Bullets and text are not cut off
- Figures fit within slide boundaries
- Captions and labels are visible and readable
Content Quality:
- One main idea per slide (not overloaded)
- Minimal text (3-6 bullets per slide maximum)
- Bullet points are concise (5-7 words each)
- Figures are simplified and clear (not copy-pasted from papers)
- Data visualizations have large, readable labels
- Citations are present and properly formatted
- Results/data slides dominate the presentation (40-50% of content)
Structure and Flow:
- Clear narrative arc (introduction → methods → results → discussion)
- Logical progression between slides
- Slide count appropriate for talk duration (~1 slide per minute)
- Title slide includes authors, affiliation, date
- Introduction cites relevant background literature (3-5 papers)
- Discussion cites comparison papers (3-5 papers)
- Conclusions slide summarizes key findings
- Acknowledgments/funding slide at end
Scientific Content:
- Research question clearly stated
- Methods adequately summarized (not excessive detail)
- Results presented logically with clear visualizations
- Statistical significance indicated appropriately
- Conclusions supported by data shown
- Limitations acknowledged where appropriate
- Future directions or broader impact discussed
Common Presentation Issues to Flag:
Critical Issues (Must Fix):
- Text overflow making content unreadable
- Font sizes too small (<18pt)
- Element overlaps obscuring data
- Insufficient contrast (text hard to read)
- Figures too complex or illegible
- No citations (completely unsupported claims)
- Slide count drastically mismatched to duration
Major Issues (Should Fix):
- Inconsistent design across slides
- Too much text (walls of text, not bullets)
- Poorly simplified figures (axis labels too small)
- Cramped layout with insufficient white space
- Missing key structural elements (no conclusion slide)
- Poor color choices (not colorblind-safe)
- Minimal results content (<30% of slides)
Minor Issues (Suggestions for Improvement):
- Could use more visuals/diagrams
- Some slides slightly text-heavy
- Minor alignment inconsistencies
- Could benefit from more white space
- Additional citations would strengthen claims
- Color scheme could be more modern
Review Report Format for Presentations
Summary Statement:
- Overall impression of presentation quality
- Appropriateness for target audience and duration
- Key strengths (visual design, content, clarity)
- Key weaknesses (formatting issues, content gaps)
- Recommendation (ready to present, minor revisions, major revisions)
Layout and Formatting Issues (By Slide Number):
Slide 3: Text overflow - bullet point 4 extends beyond right margin
Slide 7: Element overlap - figure overlaps with caption text
Slide 12: Font size - axis labels too small to read from distance
Slide 18: Alignment - title not centered
Content and Structure Feedback:
- Adequacy of background context and citations
- Clarity of research question and objectives
- Quality of methods summary
- Effectiveness of results presentation
- Strength of conclusions and implications
Design and Accessibility:
- Overall visual appeal and professionalism
- Color contrast and readability
- Colorblind accessibility
- Consistency across slides
Timing and Scope:
- Whether slide count matches intended duration
- Appropriate level of detail for talk type
- Balance between sections
Example Image-Based Review Process
[14:30:00] PEER REVIEW: Starting review of presentation
[14:30:05] PEER REVIEW: Presentation detected - converting to images
[14:30:10] PDF REVIEW: Running pdf_to_images.py on presentation.pdf
[14:30:15] PDF REVIEW: Converted 25 slides to images in review/ directory
[14:30:20] PDF REVIEW: Inspecting slide 1/25 - title slide
[14:30:25] PDF REVIEW: Inspecting slide 2/25 - introduction
...
[14:35:40] PDF REVIEW: Inspecting slide 25/25 - acknowledgments
[14:35:45] PDF REVIEW: Completed image-based review
[14:35:50] PEER REVIEW: Found 8 layout issues, 3 content issues
[14:35:55] PEER REVIEW: Generating structured feedback by slide number
Remember: For presentations, the visual inspection via images is MANDATORY. Never attempt to read presentation PDFs as text - it will fail and miss all visual formatting issues.
Resources
This skill includes reference materials to support comprehensive peer review:
references/reporting_standards.md
Guidelines for major reporting standards across disciplines (CONSORT, PRISMA, ARRIVE, MIAME, STROBE, etc.) to evaluate completeness of methods and results reporting.
references/common_issues.md
Catalog of frequent methodological and statistical issues encountered in peer review, with guidance on identifying and addressing them.
Final Checklist
Before finalizing the review, verify:
- Summary statement clearly conveys overall assessment
- Major concerns are clearly identified and justified
- Suggested revisions are specific and actionable
- Minor issues are noted but properly categorized
- Statistical methods have been evaluated
- Reproducibility and data availability assessed
- Ethical considerations verified
- Figures and tables evaluated for quality and integrity
- Writing quality assessed
- Tone is constructive and professional throughout
- Review is thorough but proportionate to manuscript scope
- Recommendation is consistent with identified issues
FAQ
What does peer-review do?
Systematic peer review toolkit. It evaluates methodology, statistics, design, reproducibility, ethics, figure integrity, and reporting standards for manuscript and grant review across disciplines.
When should I use peer-review?
Use it when you need a repeatable review workflow that produces a text response.
What does peer-review output?
In the evaluated run it produced a text response.
How do I install or invoke peer-review?
npx skills add https://github.com/davila7/claude-code-templates --skill peer-review
Which agents does peer-review support?
Claude Code
What tools, channels, or permissions does peer-review need?
It uses no extra tools; channels commonly include text; permissions include filesystem:read and filesystem:write.
Is peer-review safe to install?
Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.
How is peer-review different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
About peer-review
When to use peer-review
Reviewing a manuscript submission for methodological rigor and publication readiness. Evaluating a grant proposal's experimental design, feasibility, and statistical approach. Preparing structured reviewer comments on reporting standards, ethics, and reproducibility.
When peer-review is not the right choice
When you need to run experiments, analyze raw datasets, or validate results computationally. When the task is primarily generating scientific figures or schematics rather than reviewing research quality.
What it produces
Produces a text response.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill peer-review

Invoke: Ask Claude Code to use peer-review for the task.