peer-review
Verified by davila7 · Updated Apr 16, 2026
Systematic peer review toolkit. Evaluate methodology, statistics, design, reproducibility, ethics, figure integrity, and reporting standards for manuscript and grant review across disciplines.
See It In Action
I'll start by exploring the project files to find the manuscript, then run a comprehensive peer review.
Now let me run the systematic peer review toolkit on this manuscript.
Let me check the skill's reference materials for reporting standards relevant to this manuscript type.
Now I have the full picture: the manuscript as submitted, the supplementary materials, the R1 reviewer comments, and what the authors have/haven't addressed per their cover letter. Let me produce the review.
Peer Review Report: SepsisNet Manuscript (CCM-2026-0342, R1 Revision)
Manuscript: Early Prediction of Sepsis Onset in Emergency Department Patients Using a Deep Learning Model Trained on Electronic Health Records
Journal: Critical Care Medicine
Review type: Pre-resubmission audit (comprehensive)
Summary Statement
This single-center retrospective study develops SepsisNet, a GRU-based deep learning model for 6-hour sepsis prediction in ED patients using 142 EHR features, reporting AUROC 0.891 vs. qSOFA (0.721) and SIRS (0.684). The study addresses a clinically important problem and the temporal validation design is appropriate. However, the manuscript in its current state has critical unresolved issues from R1 that will likely result in rejection if resubmitted tonight without being addressed. Below I distinguish between (A) unfinished R1 items the reviewers will immediately flag, (B) issues in the current text that the revision introduced or failed to fix, and (C) new concerns not raised in R1 that a careful re-review will catch.
Key strengths:
- Clinically relevant question with a well-defined ED-specific cohort (most sepsis ML work is ICU-focused)
- Appropriate chronological train/test split avoiding temporal leakage in the split itself
- Honest reporting of calibration limitations (H-L p=0.02)
Key weaknesses (blocking publication):
- 6 of 12 R1 action items are incomplete per the cover letter -- reviewers will notice immediately
- Informative missingness handled with naive median imputation (procalcitonin 68% missing) without any sensitivity analysis
- Race included as a model feature without justification or fairness analysis
- No decision curve analysis, NRI, or clinical utility assessment beyond AUROC/PPV
- Overclaimed clinical significance throughout the Discussion
Recommendation: Major revision. The manuscript cannot go out tonight in its current state. Below I prioritize what is fixable in hours vs. what needs more time.
SECTION I: R1 Items Still Unresolved (Highest Priority)
These are items the cover letter itself flags as incomplete. Reviewers will check these first.
1. Missing data sensitivity analysis (Reviewer 2, Major #2) -- STATUS: "in progress"
Why this is blocking: Median imputation with 68% missingness on procalcitonin and 54% on D-dimer is the single biggest methodological vulnerability. The missingness is informative -- procalcitonin is ordered when clinicians suspect sepsis, so its absence signals low clinical suspicion. Median imputation destroys this signal and introduces bias toward the null for non-ordered tests.
What to do tonight (minimum viable fix):
- Add a missingness indicator for each lab feature (binary: was it ordered?) and re-run the model. Report whether AUROCs change. This is computationally fast and directly addresses the informative missingness concern (a minimal sketch follows this list).
- Add a paragraph in Methods 2.4 stating: "To assess sensitivity to imputation strategy, we repeated the primary analysis (a) including binary missingness indicators as additional features, and (b) restricting to the subset of patients with complete data on the top 5 laboratory features. Results were qualitatively similar [or: differed as follows...]."
- If you have time, run multiple imputation (MICE) as a third arm. If not, acknowledge this as a limitation and commit to it in the response letter.
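A minimal sketch of the missingness-indicator approach, assuming a pandas DataFrame `X` of raw features with NaN for unordered labs; `lab_cols` and the column names are hypothetical placeholders for your actual feature list:

```python
# Sketch: add binary "was this lab ordered?" indicators before imputation.
import pandas as pd

def add_missingness_indicators(X, lab_cols):
    X = X.copy()
    for col in lab_cols:
        # 1 = clinician ordered the test, 0 = value is about to be imputed
        X[f"{col}_ordered"] = X[col].notna().astype(int)
        # original median-imputation step; in the real pipeline, compute the
        # median on the training split only to avoid leakage into the test set
        X[col] = X[col].fillna(X[col].median())
    return X

X_aug = add_missingness_indicators(X, lab_cols=["procalcitonin", "d_dimer", "lactate"])
# Re-train on X_aug and compare test AUROC against the original run.
```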
2. Decision curve analysis (Reviewer 1, Major #5) -- STATUS: "not yet started"
Why this is blocking: The editor specifically listed this in their summary. AUROC alone does not demonstrate clinical utility -- DCA shows the net benefit across threshold probabilities and is the standard for prediction model papers in CCM.
What to do tonight:
- Use the `dcurves` Python package or `rmda` in R. Run DCA for SepsisNet vs. XGBoost vs. qSOFA vs. treat-all vs. treat-none. This is ~30 lines of code and takes minutes to run once you have predicted probabilities (a minimal net-benefit sketch follows this list).
- Add as Figure 3 (or Supplementary Figure). Report the threshold range where SepsisNet provides net benefit.
- Also report NRI (net reclassification improvement) at the 85% sensitivity operating point: how many patients are correctly reclassified by SepsisNet vs. qSOFA?
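If you'd rather not add a dependency tonight, net benefit is simple enough to compute directly from its definition. A minimal sketch, assuming numpy arrays `y_true` (0/1 test-set labels) and `p_hat` (SepsisNet predicted probabilities) are already in hand:

```python
# Sketch: decision curve analysis from first principles.
import numpy as np

def net_benefit(y_true, p_hat, threshold):
    n = len(y_true)
    pred_pos = p_hat >= threshold
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    # Net benefit: true-positive gain minus harm-weighted false positives
    return tp / n - (fp / n) * (threshold / (1 - threshold))

prevalence = y_true.mean()
for t in np.arange(0.05, 0.50, 0.05):
    nb_model = net_benefit(y_true, p_hat, t)
    nb_all = prevalence - (1 - prevalence) * t / (1 - t)  # treat-all strategy
    print(f"t={t:.2f}  model={nb_model:+.4f}  treat-all={nb_all:+.4f}  treat-none=+0.0000")
```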
3. TRIPOD checklist (Reviewer 2, Major #4) -- STATUS: "partially completed"
What to do tonight:
- Download the TRIPOD+AI checklist (the 2024 update is appropriate for ML prediction models). Complete every item, with page/line references. Submit as Supplementary Table S5.
- Key items you're likely missing: flow diagram (see below), full model specification (architecture diagram), calibration metrics beyond H-L, sample size justification.
4. Participant flow diagram -- STATUS: "not yet created"
What to do tonight:
- Create a TRIPOD-compliant flow diagram: starting N -> exclusions at each step (transferred, AMA/left before triage, LOS <2h, missing triage data, etc.) -> final analytic cohort -> train/validation/test split with counts and sepsis rates in each.
- This is a figure you can create in 30 minutes (a graphviz sketch follows this list). Add as Figure 1 (shift current figures).
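A sketch of the diagram skeleton using the graphviz Python package (it also requires the Graphviz system binary); every count below is a placeholder to substitute with the real cohort numbers:

```python
# Sketch: TRIPOD-style participant flow diagram.
import graphviz

dot = graphviz.Digraph(format="png")
dot.attr("node", shape="box")
dot.node("screened", "ED visits screened\nN = [total]")
dot.node("excluded", "Excluded: transfers, AMA/left before triage,\nLOS <2h, missing triage data\nN = [excluded]")
dot.node("cohort", "Final analytic cohort\nN = [cohort]")
dot.node("train", "Train/validation (earlier period)\nN = [n], sepsis rate [x]%")
dot.node("test", "Temporal test set (later period)\nN = [n], sepsis rate [y]%")
dot.edges([("screened", "excluded"), ("screened", "cohort"),
           ("cohort", "train"), ("cohort", "test")])
dot.render("figures/flow_diagram")  # writes figures/flow_diagram.png
```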
5. Feature importance method (Reviewer 3, Major #4) -- STATUS: "not yet addressed"
What to do tonight:
- State explicitly in Methods what method generated the "relative importance" values in Section 3.4. If it was attention weights: say so, but note the well-documented limitations of attention as explanation (Jain & Wallace, 2019). If permutation importance: state this. If SHAP: state this and specify KernelSHAP vs. DeepSHAP.
- If you used attention weights, add a supplementary analysis with SHAP or permutation importance as a robustness check. The "relative importance" values summing to 0.73 suggest a normalized attention-based approach, which reviewers will flag as unreliable for causal interpretation.
6. Software versions (Reviewer 3, Minor) -- STATUS: "partially updated"
What to do tonight:
- Add to Methods or Supplementary: Python 3.9.x, PyTorch x.x.x, scikit-learn x.x.x, SciPy x.x.x, pandas x.x.x, numpy x.x.x. Add GPU type (e.g., NVIDIA A100) and training wall time (a version-logging sketch follows this list).
- Add random seed(s) used.
- Name BioClinicalBERT explicitly in the main text (currently only in supplement) and cite Alsentzer et al. 2019.
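A short snippet that logs the key versions and pins the seeds; the seed value shown is a placeholder for whatever was actually used:

```python
# Sketch: record the software environment and fix seeds for reproducibility.
import random
import numpy as np
import sklearn
import torch

print(f"PyTorch {torch.__version__}, scikit-learn {sklearn.__version__}, "
      f"numpy {np.__version__}")

SEED = 42  # placeholder: report the seed(s) actually used
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
```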
SECTION II: Issues in the Current Manuscript Text
These are problems in the text as it stands that reviewers will catch on re-review.
Major Issue 7: Race as a model feature without justification or fairness analysis
manuscript.md Methods 2.4 lists "demographics" as input features; supplementary_materials.md Table S1 specifies "race (one-hot encoded into 5 categories)." This is a significant concern:
- Including race as a predictor in clinical AI is scientifically and ethically contested (Vyas et al., NEJM 2020; Obermeyer et al., Science 2019). Race is a social construct and a proxy for structural inequities, not a biological variable.
- The subgroup analysis (Table S3) shows AUROC variation by race (White 0.896 vs. Black 0.884 vs. Other 0.878). You need to demonstrate this isn't the model using race to achieve performance.
- Fix: (a) Re-run the model without race and report the AUROC delta. If performance is unchanged, remove race and state you did so. If it drops, discuss why and whether the model is capturing structural healthcare disparities rather than biology. (b) At minimum, add a paragraph in the Discussion addressing algorithmic fairness and the decision to include race.
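A sketch of the ablation in (a), assuming `X_train`/`X_test` DataFrames with one-hot race columns prefixed `race_`, labels `y_train`/`y_test`, and a hypothetical `train_and_predict()` wrapper around the existing pipeline:

```python
# Sketch: drop the race features, retrain, and compare discrimination.
from sklearn.metrics import roc_auc_score

race_cols = [c for c in X_train.columns if c.startswith("race_")]

p_full = train_and_predict(X_train, y_train, X_test)
p_norace = train_and_predict(X_train.drop(columns=race_cols), y_train,
                             X_test.drop(columns=race_cols))

auc_full = roc_auc_score(y_test, p_full)
auc_norace = roc_auc_score(y_test, p_norace)
print(f"AUROC with race: {auc_full:.3f}  without: {auc_norace:.3f}  "
      f"delta: {auc_full - auc_norace:+.4f}")
```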
Major Issue 8: Overclaimed clinical impact in the Discussion
The Discussion contains language that will draw an immediate rejection signal from Reviewer 1 (who already flagged this):
- "SepsisNet could fundamentally transform how EDs approach sepsis screening" (line ~145) -- This claim is not supported by a single-center retrospective study with no prospective or external validation. Replace with measured language: "SepsisNet shows promise for improving early sepsis detection, pending prospective validation."
- "represents a major advance that could save lives" -- Downgrade to: "represents a potentially clinically meaningful lead time."
- "If deployed clinically, SepsisNet could fundamentally transform..." -- Remove entirely or replace with a concrete next-step statement.
Reviewer 1 specifically said "should be toned down" (Major #3). If the reviewers see the same language in the revision, it signals the authors aren't responsive to feedback.
Major Issue 9: Circularity in the sepsis onset / clinical recognition definitions
The sepsis onset time is defined as "the earlier of the time of antibiotic administration or blood culture order" (Methods 2.3). The cover letter says you've now defined clinical recognition as "the earliest of: first antibiotic administration, first blood culture order, or first lactate order." These definitions substantially overlap -- the outcome definition and the comparator for lead time share the same anchor events.
This means the "3.2 hours before clinical recognition" claim may partially reflect the lag between the first component of the Sepsis-3 definition being met and the full definition being met, rather than truly early detection.
Fix: Clearly separate these two constructs. Define clinical recognition independently of the Sepsis-3 label -- e.g., time of first clinical note documenting sepsis concern, or time of first sepsis bundle order set activation. Alternatively, acknowledge this overlap as a limitation and temper the lead-time claims.
Major Issue 10: Alert burden not quantified
The reported PPV is 27.4% at the 85% sensitivity operating point. For an ED with 85,000 annual visits (~233/day), roughly 8.3% have sepsis. At 85% sensitivity and 78% specificity:
- True positives per day: ~16
- False positives per day: ~47
- Total daily alerts: ~63
That is 63 alerts per day in an ED. This is a critical clinical deployment consideration. Add this calculation to the Discussion and discuss alert fatigue mitigation strategies (e.g., tiered alerting, integration into existing workflows, requiring two consecutive high-probability predictions).
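For transparency, the arithmetic behind those numbers, with all inputs taken from the manuscript's reported figures:

```python
# Sketch: daily alert-burden calculation from the reported operating point.
visits_per_day = 85_000 / 365                        # ~233 ED visits/day
prevalence = 0.083
sensitivity, specificity = 0.85, 0.78

septic = visits_per_day * prevalence                 # ~19.3 true sepsis cases/day
tp = sensitivity * septic                            # ~16 caught
fp = (1 - specificity) * (visits_per_day - septic)   # ~47 false alarms
print(f"TP/day: {tp:.0f}, FP/day: {fp:.0f}, total alerts/day: {tp + fp:.0f}")
```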
Major Issue 11: Calibration issue acknowledged but not fixed
H-L p=0.02 with overestimation in the 0.3-0.5 range. Reviewer 2 asked for recalibration, Brier scores, and CITL. None of these appear in the manuscript.
Fix tonight: Apply Platt scaling or isotonic regression on the validation set, report pre- and post-calibration Brier scores, CITL, and calibration slope. Update Figure 2 (calibration plot) to show both raw and recalibrated curves.
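A sketch of both recalibration options with scikit-learn, assuming validation/test probabilities (`p_val`, `p_test`) and labels (`y_val`, `y_test`) as numpy arrays; CITL and calibration slope would be computed analogously from the logit-transformed probabilities:

```python
# Sketch: recalibrate on the validation set, evaluate on the test set.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against exact 0/1 probabilities
    return np.log(p / (1 - p))

# Option 1: isotonic regression (non-parametric)
iso = IsotonicRegression(out_of_bounds="clip").fit(p_val, y_val)
p_test_iso = iso.predict(p_test)

# Option 2: Platt scaling (logistic regression on the logit)
platt = LogisticRegression().fit(logit(p_val).reshape(-1, 1), y_val)
p_test_platt = platt.predict_proba(logit(p_test).reshape(-1, 1))[:, 1]

print(f"Brier raw: {brier_score_loss(y_test, p_test):.4f}, "
      f"isotonic: {brier_score_loss(y_test, p_test_iso):.4f}, "
      f"Platt: {brier_score_loss(y_test, p_test_platt):.4f}")
```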
SECTION III: New Concerns Not Raised in R1
Major Issue 12: Potential data leakage through informative lab ordering
Lactate is the #2 most important feature (importance 0.14) and has 31% missingness. A lactate order in the first 2 hours signals that the clinician already suspects a serious illness. When you median-impute missing lactate values, you're creating a binary signal: "clinician ordered lactate" (real value) vs. "clinician didn't" (median ~1.4). This is a proxy for clinician gestalt, not an independent physiological predictor.
Fix: Report the model's performance on the subset of patients who had lactate drawn vs. not. If the model performs much better on the lactate-drawn subset, the "early detection" may partially be detecting clinician suspicion, not occult sepsis.
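A minimal sketch of that subset comparison, assuming a boolean mask `lactate_drawn` aligned with the test-set arrays:

```python
# Sketch: stratify test-set AUROC by whether lactate was actually drawn.
from sklearn.metrics import roc_auc_score

auc_drawn = roc_auc_score(y_test[lactate_drawn], p_test[lactate_drawn])
auc_not = roc_auc_score(y_test[~lactate_drawn], p_test[~lactate_drawn])
print(f"AUROC lactate drawn: {auc_drawn:.3f}  not drawn: {auc_not:.3f}")
# A large gap suggests the model is partly detecting clinician suspicion.
```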
Major Issue 13: No early stopping is a red flag
Supplementary Figure S1 description states: "Early stopping was not implemented; all models were trained for the full 50 epochs" despite convergence at epoch ~30-35. Training 15-20 extra epochs past convergence risks overfitting. This is unusual and will concern methodological reviewers.
Fix: Either (a) implement early stopping with patience=10 and re-report results, or (b) justify the choice (e.g., "validation AUROC showed no degradation through epoch 50, confirming no overfitting") with the actual training curves as evidence. You must include Supplementary Figure S1, which is currently listed as "[Figure not included in this draft -- to be added before resubmission]".
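A sketch of option (a), where `train_one_epoch()` and `eval_auroc()` are hypothetical wrappers around the existing training and validation passes:

```python
# Sketch: early stopping with patience=10 on validation AUROC.
best_auroc, best_state, patience, stale = 0.0, None, 10, 0

for epoch in range(50):
    train_one_epoch(model, train_loader, optimizer)
    auroc = eval_auroc(model, val_loader)
    if auroc > best_auroc:
        best_auroc, stale = auroc, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        stale += 1
        if stale >= patience:
            break  # no validation improvement for 10 consecutive epochs

model.load_state_dict(best_state)  # restore the best checkpoint
```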
Minor Issue 14: Reference [2] (Kumar 2006) is misapplied
The Introduction cites "each hour of delayed antibiotic administration increases mortality by 7.6%." Kumar 2006 studied septic shock patients specifically, not all sepsis. This overestimates the benefit of early detection for the broader sepsis population your model targets.
Fix: Either qualify the citation ("in septic shock patients") or cite a more recent, broader estimate (e.g., Seymour et al., NEJM 2017, which found a more modest effect in broader sepsis).
Minor Issue 15: Thin reference list
10 references is insufficient for a paper positioning itself at the intersection of deep learning, sepsis, and emergency medicine. Missing critical citations:
- Vyas et al. 2020 (NEJM) -- race-based clinical algorithms
- Sendak et al. 2020 -- real-world deployment challenges for sepsis ML
- Wong et al. 2021 -- external validation failures for sepsis models
- Jain & Wallace 2019 -- attention is not explanation
- Alsentzer et al. 2019 -- BioClinicalBERT
- Van Calster et al. 2019 -- calibration for clinical prediction models
- Seymour et al. 2017 -- time to treatment in sepsis (broader population)
Minor Issue 16: No funding or COI statement
CCM requires both. Add them.
Minor Issue 17: Table 1 should include SMDs
Reviewer 1 specifically requested standardized mean differences instead of p-values. All p-values in Table 1 are <0.001, which is uninformative for n=34,218. Add an SMD column.
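A sketch of the SMD calculation for a continuous variable, assuming numpy arrays `a` and `b` for the two groups; categorical variables need the proportion-based variant:

```python
# Sketch: standardized mean difference for a continuous Table 1 variable.
import numpy as np

def smd(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Convention: |SMD| > 0.1 flags a meaningful between-group imbalance.
```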
SECTION IV: Priority Triage for Tonight's Resubmission
Given the deadline, here's what I'd prioritize:
Must-do (rejection likely without these)
| # | Item | Time estimate | Difficulty |
|---|---|---|---|
| 1 | Tone down all overclaimed language in Discussion (Issue 8) | 20 min | Text edits only |
| 2 | Decision curve analysis + NRI (Issue 2) | 1-2 hr | Need pred. probabilities |
| 3 | Missingness indicators sensitivity analysis (Issue 1) | 1-2 hr | Re-run model |
| 4 | Participant flow diagram (Issue 4) | 30 min | Diagram creation |
| 5 | Feature importance method -- state it explicitly (Issue 5) | 15 min | Text edit |
| 6 | Software versions, random seeds, BioClinicalBERT citation (Issue 6) | 15 min | Text edits |
| 7 | TRIPOD checklist completion (Issue 3) | 1 hr | Checklist form |
| 8 | Funding/COI statement (Issue 16) | 5 min | Text addition |
| 9 | SMDs in Table 1 (Issue 17) | 20 min | Recalculate |
Should-do (significantly strengthens the revision)
| # | Item | Time estimate |
|---|---|---|
| 10 | Race feature analysis or removal + fairness discussion (Issue 7) | 1-2 hr |
| 11 | Calibration fix: Platt scaling + Brier scores + CITL (Issue 11) | 1 hr |
| 12 | Alert burden calculation in Discussion (Issue 10) | 20 min |
| 13 | Include Supplementary Figure S1 (training curves) (Issue 13) | 15 min |
| 14 | Clarify sepsis onset vs. clinical recognition overlap (Issue 9) | 30 min |
Can mention in response letter as limitations / future work
| # | Item |
|---|---|
| 15 | Multiple imputation (MICE) as additional sensitivity analysis |
| 16 | External multi-center validation |
| 17 | Lactate ordering as proxy for clinician gestalt (Issue 12) |
| 18 | Early stopping implementation |
Response Letter Guidance
Your current cover letter lists unfinished items openly, which is honest but signals an incomplete revision. For tonight:
- Remove the "Items still being addressed" section. Everything should be addressed or explicitly deferred with rationale.
- Structure as a point-by-point response to every reviewer comment (including minors). Use the format:
- Reviewer comment (quoted)
- Response: What you did, where in the manuscript (page/line), and why
- For items you truly cannot complete: frame as "We have added X as a limitation and committed to Y in ongoing work" rather than "not yet addressed."
- Thank reviewers specifically for comments that improved the paper -- R2's missing data concern and R1's alert fatigue point genuinely strengthen the manuscript.
This review covers methodology, statistics, reproducibility, ethics, reporting standards, and clinical translatability. The manuscript has a solid foundation and a publishable study underneath -- the gap is between the current draft and what CCM's reviewers will demand. The must-do items above are the critical path to acceptance.
Permissions
| Scope |
|---|
| filesystem:read |
| filesystem:write |
SKILL.md
Scientific Critical Evaluation and Peer Review
Overview
Peer review is a systematic process for evaluating scientific manuscripts. Assess methodology, statistics, design, reproducibility, ethics, and reporting standards. Apply this skill for manuscript and grant review across disciplines with constructive, rigorous evaluation.
When to Use This Skill
This skill should be used when:
- Conducting peer review of scientific manuscripts for journals
- Evaluating grant proposals and research applications
- Assessing methodology and experimental design rigor
- Reviewing statistical analyses and reporting standards
- Evaluating reproducibility and data availability
- Checking compliance with reporting guidelines (CONSORT, STROBE, PRISMA)
- Providing constructive feedback on scientific writing
Visual Enhancement with Scientific Schematics
When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.
If your document does not already contain schematics or diagrams:
- Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
- Simply describe your desired diagram in natural language
- Nano Banana Pro will automatically generate, review, and refine the schematic
For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.
How to generate schematics:
```
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
```
The AI will automatically:
- Create publication-quality images with proper formatting
- Review and refine through multiple iterations
- Ensure accessibility (colorblind-friendly, high contrast)
- Save outputs in the figures/ directory
When to add schematics:
- Peer review workflow diagrams
- Evaluation criteria decision trees
- Review process flowcharts
- Methodology assessment frameworks
- Quality assessment visualizations
- Reporting guidelines compliance diagrams
- Any complex concept that benefits from visualization
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
Peer Review Workflow
Conduct peer review systematically through the following stages, adapting depth and focus based on the manuscript type and discipline.
Stage 1: Initial Assessment
Begin with a high-level evaluation to determine the manuscript's scope, novelty, and overall quality.
Key Questions:
- What is the central research question or hypothesis?
- What are the main findings and conclusions?
- Is the work scientifically sound and significant?
- Is the work appropriate for the intended venue?
- Are there any immediate major flaws that would preclude publication?
Output: Brief summary (2-3 sentences) capturing the manuscript's essence and initial impression.
Stage 2: Detailed Section-by-Section Review
Conduct a thorough evaluation of each manuscript section, documenting specific concerns and strengths.
Abstract and Title
- Accuracy: Does the abstract accurately reflect the study's content and conclusions?
- Clarity: Is the title specific, accurate, and informative?
- Completeness: Are key findings and methods summarized appropriately?
- Accessibility: Is the abstract comprehensible to a broad scientific audience?
Introduction
- Context: Is the background information adequate and current?
- Rationale: Is the research question clearly motivated and justified?
- Novelty: Is the work's originality and significance clearly articulated?
- Literature: Are relevant prior studies appropriately cited?
- Objectives: Are research aims/hypotheses clearly stated?
Methods
- Reproducibility: Can another researcher replicate the study from the description provided?
- Rigor: Are the methods appropriate for addressing the research questions?
- Detail: Are protocols, reagents, equipment, and parameters sufficiently described?
- Ethics: Are ethical approvals, consent, and data handling properly documented?
- Statistics: Are statistical methods appropriate, clearly described, and justified?
- Validation: Are controls, replicates, and validation approaches adequate?
Critical elements to verify:
- Sample sizes and power calculations
- Randomization and blinding procedures
- Inclusion/exclusion criteria
- Data collection protocols
- Computational methods and software versions
- Statistical tests and correction for multiple comparisons
Results
- Presentation: Are results presented logically and clearly?
- Figures/Tables: Are visualizations appropriate, clear, and properly labeled?
- Statistics: Are statistical results properly reported (effect sizes, confidence intervals, p-values)?
- Objectivity: Are results presented without over-interpretation?
- Completeness: Are all relevant results included, including negative results?
- Reproducibility: Are raw data or summary statistics provided?
Common issues to identify:
- Selective reporting of results
- Inappropriate statistical tests
- Missing error bars or measures of variability
- Over-fitting or circular analysis
- Batch effects or confounding variables
- Missing controls or validation experiments
Discussion
- Interpretation: Are conclusions supported by the data?
- Limitations: Are study limitations acknowledged and discussed?
- Context: Are findings placed appropriately within existing literature?
- Speculation: Is speculation clearly distinguished from data-supported conclusions?
- Significance: Are implications and importance clearly articulated?
- Future directions: Are next steps or unanswered questions discussed?
Red flags:
- Overstated conclusions
- Ignoring contradictory evidence
- Causal claims from correlational data
- Inadequate discussion of limitations
- Mechanistic claims without mechanistic evidence
References
- Completeness: Are key relevant papers cited?
- Currency: Are recent important studies included?
- Balance: Are contrary viewpoints appropriately cited?
- Accuracy: Are citations accurate and appropriate?
- Self-citation: Is there excessive or inappropriate self-citation?
Stage 3: Methodological and Statistical Rigor
Evaluate the technical quality and rigor of the research with particular attention to common pitfalls.
Statistical Assessment:
- Are statistical assumptions met (normality, independence, homoscedasticity)?
- Are effect sizes reported alongside p-values?
- Is multiple testing correction applied appropriately?
- Are confidence intervals provided?
- Is sample size justified with power analysis?
- Are parametric vs. non-parametric tests chosen appropriately?
- Are missing data handled properly?
- Are exploratory vs. confirmatory analyses distinguished?
Experimental Design:
- Are controls appropriate and adequate?
- Is replication sufficient (biological and technical)?
- Are potential confounders identified and controlled?
- Is randomization properly implemented?
- Are blinding procedures adequate?
- Is the experimental design optimal for the research question?
Computational/Bioinformatics:
- Are computational methods clearly described and justified?
- Are software versions and parameters documented?
- Is code made available for reproducibility?
- Are algorithms and models validated appropriately?
- Are assumptions of computational methods met?
- Is batch correction applied appropriately?
Stage 4: Reproducibility and Transparency
Assess whether the research meets modern standards for reproducibility and open science.
Data Availability:
- Are raw data deposited in appropriate repositories?
- Are accession numbers provided for public databases?
- Are data sharing restrictions justified (e.g., patient privacy)?
- Are data formats standard and accessible?
Code and Materials:
- Is analysis code made available (GitHub, Zenodo, etc.)?
- Are unique materials available or described sufficiently for recreation?
- Are protocols detailed in sufficient depth?
Reporting Standards:
- Does the manuscript follow discipline-specific reporting guidelines (CONSORT, PRISMA, ARRIVE, MIAME, MINSEQE, etc.)?
- See `references/reporting_standards.md` for common guidelines
- Are all elements of the appropriate checklist addressed?
Stage 5: Figure and Data Presentation
Evaluate the quality, clarity, and integrity of data visualization.
Quality Checks:
- Are figures high resolution and clearly labeled?
- Are axes properly labeled with units?
- Are error bars defined (SD, SEM, CI)?
- Are statistical significance indicators explained?
- Are color schemes appropriate and accessible (colorblind-friendly)?
- Are scale bars included for images?
- Is data visualization appropriate for the data type?
Integrity Checks:
- Are there signs of image manipulation (duplications, splicing)?
- Are Western blots and gels appropriately presented?
- Are representative images truly representative?
- Are all conditions shown (no selective presentation)?
Clarity:
- Can figures stand alone with their legends?
- Is the message of each figure immediately clear?
- Are there redundant figures or panels?
- Would data be better presented as tables or figures?
Stage 6: Ethical Considerations
Verify that the research meets ethical standards and guidelines.
Human Subjects:
- Is IRB/ethics approval documented?
- Is informed consent described?
- Are vulnerable populations appropriately protected?
- Is patient privacy adequately protected?
- Are potential conflicts of interest disclosed?
Animal Research:
- Is IACUC or equivalent approval documented?
- Are procedures humane and justified?
- Are the 3Rs (replacement, reduction, refinement) considered?
- Are euthanasia methods appropriate?
Research Integrity:
- Are there concerns about data fabrication or falsification?
- Is authorship appropriate and justified?
- Are competing interests disclosed?
- Is funding source disclosed?
- Are there concerns about plagiarism or duplicate publication?
Stage 7: Writing Quality and Clarity
Assess the manuscript's clarity, organization, and accessibility.
Structure and Organization:
- Is the manuscript logically organized?
- Do sections flow coherently?
- Are transitions between ideas clear?
- Is the narrative compelling and clear?
Writing Quality:
- Is the language clear, precise, and concise?
- Are jargon and acronyms minimized and defined?
- Are grammar and spelling correct?
- Are sentences unnecessarily complex?
- Is the passive voice overused?
Accessibility:
- Can a non-specialist understand the main findings?
- Are technical terms explained?
- Is the significance clear to a broad audience?
Structuring Peer Review Reports
Organize feedback in a hierarchical structure that prioritizes issues and provides actionable guidance.
Summary Statement
Provide a concise overall assessment (1-2 paragraphs):
- Brief synopsis of the research
- Overall recommendation (accept, minor revisions, major revisions, reject)
- Key strengths (2-3 bullet points)
- Key weaknesses (2-3 bullet points)
- Bottom-line assessment of significance and soundness
Major Comments
List critical issues that significantly impact the manuscript's validity, interpretability, or significance. Number these sequentially for easy reference.
Major comments typically include:
- Fundamental methodological flaws
- Inappropriate statistical analyses
- Unsupported or overstated conclusions
- Missing critical controls or experiments
- Serious reproducibility concerns
- Major gaps in literature coverage
- Ethical concerns
For each major comment:
- Clearly state the issue
- Explain why it's problematic
- Suggest specific solutions or additional experiments
- Indicate if addressing it is essential for publication
Minor Comments
List less critical issues that would improve clarity, completeness, or presentation. Number these sequentially.
Minor comments typically include:
- Unclear figure labels or legends
- Missing methodological details
- Typographical or grammatical errors
- Suggestions for improved data presentation
- Minor statistical reporting issues
- Supplementary analyses that would strengthen conclusions
- Requests for clarification
For each minor comment:
- Identify the specific location (section, paragraph, figure)
- State the issue clearly
- Suggest how to address it
Specific Line-by-Line Comments (Optional)
For manuscripts requiring detailed feedback, provide section-specific or line-by-line comments:
- Reference specific page/line numbers or sections
- Note factual errors, unclear statements, or missing citations
- Suggest specific edits for clarity
Questions for Authors
List specific questions that need clarification:
- Methodological details that are unclear
- Seemingly contradictory results
- Missing information needed to evaluate the work
- Requests for additional data or analyses
Tone and Approach
Maintain a constructive, professional, and collegial tone throughout the review.
Best Practices:
- Be constructive: Frame criticism as opportunities for improvement
- Be specific: Provide concrete examples and actionable suggestions
- Be balanced: Acknowledge strengths as well as weaknesses
- Be respectful: Remember that authors have invested significant effort
- Be objective: Focus on the science, not the scientists
- Be thorough: Don't overlook issues, but prioritize appropriately
- Be clear: Avoid ambiguous or vague criticism
Avoid:
- Personal attacks or dismissive language
- Sarcasm or condescension
- Vague criticism without specific examples
- Requesting unnecessary experiments beyond the scope
- Demanding adherence to personal preferences vs. best practices
- Revealing your identity if reviewing is double-blind
Special Considerations by Manuscript Type
Original Research Articles
- Emphasize rigor, reproducibility, and novelty
- Assess significance and impact
- Verify that conclusions are data-driven
- Check for complete methods and appropriate controls
Reviews and Meta-Analyses
- Evaluate comprehensiveness of literature coverage
- Assess search strategy and inclusion/exclusion criteria
- Verify systematic approach and lack of bias
- Check for critical analysis vs. mere summarization
- For meta-analyses, evaluate statistical approach and heterogeneity
Methods Papers
- Emphasize validation and comparison to existing methods
- Assess reproducibility and availability of protocols/code
- Evaluate improvements over existing approaches
- Check for sufficient detail for implementation
Short Reports/Letters
- Adapt expectations for brevity
- Ensure core findings are still rigorous and significant
- Verify that format is appropriate for findings
Preprints
- Recognize that these have not undergone formal peer review
- May be less polished than journal submissions
- Still apply rigorous standards for scientific validity
- Consider providing constructive feedback to help authors improve before journal submission
Presentations and Slide Decks
⚠️ CRITICAL: For presentations, NEVER read the PDF directly. ALWAYS convert to images first.
When reviewing scientific presentations (PowerPoint, Beamer, slide decks):
Mandatory Image-Based Review Workflow
NEVER attempt to read presentation PDFs directly - this causes buffer overflow errors and doesn't show visual formatting issues.
Required Process:
- Convert PDF to images using Python:

```
python skills/scientific-slides/scripts/pdf_to_images.py presentation.pdf review/slide --dpi 150
# Creates: review/slide-001.jpg, review/slide-002.jpg, etc.
```

- Read and inspect EACH slide image file sequentially
- Document issues with specific slide numbers
- Provide feedback on visual formatting and content
Print when starting review:
[HH:MM:SS] PEER REVIEW: Presentation detected - converting to images for review
[HH:MM:SS] PDF REVIEW: NEVER reading PDF directly - using image-based inspection
Presentation-Specific Evaluation Criteria
Visual Design and Readability:
- Text is large enough (minimum 18pt, ideally 24pt+ for body text)
- High contrast between text and background (4.5:1 minimum, 7:1 preferred)
- Color scheme is professional and colorblind-accessible
- Consistent visual design across all slides
- White space is adequate (not cramped)
- Fonts are clear and professional
Layout and Formatting (Check EVERY Slide Image):
- No text overflow or truncation at slide edges
- No element overlaps (text over images, overlapping shapes)
- Titles are consistently positioned
- Content is properly aligned
- Bullets and text are not cut off
- Figures fit within slide boundaries
- Captions and labels are visible and readable
Content Quality:
- One main idea per slide (not overloaded)
- Minimal text (3-6 bullets per slide maximum)
- Bullet points are concise (5-7 words each)
- Figures are simplified and clear (not copy-pasted from papers)
- Data visualizations have large, readable labels
- Citations are present and properly formatted
- Results/data slides dominate the presentation (40-50% of content)
Structure and Flow:
- Clear narrative arc (introduction → methods → results → discussion)
- Logical progression between slides
- Slide count appropriate for talk duration (~1 slide per minute)
- Title slide includes authors, affiliation, date
- Introduction cites relevant background literature (3-5 papers)
- Discussion cites comparison papers (3-5 papers)
- Conclusions slide summarizes key findings
- Acknowledgments/funding slide at end
Scientific Content:
- Research question clearly stated
- Methods adequately summarized (not excessive detail)
- Results presented logically with clear visualizations
- Statistical significance indicated appropriately
- Conclusions supported by data shown
- Limitations acknowledged where appropriate
- Future directions or broader impact discussed
Common Presentation Issues to Flag:
Critical Issues (Must Fix):
- Text overflow making content unreadable
- Font sizes too small (<18pt)
- Element overlaps obscuring data
- Insufficient contrast (text hard to read)
- Figures too complex or illegible
- No citations (completely unsupported claims)
- Slide count drastically mismatched to duration
Major Issues (Should Fix):
- Inconsistent design across slides
- Too much text (walls of text, not bullets)
- Poorly simplified figures (axis labels too small)
- Cramped layout with insufficient white space
- Missing key structural elements (no conclusion slide)
- Poor color choices (not colorblind-safe)
- Minimal results content (<30% of slides)
Minor Issues (Suggestions for Improvement):
- Could use more visuals/diagrams
- Some slides slightly text-heavy
- Minor alignment inconsistencies
- Could benefit from more white space
- Additional citations would strengthen claims
- Color scheme could be more modern
Review Report Format for Presentations
Summary Statement:
- Overall impression of presentation quality
- Appropriateness for target audience and duration
- Key strengths (visual design, content, clarity)
- Key weaknesses (formatting issues, content gaps)
- Recommendation (ready to present, minor revisions, major revisions)
Layout and Formatting Issues (By Slide Number):
Slide 3: Text overflow - bullet point 4 extends beyond right margin
Slide 7: Element overlap - figure overlaps with caption text
Slide 12: Font size - axis labels too small to read from distance
Slide 18: Alignment - title not centered
Content and Structure Feedback:
- Adequacy of background context and citations
- Clarity of research question and objectives
- Quality of methods summary
- Effectiveness of results presentation
- Strength of conclusions and implications
Design and Accessibility:
- Overall visual appeal and professionalism
- Color contrast and readability
- Colorblind accessibility
- Consistency across slides
Timing and Scope:
- Whether slide count matches intended duration
- Appropriate level of detail for talk type
- Balance between sections
Example Image-Based Review Process
[14:30:00] PEER REVIEW: Starting review of presentation
[14:30:05] PEER REVIEW: Presentation detected - converting to images
[14:30:10] PDF REVIEW: Running pdf_to_images.py on presentation.pdf
[14:30:15] PDF REVIEW: Converted 25 slides to images in review/ directory
[14:30:20] PDF REVIEW: Inspecting slide 1/25 - title slide
[14:30:25] PDF REVIEW: Inspecting slide 2/25 - introduction
...
[14:35:40] PDF REVIEW: Inspecting slide 25/25 - acknowledgments
[14:35:45] PDF REVIEW: Completed image-based review
[14:35:50] PEER REVIEW: Found 8 layout issues, 3 content issues
[14:35:55] PEER REVIEW: Generating structured feedback by slide number
Remember: For presentations, the visual inspection via images is MANDATORY. Never attempt to read presentation PDFs as text - it will fail and miss all visual formatting issues.
Resources
This skill includes reference materials to support comprehensive peer review:
references/reporting_standards.md
Guidelines for major reporting standards across disciplines (CONSORT, PRISMA, ARRIVE, MIAME, STROBE, etc.) to evaluate completeness of methods and results reporting.
references/common_issues.md
Catalog of frequent methodological and statistical issues encountered in peer review, with guidance on identifying and addressing them.
Final Checklist
Before finalizing the review, verify:
- Summary statement clearly conveys overall assessment
- Major concerns are clearly identified and justified
- Suggested revisions are specific and actionable
- Minor issues are noted but properly categorized
- Statistical methods have been evaluated
- Reproducibility and data availability assessed
- Ethical considerations verified
- Figures and tables evaluated for quality and integrity
- Writing quality assessed
- Tone is constructive and professional throughout
- Review is thorough but proportionate to manuscript scope
- Recommendation is consistent with identified issues
FAQ
What does peer-review do?
Systematic peer review toolkit. It evaluates methodology, statistics, design, reproducibility, ethics, figure integrity, and reporting standards for manuscript and grant review across disciplines.
When should I use peer-review?
Use it when you need a repeatable review workflow that produces a text response.
What does peer-review output?
In the evaluated run it produced a text response.
How do I install or invoke peer-review?
npx skills add https://github.com/davila7/claude-code-templates --skill peer-review
Which agents does peer-review support?
Claude Code
What tools, channels, or permissions does peer-review need?
It uses no extra tools; channels commonly include text; permissions include filesystem:read and filesystem:write.
Is peer-review safe to install?
Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.
How is peer-review different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
About peer-review
When to use peer-review
Reviewing a manuscript submission for methodological rigor and publication readiness. Evaluating a grant proposal's experimental design, feasibility, and statistical approach. Preparing structured reviewer comments on reporting standards, ethics, and reproducibility.
When peer-review is not the right choice
When you need to run experiments, analyze raw datasets, or validate results computationally. When the task is primarily generating scientific figures or schematics rather than reviewing research quality.
What it produces
Produces a text response.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill peer-review

Invoke: Ask Claude Code to use peer-review for the task.