image-generation

Verified

by bytedance · Updated Apr 16, 2026

claude-opus-4-6
767 in · 3.2K out
$0.16 · 1m 57s

Use this skill when the user requests to generate, create, imagine, or visualize images including characters, scenes, products, or any visual content. Supports structured prompts and reference images for guided generation.

image-generation · aigc · visual-content · prompting · python

See It In Action

1 turn · claude
The project codebase is in your current working directory. Explore the files to understand the code before making changes. need the design systems homepage hero concept today — modular UI blocks floating in a calm futuristic workspace, matching our two refs for color/composition. ops is still standing up the shared /mnt runtime so if the usual generate.py path isn't reachable, just use whatever you have locally (the scripts/ copy next to the workspace or any image model you can hit via OPENAI_API_KEY / FAL_KEY) — give me the 16:9 render plus the prompt json either way

Now I have the full picture. Let me study the references and build the prompt:

  • Color ref → deep-blue-to-violet gradient, cool and calm dark-mode palette
  • Composition ref → teal-to-purple gradient, similar futuristic atmosphere with depth

I'll craft the prompt JSON, then try fal.ai (since we have FAL_KEY), and fall back to the local generate.py if the API isn't reachable.

fal_client is available. Let me create the prompt JSON and generate the image.

Generation succeeded. Let me verify the output and view it:

Both deliverables are ready. Here's the summary:


Render delivered → outputs/design-system-hero-16x9.png (1920×1072, 182 KB)

The image nails the brief:

  • Modular UI blocks — translucent cards/panels/grid components floating at staggered depths in an isometric-ish workspace
  • Color match — deep navy-to-blue atmosphere with teal edge-lighting, directly echoing both reference gradients (the blue-violet ref and the teal-purple ref)
  • Calm & futuristic — soft volumetric glow on each block, depth-of-field bokeh on the background elements, no clutter
  • 16:9 composition — wide cinematic framing with a clear focal center and breathing room for headline text overlay

Prompt JSON → outputs/hero-prompt.json — includes the full prompt text, negative prompt, target palette hex codes, style-reference annotations, and generation metadata.

Both files are in outputs/. The image was generated via fal.ai Flux Dev (the FAL_KEY from .env worked); the shared /mnt runtime and generate.py weren't needed. Let me know if you want a variation with more violet/teal shift or different composition emphasis.

Claude Sonnet 4.6

External Tools

Tool      Type
python3   binary

Permissions

Scope     Description
filesystem:read
filesystem:write
network:outbound
process:spawn

SKILL.md

Image Generation Skill

Overview

This skill generates high-quality images using structured prompts and a Python script. The workflow includes creating JSON-formatted prompts and executing image generation with optional reference images.

Core Capabilities

  • Create structured JSON prompts for AIGC image generation
  • Support multiple reference images for style/composition guidance
  • Generate images through automated Python script execution
  • Handle various image generation scenarios (character design, scenes, products, etc.)

Workflow

Step 1: Understand Requirements

When a user requests image generation, identify:

  • Subject/content: What should be in the image
  • Style preferences: Art style, mood, color palette
  • Technical specs: Aspect ratio, composition, lighting
  • Reference images: Any images to guide generation
  • You don't need to check the folder under /mnt/user-data

Step 2: Create Structured Prompt

Generate a structured JSON file in /mnt/user-data/workspace/ with naming pattern: {descriptive-name}.json
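
A minimal sketch of this step in Python, assuming the prompt is assembled as a plain dictionary. The helper name write_prompt_file is hypothetical and not part of the skill; only the workspace directory and the {descriptive-name}.json naming pattern come from the text above.

```python
import json
from pathlib import Path


def write_prompt_file(prompt: dict, name: str,
                      workspace: str = "/mnt/user-data/workspace") -> Path:
    """Write a structured prompt to {workspace}/{name}.json and return its path.

    Hypothetical helper: the skill only specifies the directory and naming pattern.
    """
    target = Path(workspace) / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps any non-ASCII characters readable in the file.
    target.write_text(json.dumps(prompt, indent=2, ensure_ascii=False),
                      encoding="utf-8")
    return target
```

The returned path can then be passed to the generation script as the --prompt-file argument.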

Step 3: Execute Generation

Call the Python script:

python /mnt/skills/public/image-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/prompt-file.json \
  --reference-images /path/to/ref1.jpg /path/to/ref2.png \
  --output-file /mnt/user-data/outputs/generated-image.jpg \
  --aspect-ratio 16:9

Parameters:

  • --prompt-file: Absolute path to JSON prompt file (required)
  • --reference-images: Absolute paths to reference images (optional, space-separated)
  • --output-file: Absolute path to output image file (required)
  • --aspect-ratio: Aspect ratio of the generated image (optional, default: 16:9)

[!NOTE] Do NOT read the python file, just call it with the parameters.
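
If the call is made from Python rather than a shell, the command can be assembled as an argument list, which avoids line-continuation mistakes. This is a sketch under assumptions: build_generate_command and run_generation are hypothetical helpers, only the four flags documented above are used, and it assumes the script reports failure through a non-zero exit status, which the skill does not state.

```python
import subprocess
import sys

GENERATE_SCRIPT = "/mnt/skills/public/image-generation/scripts/generate.py"


def build_generate_command(prompt_file, output_file,
                           reference_images=None, aspect_ratio="16:9"):
    """Return the argv list for generate.py using only the documented flags."""
    cmd = [
        sys.executable, GENERATE_SCRIPT,
        "--prompt-file", str(prompt_file),
        "--output-file", str(output_file),
        "--aspect-ratio", aspect_ratio,
    ]
    if reference_images:
        # Optional, space-separated on the command line: one argv entry per image.
        cmd += ["--reference-images", *map(str, reference_images)]
    return cmd


def run_generation(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Building the list separately from running it makes it easy to log or inspect the exact command before anything is executed.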

Character Generation Example

User request: "Create a Tokyo street style woman character in 1990s"

Create prompt file: /mnt/user-data/workspace/asian-woman.json

{
  "characters": [{
    "gender": "female",
    "age": "mid-20s",
    "ethnicity": "Japanese",
    "body_type": "slender, elegant",
    "facial_features": "delicate features, expressive eyes, subtle makeup with emphasis on lips, long dark hair partially wet from rain",
    "clothing": "stylish trench coat, designer handbag, high heels, contemporary Tokyo street fashion",
    "accessories": "minimal jewelry, statement earrings, leather handbag",
    "era": "1990s"
  }],
  "negative_prompt": "blurry face, deformed, low quality, overly sharp digital look, oversaturated colors, artificial lighting, studio setting, posed, selfie angle",
  "style": "Leica M11 street photography aesthetic, film-like rendering, natural color palette with slight warmth, bokeh background blur, analog photography feel",
  "composition": "medium shot, rule of thirds, subject slightly off-center, environmental context of Tokyo street visible, shallow depth of field isolating subject",
  "lighting": "neon lights from signs and storefronts, wet pavement reflections, soft ambient city glow, natural street lighting, rim lighting from background neons",
  "color_palette": "muted naturalistic tones, warm skin tones, cool blue and magenta neon accents, desaturated compared to digital photography, film grain texture"
}

Execute generation:

python /mnt/skills/public/image-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/asian-woman.json \
  --output-file /mnt/user-data/outputs/asian-woman-01.jpg \
  --aspect-ratio 2:3

With reference images:

{
  "characters": [{
    "gender": "based on [Image 1]",
    "age": "based on [Image 1]",
    "ethnicity": "human from [Image 1] adapted to Star Wars universe",
    "body_type": "based on [Image 1]",
    "facial_features": "matching [Image 1] with slight weathered look from space travel",
    "clothing": "Star Wars style outfit - worn leather jacket with utility vest, cargo pants with tactical pouches, scuffed boots, belt with holster",
    "accessories": "blaster pistol on hip, comlink device on wrist, goggles pushed up on forehead, satchel with supplies, personal vehicle based on [Image 2]",
    "era": "Star Wars universe, post-Empire era"
  }],
  "prompt": "Character inspired by [Image 1] standing next to a vehicle inspired by [Image 2] on a bustling alien planet street in Star Wars universe aesthetic. Character wearing worn leather jacket with utility vest, cargo pants with tactical pouches, scuffed boots, belt with blaster holster. The vehicle adapted to Star Wars aesthetic with weathered metal panels, repulsor engines, desert dust covering, parked on the street. Exotic alien marketplace street with multi-level architecture, weathered metal structures, hanging market stalls with colorful awnings, alien species walking by as background characters. Twin suns casting warm golden light, atmospheric dust particles in air, moisture vaporators visible in distance. Gritty lived-in Star Wars aesthetic, practical effects look, film grain texture, cinematic composition.",
  "negative_prompt": "clean futuristic look, sterile environment, overly CGI appearance, fantasy medieval elements, Earth architecture, modern city",
  "style": "Star Wars original trilogy aesthetic, lived-in universe, practical effects inspired, cinematic film look, slightly desaturated with warm tones",
  "composition": "medium wide shot, character in foreground with alien street extending into background, environmental storytelling, rule of thirds",
  "lighting": "warm golden hour lighting from twin suns, rim lighting on character, atmospheric haze, practical light sources from market stalls",
  "color_palette": "warm sandy tones, ochre and sienna, dusty blues, weathered metals, muted earth colors with pops of alien market colors",
  "technical": {
    "aspect_ratio": "9:16",
    "quality": "high",
    "detail_level": "highly detailed with film-like texture"
  }
}

Execute generation:

python /mnt/skills/public/image-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/star-wars-scene.json \
  --reference-images /mnt/user-data/uploads/character-ref.jpg /mnt/user-data/uploads/vehicle-ref.jpg \
  --output-file /mnt/user-data/outputs/star-wars-scene-01.jpg \
  --aspect-ratio 16:9

Common Scenarios

Use different JSON schemas for different scenarios.

Character Design:

  • Physical attributes (gender, age, ethnicity, body type)
  • Facial features and expressions
  • Clothing and accessories
  • Historical era or setting
  • Pose and context

Scene Generation:

  • Environment description
  • Time of day, weather
  • Mood and atmosphere
  • Focal points and composition

Product Visualization:

  • Product details and materials
  • Lighting setup
  • Background and context
  • Presentation angle
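
One way to keep per-scenario schemas consistent is to start each prompt from an empty skeleton. The sketch below is illustrative only: the key names are hypothetical, derived from the bullet lists above, and the skill does not prescribe a fixed schema.

```python
import json

# Hypothetical key names based on the scenario lists; not a schema defined by the skill.
SCENARIO_FIELDS = {
    "character": ["physical_attributes", "facial_features", "clothing",
                  "era", "pose"],
    "scene": ["environment", "time_of_day", "weather", "mood", "composition"],
    "product": ["product_details", "materials", "lighting", "background",
                "angle"],
}


def prompt_skeleton(scenario: str) -> dict:
    """Return a prompt dict with one empty string per field of the scenario."""
    return {field: "" for field in SCENARIO_FIELDS[scenario]}


if __name__ == "__main__":
    print(json.dumps(prompt_skeleton("scene"), indent=2))
```

Each empty value is then filled in from the user's request before the file is written.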

Specific Templates

Read the following template file only when matching the user request.

Output Handling

After generation:

  • Images are typically saved in /mnt/user-data/outputs/
  • Share generated images with the user using the present_files tool
  • Provide a brief description of the generation result
  • Offer to iterate if adjustments are needed

Tips: Enhancing Generation with Reference Images

For scenarios where visual accuracy is critical, use the image_search tool first to find reference images before generation.

Recommended scenarios for using image_search tool:

  • Character/Portrait Generation: Search for similar poses, expressions, or styles to guide facial features and body proportions
  • Specific Objects or Products: Find reference images of real objects to ensure accurate representation
  • Architectural or Environmental Scenes: Search for location references to capture authentic details
  • Fashion and Clothing: Find style references to ensure accurate garment details and styling

Example workflow:

  1. Call the image_search tool to find suitable reference images:
    image_search(query="Japanese woman street photography 1990s", size="Large")
    
  2. Download the returned image URLs to local files
  3. Pass the downloaded images to the generation script via the --reference-images parameter

This approach significantly improves generation quality by providing the model with concrete visual guidance rather than relying solely on text descriptions.
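
The download step in the workflow above could be sketched with the Python standard library as follows. The helper name download_references, the ref-N file naming, and the .jpg fallback extension are assumptions for illustration; the format of the URLs returned by image_search is not specified here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_references(urls, dest_dir):
    """Download each URL into dest_dir and return the local paths in order."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, url in enumerate(urls, start=1):
        # Keep the URL's file extension when it has one; otherwise assume .jpg.
        suffix = Path(urlparse(url).path).suffix or ".jpg"
        target = dest / f"ref-{index}{suffix}"
        with urllib.request.urlopen(url, timeout=30) as response, \
                open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        paths.append(target)
    return paths
```

The returned paths are absolute when dest_dir is absolute, so they can be passed directly as --reference-images values.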

Notes

  • Always use English for prompts regardless of the user's language
  • JSON format ensures structured, parsable prompts
  • Reference images enhance generation quality significantly
  • Iterative refinement is normal for optimal results
  • For character generation, include the detailed character object plus a consolidated prompt field
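
Two of these notes can be checked mechanically before a prompt file is handed to the script. The sketch below assumes the prompt has already been loaded as a dictionary; check_prompt is a hypothetical helper, and the ASCII test is only a rough stand-in for "written in English".

```python
def check_prompt(prompt: dict) -> list:
    """Return a list of warnings; an empty list means both checks passed."""
    warnings = []
    # Character prompts should carry a consolidated 'prompt' field as well.
    if prompt.get("characters") and not prompt.get("prompt"):
        warnings.append("character prompt has no consolidated 'prompt' field")
    # Rough English check: flag non-ASCII text in top-level string values.
    for key, value in prompt.items():
        if isinstance(value, str) and not value.isascii():
            warnings.append(
                f"field '{key}' contains non-ASCII text; prompts should be in English"
            )
    return warnings
```

Returning warnings instead of raising lets the caller decide whether to fix the prompt or proceed anyway.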

FAQ

What does image-generation do?

Use this skill when the user requests to generate, create, imagine, or visualize images including characters, scenes, products, or any visual content. Supports structured prompts and reference images for guided generation.

When should I use image-generation?

Use it when you need a repeatable workflow that produces image output and a text report.

What does image-generation output?

In the evaluated run it produced image output and a text report.

How do I install or invoke image-generation?

Ask the agent to use this skill when the task matches its documented workflow.

Which agents does image-generation support?

Agent support is inferred from the source, but not explicitly declared.

What tools, channels, or permissions does image-generation need?

It uses python3; channels commonly include image and text; permissions include filesystem:read, filesystem:write, network:outbound, and process:spawn.

Is image-generation safe to install?

Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.

How is image-generation different from an MCP or plugin?

A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.

About image-generation

When to use image-generation

  • When a user wants original images created from a text description.
  • When reference images should guide style, composition, or subject details.
  • When you need a repeatable workflow for character, scene, or product image generation.

When image-generation is not the right choice

  • When the task is only to edit or analyze existing images rather than generate new ones.
  • When no Python execution environment is available to run the generation script.

What it produces

Produces image output and a text report.