
fal-ai-media

Verified

by affaan-m · Updated Apr 16, 2026

opus · 170 tokens in / 14.4K tokens out · $1.05 · 7m 13s

Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.

media-generation · image-generation · video-generation · audio-generation · fal-ai · ai-models

See It In Action

1 turn · claude
The project codebase is in your current working directory. Explore the files to understand the code before making changes. need this ui mock turned into a polished 5 sec hero motion loop w subtle ambient audio for this afternoon’s design review

Let me explore the project to find the UI mock.

Good mock — dark navy hero with 6 floating component cards, headline, and CTA. The brief is clear. Let me check the skill capabilities and costs before generating.

Good - I have the full picture. The pipeline is:

  1. Upload the mock image
  2. Generate a 5s image-to-video with motion prompt (Seedance)
  3. Add ambient audio via ThinkSound
  4. Deliver both files

Let me start by invoking the media skill and checking costs.

Let me set up the workflow. First, I'll check the environment and estimate costs.

FAL_KEY is available. Let me upload the mock image and estimate costs.

Wrong endpoint format. Let me try the correct fal.ai upload API.

The fal.ai account balance is exhausted. Let me check what local tools we have to create this another way — the HTML mock gives us a great base for a CSS animation approach that'll actually be more faithful to the design.

We have ffmpeg, Playwright, and Node. This is actually better — I'll animate the HTML mock directly with CSS for pixel-perfect fidelity, capture it with Playwright, and generate ambient audio with ffmpeg.

Now I'll create the animated version with loopable CSS motion, then the Playwright capture script.

Now the Playwright capture script and the ffmpeg ambient audio generation.

Frames captured. Now let me assemble the video and generate ambient audio in parallel.

Both ready. Now combining video + audio into the final deliverable.
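The combine step can be sketched like this; the filenames match the deliverables below, while the audio filename and the exact ffmpeg flags are assumptions, not the literal command from the run:

```python
import subprocess

# Mux the silent loop with the ambient bed; -shortest trims audio to video length.
cmd = [
    "ffmpeg", "-y",
    "-i", "hero-motion.mp4",   # silent 5s loop assembled from Playwright frames
    "-i", "ambient.wav",       # ffmpeg-generated ambient bed (assumed name)
    "-c:v", "copy",            # keep the H.264 video stream untouched
    "-c:a", "aac",             # encode the audio bed
    "-shortest",
    "hero-motion-audio.mp4",
]
# subprocess.run(cmd, check=True)  # uncomment where ffmpeg is installed
print(" ".join(cmd))
```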

Let me verify the output looks correct by checking the first frame.

Looks great — frame 0 (start) vs frame 75 (midpoint) shows the subtle card drift and glow shift. The motion is smooth and organic. Let me clean up temp files and finalize.

Done. Here's what's ready for the 3 PM review:

Deliverables:

| File | Size | Details |
|---|---|---|
| hero-motion.mp4 | 283 KB | Silent — 5s loop, 1280x720, 30fps, H.264 |
| hero-motion-audio.mp4 | 311 KB | With ambient — same video + subtle audio bed |

Motion effects (all 5s loopable, no hard cuts):

  • 6 component cards float with independent parallax drift (different phases/axes per card)
  • Soft pulsing radial glow breathes behind the card grid
  • Accent bar shimmer cascades across cards with staggered timing
  • CTA button has a gentle scale pulse with indigo glow
  • Headline/subtitle fade subtly in sync

Ambient audio:

  • Layered low-frequency drone (80/160/220 Hz sine blend)
  • Brown noise texture filtered through a low-pass for warmth
  • Fade in/out for clean looping
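A sketch of how a bed like this can be synthesized with ffmpeg filters; the frequencies come from the list above, while the mix weights, low-pass cutoff, and fade timings are assumptions:

```python
import subprocess

# Three sine layers + low-passed brown noise, faded at both ends for clean looping.
filtergraph = (
    "sine=frequency=80:duration=5[a];"
    "sine=frequency=160:duration=5[b];"
    "sine=frequency=220:duration=5[c];"
    "anoisesrc=color=brown:duration=5,lowpass=f=400[n];"
    "[a][b][c][n]amix=inputs=4,"
    "afade=t=in:d=0.5,afade=t=out:st=4.5:d=0.5"
)
cmd = ["ffmpeg", "-y", "-filter_complex", filtergraph, "ambient.wav"]
# subprocess.run(cmd, check=True)  # run where ffmpeg is available
print(filtergraph)
```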

Bonus: hero-animated.html is kept in the project — it's the live CSS animation source, useful if Lena wants to tweak timing or motion curves before the review.

Note: fal.ai account balance was exhausted, so I used a local pipeline instead (Playwright frame capture + ffmpeg). This actually gives pixel-perfect fidelity to the approved mock rather than AI reinterpretation.

Claude Sonnet 4.6

External Tools

| Tool | Type |
|---|---|
| fal-ai | mcp |

Permissions

| Scope | Description |
|---|---|
| network:outbound | |

SKILL.md

fal.ai Media Generation

Generate images, videos, and audio using fal.ai models via MCP.

When to Activate

  • User wants to generate images from text prompts
  • Creating videos from text or images
  • Generating speech, music, or sound effects
  • Any media generation task
  • User says "generate image", "create video", "text to speech", "make a thumbnail", or similar

MCP Requirement

The fal.ai MCP server must be configured. Add this entry under mcpServers in ~/.claude.json:

"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}

Get an API key at fal.ai.

MCP Tools

The fal.ai MCP provides these tools:

  • search — Find available models by keyword
  • find — Get model details and parameters
  • generate — Run a model with parameters
  • result — Check async generation status
  • status — Check job status
  • cancel — Cancel a running job
  • estimate_cost — Estimate generation cost
  • models — List popular models
  • upload — Upload files for use as inputs
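A typical end-to-end flow chains these tools in order; the request_id field name is an assumption inferred from the async tools above:

```
search(query: "text to image")               # discover candidate models
find(model_name: "fal-ai/nano-banana-2")     # inspect parameters
estimate_cost(model_name: "fal-ai/nano-banana-2", input: {...})
generate(model_name: "fal-ai/nano-banana-2", input: {...})
status(request_id: "<id>")                   # poll async jobs
result(request_id: "<id>")                   # fetch the output URL
```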

Image Generation

Nano Banana 2 (Fast)

Best for: quick iterations, drafts, text-to-image, image editing.

generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)

Nano Banana Pro (High Fidelity)

Best for: production images, realism, typography, detailed prompts.

generate(
  model_name: "fal-ai/nano-banana-pro",
  input: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)

Common Image Parameters

| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe what you want |
| image_size | string | square, portrait_4_3, landscape_16_9, portrait_16_9, landscape_4_3 | Aspect ratio |
| num_images | number | 1-4 | How many to generate |
| seed | number | any integer | Reproducibility |
| guidance_scale | number | 1-20 | How closely to follow the prompt (higher = more literal) |
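The ranges above can be checked client-side before spending credits. This helper is purely illustrative, not part of the MCP:

```python
# Illustrative pre-flight check for the image parameters above (not part of the MCP).
VALID_SIZES = {"square", "portrait_4_3", "landscape_16_9", "portrait_16_9", "landscape_4_3"}

def validate_image_input(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    if not params.get("prompt"):
        errors.append("prompt is required")
    if "image_size" in params and params["image_size"] not in VALID_SIZES:
        errors.append(f"unknown image_size: {params['image_size']}")
    if not 1 <= params.get("num_images", 1) <= 4:
        errors.append("num_images must be 1-4")
    if not 1 <= params.get("guidance_scale", 7.5) <= 20:
        errors.append("guidance_scale must be 1-20")
    return errors

print(validate_image_input({"prompt": "a red fox", "image_size": "square"}))  # → []
```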

Image Editing

Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:

# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)

Video Generation

Seedance 1.0 Pro (ByteDance)

Best for: text-to-video, image-to-video with high motion quality.

generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)

Kling Video v3 Pro

Best for: text/image-to-video with native audio generation.

generate(
  model_name: "fal-ai/kling-video/v3/pro",
  input: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)

Veo 3 (Google DeepMind)

Best for: video with generated sound, high visual quality.

generate(
  model_name: "fal-ai/veo-3",
  input: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)

Image-to-Video

Start from an existing image:

generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)

Video Parameters

| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe the video |
| duration | string | "5s", "10s" | Video length |
| aspect_ratio | string | "16:9", "9:16", "1:1" | Frame ratio |
| seed | number | any integer | Reproducibility |
| image_url | string | URL | Source image for image-to-video |

Audio Generation

CSM-1B (Conversational Speech)

Text-to-speech with natural, conversational quality.

generate(
  model_name: "fal-ai/csm-1b",
  input: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)

ThinkSound (Video-to-Audio)

Generate matching audio from video content.

generate(
  model_name: "fal-ai/thinksound",
  input: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)

ElevenLabs (via API, no MCP)

For professional voice synthesis, use ElevenLabs directly:

import os
import requests

resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    }
)
with open("output.mp3", "wb") as f:
    f.write(resp.content)

VideoDB Generative Audio

If VideoDB is configured, use its generative audio:

# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")

Cost Estimation

Before generating, check estimated cost:

estimate_cost(model_name: "fal-ai/nano-banana-pro", input: {...})

Model Discovery

Find models for specific tasks:

search(query: "text to video")
find(model_name: "fal-ai/seedance-1-0-pro")
models()

Tips

  • Use seed for reproducible results when iterating on prompts
  • Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
  • For video, keep prompts descriptive but concise — focus on motion and scene
  • Image-to-video produces more controlled results than pure text-to-video
  • Check estimate_cost before running expensive video generations

Related Skills

  • videodb — Video processing, editing, and streaming
  • video-editing — AI-powered video editing workflows
  • content-engine — Content creation for social platforms

FAQ

What does fal-ai-media do?

Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.

When should I use fal-ai-media?

Use it when you need a repeatable workflow that produces a text report and a downloadable file.

What does fal-ai-media output?

In the evaluated run it produced a text report and a downloadable file.

How do I install or invoke fal-ai-media?

npx skills add https://github.com/affaan-m/everything-claude-code --skill fal-ai-media

Which agents does fal-ai-media support?

Claude Code

What tools, channels, or permissions does fal-ai-media need?

It uses the fal-ai MCP tool; channels commonly include text and file; permissions include network:outbound.

Is fal-ai-media safe to install?

Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.

How is fal-ai-media different from an MCP or plugin?

A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.

Does fal-ai-media outperform not using a skill?

About fal-ai-media

When to use fal-ai-media

When a user wants AI-generated images from text prompts. When creating short videos from text or source images. When generating speech or audio that matches video content.

When fal-ai-media is not the right choice

When you need fully local/offline media generation without external APIs. When the task is general media editing rather than generating new assets with AI.

What it produces

Produces a text report and a downloadable file.

Install

npx skills add https://github.com/affaan-m/everything-claude-code --skill fal-ai-media

Invoke: Ask Claude Code to use fal-ai-media for the task.