fal-ai-media
Verified by affaan-m · Updated Apr 16, 2026
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
See It In Action
Let me explore the project to find the UI mock.
Good mock — dark navy hero with 6 floating component cards, headline, and CTA. The brief is clear. Let me check the skill capabilities and costs before generating.
Good - I have the full picture. The pipeline is:
- Upload the mock image
- Generate a 5s image-to-video with motion prompt (Seedance)
- Add ambient audio via ThinkSound
- Deliver both files
Let me start by invoking the media skill and checking costs.
Let me set up the workflow. First, I'll check the environment and estimate costs.
FAL_KEY is available. Let me upload the mock image and estimate costs.
Wrong endpoint format. Let me try the correct fal.ai upload API.
The fal.ai account balance is exhausted. Let me check what local tools we have to create this another way — the HTML mock gives us a great base for a CSS animation approach that'll actually be more faithful to the design.
We have ffmpeg, Playwright, and Node. This is actually better — I'll animate the HTML mock directly with CSS for pixel-perfect fidelity, capture it with Playwright, and generate ambient audio with ffmpeg.
Now I'll create the animated version with loopable CSS motion, then the Playwright capture script.
Now the Playwright capture script and the ffmpeg ambient audio generation.
Frames captured. Now let me assemble the video and generate ambient audio in parallel.
Both ready. Now combining video + audio into the final deliverable.
Let me verify the output looks correct by checking the first frame.
Looks great — frame 0 (start) vs frame 75 (midpoint) shows the subtle card drift and glow shift. The motion is smooth and organic. Let me clean up temp files and finalize.
Done. Here's what's ready for the 3 PM review:
Deliverables:
| File | Size | Details |
|---|---|---|
| hero-motion.mp4 | 283 KB | Silent — 5s loop, 1280x720, 30fps, H.264 |
| hero-motion-audio.mp4 | 311 KB | With ambient — same video + subtle audio bed |
Motion effects (all 5s loopable, no hard cuts):
- 6 component cards float with independent parallax drift (different phases/axes per card)
- Soft pulsing radial glow breathes behind the card grid
- Accent bar shimmer cascades across cards with staggered timing
- CTA button has a gentle scale pulse with indigo glow
- Headline/subtitle fade subtly in sync
Ambient audio:
- Layered low-frequency drone (80/160/220 Hz sine blend)
- Brown noise texture filtered through a low-pass for warmth
- Fade in/out for clean looping
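The recipe above maps directly onto ffmpeg's `lavfi` sources. A hedged sketch that builds such a command in Python, using the same 80/160/220 Hz frequencies; the output filename, filter cutoff, and fade lengths are illustrative choices, not the exact values from the run:

```python
def ambient_cmd(out_path="ambient.m4a", duration=5):
    """Build an ffmpeg command: three sine drones + low-passed brown noise,
    mixed together with a fade in/out for clean looping."""
    inputs = []
    for f in (80, 160, 220):  # the drone blend from the list above
        inputs += ["-f", "lavfi", "-i", f"sine=frequency={f}:duration={duration}"]
    # Brown noise layer for warmth (anoisesrc is a standard lavfi source).
    inputs += ["-f", "lavfi", "-i", f"anoisesrc=color=brown:duration={duration}"]
    filter_graph = (
        "[3:a]lowpass=f=400[noise];"             # soften the noise bed
        "[0:a][1:a][2:a][noise]amix=inputs=4[mix];"
        f"[mix]afade=t=in:d=0.5,afade=t=out:st={duration - 0.5}:d=0.5[out]"
    )
    return ["ffmpeg", "-y", *inputs,
            "-filter_complex", filter_graph,
            "-map", "[out]", out_path]

cmd = ambient_cmd()
```

Passing the list straight to `subprocess.run(cmd)` avoids shell-quoting issues with the filter graph.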
Bonus: hero-animated.html is kept in the project — it's the live CSS animation source, useful if Lena wants to tweak timing or motion curves before the review.
Note: fal.ai account balance was exhausted, so I used a local pipeline instead (Playwright frame capture + ffmpeg). This actually gives pixel-perfect fidelity to the approved mock rather than AI reinterpretation.
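The frame-assembly step of that local pipeline might look like the following sketch, assuming Playwright wrote numbered PNGs; the `frames/frame_%04d.png` pattern and output name are hypothetical, while the flags are standard ffmpeg options:

```python
def assemble_cmd(pattern="frames/frame_%04d.png", fps=30, out="hero-motion.mp4"):
    """Build an ffmpeg command turning captured PNG frames into an H.264 MP4."""
    return ["ffmpeg", "-y",
            "-framerate", str(fps),
            "-i", pattern,
            "-c:v", "libx264",
            "-pix_fmt", "yuv420p",      # broad player compatibility
            "-movflags", "+faststart",  # moov atom up front for web playback
            out]

cmd = assemble_cmd()
```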
External Tools
| Tool | Type |
|---|---|
| fal-ai | mcp |
Permissions
| Scope | Description |
|---|---|
| network:outbound | Outbound network access for API calls |
SKILL.md
fal.ai Media Generation
Generate images, videos, and audio using fal.ai models via MCP.
When to Activate
- User wants to generate images from text prompts
- Creating videos from text or images
- Generating speech, music, or sound effects
- Any media generation task
- User says "generate image", "create video", "text to speech", "make a thumbnail", or similar
MCP Requirement
The fal.ai MCP server must be configured. Add this entry under the `mcpServers` section of `~/.claude.json`:

```json
"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}
```
Get an API key at fal.ai.
MCP Tools
The fal.ai MCP provides these tools:
- `search` — Find available models by keyword
- `find` — Get model details and parameters
- `generate` — Run a model with parameters
- `result` — Check async generation status
- `status` — Check job status
- `cancel` — Cancel a running job
- `estimate_cost` — Estimate generation cost
- `models` — List popular models
- `upload` — Upload files for use as inputs
Image Generation
Nano Banana 2 (Fast)
Best for: quick iterations, drafts, text-to-image, image editing.
```
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)
```
Nano Banana Pro (High Fidelity)
Best for: production images, realism, typography, detailed prompts.
```
generate(
  model_name: "fal-ai/nano-banana-pro",
  input: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
```
Common Image Parameters
| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe what you want |
| image_size | string | square, portrait_4_3, landscape_16_9, portrait_16_9, landscape_4_3 | Aspect ratio |
| num_images | number | 1-4 | How many to generate |
| seed | number | any integer | Reproducibility |
| guidance_scale | number | 1-20 | How closely to follow the prompt (higher = more literal) |
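A hedged sketch of validating a payload against this table on the client side before calling `generate`; the helper name and defaults are illustrative, not part of the fal.ai API:

```python
# Allowed values taken from the parameter table above.
VALID_SIZES = {"square", "portrait_4_3", "landscape_16_9",
               "portrait_16_9", "landscape_4_3"}

def build_image_input(prompt, image_size="square", num_images=1,
                      seed=None, guidance_scale=None):
    """Assemble and sanity-check an image-generation input dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if image_size not in VALID_SIZES:
        raise ValueError(f"unknown image_size: {image_size}")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be 1-4")
    payload = {"prompt": prompt, "image_size": image_size,
               "num_images": num_images}
    if seed is not None:
        payload["seed"] = seed
    if guidance_scale is not None:
        if not 1 <= guidance_scale <= 20:
            raise ValueError("guidance_scale must be 1-20")
        payload["guidance_scale"] = guidance_scale
    return payload

p = build_image_input("a red fox", image_size="landscape_16_9", seed=42)
```

Catching a bad `image_size` or out-of-range `guidance_scale` locally is cheaper than burning a paid generation on a rejected or malformed request.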
Image Editing
Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
```
# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)
```
Video Generation
Seedance 1.0 Pro (ByteDance)
Best for: text-to-video, image-to-video with high motion quality.
```
generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)
```
Kling Video v3 Pro
Best for: text/image-to-video with native audio generation.
```
generate(
  model_name: "fal-ai/kling-video/v3/pro",
  input: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)
```
Veo 3 (Google DeepMind)
Best for: video with generated sound, high visual quality.
```
generate(
  model_name: "fal-ai/veo-3",
  input: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)
```
Image-to-Video
Start from an existing image:
```
generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)
```
Video Parameters
| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe the video |
| duration | string | "5s", "10s" | Video length |
| aspect_ratio | string | "16:9", "9:16", "1:1" | Frame ratio |
| seed | number | any integer | Reproducibility |
| image_url | string | URL | Source image for image-to-video |
Audio Generation
CSM-1B (Conversational Speech)
Text-to-speech with natural, conversational quality.
```
generate(
  model_name: "fal-ai/csm-1b",
  input: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)
```
ThinkSound (Video-to-Audio)
Generate matching audio from video content.
```
generate(
  model_name: "fal-ai/thinksound",
  input: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)
```
ElevenLabs (via API, no MCP)
For professional voice synthesis, use ElevenLabs directly:
```python
import os
import requests

resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,  # don't hang forever on a slow synthesis
)
resp.raise_for_status()  # surface auth/quota errors instead of writing them to disk

with open("output.mp3", "wb") as f:
    f.write(resp.content)
```
VideoDB Generative Audio
If VideoDB is configured, use its generative audio:
```python
# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")
```
Cost Estimation
Before generating, check estimated cost:
```
estimate_cost(model_name: "fal-ai/nano-banana-pro", input: {...})
```
Model Discovery
Find models for specific tasks:
```
search(query: "text to video")
find(model_name: "fal-ai/seedance-1-0-pro")
models()
```
Tips
- Use `seed` for reproducible results when iterating on prompts
- Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
- For video, keep prompts descriptive but concise — focus on motion and scene
- Image-to-video produces more controlled results than pure text-to-video
- Check `estimate_cost` before running expensive video generations
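The cost-check tip can be sketched as a small budget guard. Here `estimate_fn` and `generate_fn` are stand-ins for the `estimate_cost` and `generate` MCP tools, and the prices are made up purely for illustration:

```python
def run_if_affordable(estimate_fn, generate_fn, model, params, budget_usd=1.00):
    """Only run a generation when its estimated cost fits the budget."""
    cost = estimate_fn(model, params)
    if cost > budget_usd:
        return {"skipped": True, "estimated_cost": cost}
    return {"skipped": False, "output": generate_fn(model, params)}

# Hypothetical estimator/generator: cheap draft model vs. pricey video model.
est = lambda m, p: 0.25 if "nano" in m else 2.50
gen = lambda m, p: f"{m}-output"

cheap = run_if_affordable(est, gen, "fal-ai/nano-banana-2", {})
pricey = run_if_affordable(est, gen, "fal-ai/seedance-1-0-pro", {})
```

Wiring the estimate into the call path, rather than checking it by hand, makes the "estimate before expensive generations" tip hard to forget.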
Related Skills
- `videodb` — Video processing, editing, and streaming
- `video-editing` — AI-powered video editing workflows
- `content-engine` — Content creation for social platforms
FAQ
What does fal-ai-media do?
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
When should I use fal-ai-media?
Use it when you need a repeatable workflow that produces a text report and downloadable media files.
What does fal-ai-media output?
In the evaluated run it produced a text report and downloadable files.
How do I install or invoke fal-ai-media?
npx skills add https://github.com/affaan-m/everything-claude-code --skill fal-ai-media
Which agents does fal-ai-media support?
Claude Code
What tools, channels, or permissions does fal-ai-media need?
It uses the fal-ai MCP server; channels commonly include text and file; permissions include network:outbound.
Is fal-ai-media safe to install?
Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.
How is fal-ai-media different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
About fal-ai-media
When to use fal-ai-media
When a user wants AI-generated images from text prompts. When creating short videos from text or source images. When generating speech or audio that matches video content.
When fal-ai-media is not the right choice
When you need fully local/offline media generation without external APIs. When the task is general media editing rather than generating new assets with AI.
What it produces
Produces a text report and downloadable media files.
Install
`npx skills add https://github.com/affaan-m/everything-claude-code --skill fal-ai-media`

Invoke: Ask Claude Code to use fal-ai-media for the task.