fal-ai-media
Verified by affaan-m · Updated Apr 16, 2026
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
See It In Action
Let me explore the project to find the UI mock.
Good mock — dark navy hero with 6 floating component cards, headline, and CTA. The brief is clear. Let me check the skill capabilities and costs before generating.
Good - I have the full picture. The pipeline is:
- Upload the mock image
- Generate a 5s image-to-video with motion prompt (Seedance)
- Add ambient audio via ThinkSound
- Deliver both files
Let me start by invoking the media skill and checking costs.
Let me set up the workflow. First, I'll check the environment and estimate costs.
FAL_KEY is available. Let me upload the mock image and estimate costs.
Wrong endpoint format. Let me try the correct fal.ai upload API.
The fal.ai account balance is exhausted. Let me check what local tools we have to create this another way — the HTML mock gives us a great base for a CSS animation approach that'll actually be more faithful to the design.
We have ffmpeg, Playwright, and Node. This is actually better — I'll animate the HTML mock directly with CSS for pixel-perfect fidelity, capture it with Playwright, and generate ambient audio with ffmpeg.
Now I'll create the animated version with loopable CSS motion, then the Playwright capture script.
Now the Playwright capture script and the ffmpeg ambient audio generation.
Frames captured. Now let me assemble the video and generate ambient audio in parallel.
Both ready. Now combining video + audio into the final deliverable.
Let me verify the output looks correct by checking the first frame.
Looks great — frame 0 (start) vs frame 75 (midpoint) shows the subtle card drift and glow shift. The motion is smooth and organic. Let me clean up temp files and finalize.
Done. Here's what's ready for the 3 PM review:
Deliverables:
| File | Size | Details |
|---|---|---|
| hero-motion.mp4 | 283 KB | Silent — 5s loop, 1280x720, 30fps, H.264 |
| hero-motion-audio.mp4 | 311 KB | With ambient — same video + subtle audio bed |
Motion effects (all 5s loopable, no hard cuts):
- 6 component cards float with independent parallax drift (different phases/axes per card)
- Soft pulsing radial glow breathes behind the card grid
- Accent bar shimmer cascades across cards with staggered timing
- CTA button has a gentle scale pulse with indigo glow
- Headline/subtitle fade subtly in sync
Ambient audio:
- Layered low-frequency drone (80/160/220 Hz sine blend)
- Brown noise texture filtered through a low-pass for warmth
- Fade in/out for clean looping
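The recipe above maps directly onto ffmpeg's `lavfi` sources. A hedged sketch that builds such a command in Python, using the same 80/160/220 Hz frequencies; the output filename, filter cutoff, and fade lengths are illustrative choices, not the exact values from the run:

```python
def ambient_cmd(out_path="ambient.m4a", duration=5):
    """Build an ffmpeg command: three sine drones + low-passed brown noise,
    mixed together with a fade in/out for clean looping."""
    inputs = []
    for f in (80, 160, 220):  # the drone blend from the list above
        inputs += ["-f", "lavfi", "-i", f"sine=frequency={f}:duration={duration}"]
    # Brown noise layer for warmth (anoisesrc is a standard lavfi source).
    inputs += ["-f", "lavfi", "-i", f"anoisesrc=color=brown:duration={duration}"]
    filter_graph = (
        "[3:a]lowpass=f=400[noise];"             # soften the noise bed
        "[0:a][1:a][2:a][noise]amix=inputs=4[mix];"
        f"[mix]afade=t=in:d=0.5,afade=t=out:st={duration - 0.5}:d=0.5[out]"
    )
    return ["ffmpeg", "-y", *inputs,
            "-filter_complex", filter_graph,
            "-map", "[out]", out_path]

cmd = ambient_cmd()
```

Passing the list straight to `subprocess.run(cmd)` avoids shell-quoting issues with the filter graph.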
Bonus: hero-animated.html is kept in the project — it's the live CSS animation source, useful if Lena wants to tweak timing or motion curves before the review.
Note: fal.ai account balance was exhausted, so I used a local pipeline instead (Playwright frame capture + ffmpeg). This actually gives pixel-perfect fidelity to the approved mock rather than AI reinterpretation.
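The frame-assembly step of that local pipeline might look like the following sketch, assuming Playwright wrote numbered PNGs; the `frames/frame_%04d.png` pattern and output name are hypothetical, while the flags are standard ffmpeg options:

```python
def assemble_cmd(pattern="frames/frame_%04d.png", fps=30, out="hero-motion.mp4"):
    """Build an ffmpeg command turning captured PNG frames into an H.264 MP4."""
    return ["ffmpeg", "-y",
            "-framerate", str(fps),
            "-i", pattern,
            "-c:v", "libx264",
            "-pix_fmt", "yuv420p",      # broad player compatibility
            "-movflags", "+faststart",  # moov atom up front for web playback
            out]

cmd = assemble_cmd()
```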
External Tools
| Tool | Type |
|---|---|
| fal-ai | mcp |
Permissions
| Scope | Description |
|---|---|
| network:outbound | Outbound network access for API calls |
SKILL.md
fal.ai Media Generation
Generate images, videos, and audio using fal.ai models via MCP.
When to Activate
- User wants to generate images from text prompts
- Creating videos from text or images
- Generating speech, music, or sound effects
- Any media generation task
- User says "generate image", "create video", "text to speech", "make a thumbnail", or similar
MCP Requirement
The fal.ai MCP server must be configured. Add this entry under the `mcpServers` section of `~/.claude.json`:

```json
"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}
```
Get an API key at fal.ai.
MCP Tools
The fal.ai MCP provides these tools:
- `search` — Find available models by keyword
- `find` — Get model details and parameters
- `generate` — Run a model with parameters
- `result` — Check async generation status
- `status` — Check job status
- `cancel` — Cancel a running job
- `estimate_cost` — Estimate generation cost
- `models` — List popular models
- `upload` — Upload files for use as inputs
Image Generation
Nano Banana 2 (Fast)
Best for: quick iterations, drafts, text-to-image, image editing.
```
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)
```
Nano Banana Pro (High Fidelity)
Best for: production images, realism, typography, detailed prompts.
```
generate(
  model_name: "fal-ai/nano-banana-pro",
  input: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
```
Common Image Parameters
| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe what you want |
| image_size | string | square, portrait_4_3, landscape_16_9, portrait_16_9, landscape_4_3 | Aspect ratio |
| num_images | number | 1-4 | How many to generate |
| seed | number | any integer | Reproducibility |
| guidance_scale | number | 1-20 | How closely to follow the prompt (higher = more literal) |
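A hedged sketch of validating a payload against this table on the client side before calling `generate`; the helper name and defaults are illustrative, not part of the fal.ai API:

```python
# Allowed values taken from the parameter table above.
VALID_SIZES = {"square", "portrait_4_3", "landscape_16_9",
               "portrait_16_9", "landscape_4_3"}

def build_image_input(prompt, image_size="square", num_images=1,
                      seed=None, guidance_scale=None):
    """Assemble and sanity-check an image-generation input dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if image_size not in VALID_SIZES:
        raise ValueError(f"unknown image_size: {image_size}")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be 1-4")
    payload = {"prompt": prompt, "image_size": image_size,
               "num_images": num_images}
    if seed is not None:
        payload["seed"] = seed
    if guidance_scale is not None:
        if not 1 <= guidance_scale <= 20:
            raise ValueError("guidance_scale must be 1-20")
        payload["guidance_scale"] = guidance_scale
    return payload

p = build_image_input("a red fox", image_size="landscape_16_9", seed=42)
```

Catching a bad `image_size` or out-of-range `guidance_scale` locally is cheaper than burning a paid generation on a rejected or malformed request.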
Image Editing
Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
```
# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  model_name: "fal-ai/nano-banana-2",
  input: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)
```
Video Generation
Seedance 1.0 Pro (ByteDance)
Best for: text-to-video, image-to-video with high motion quality.
```
generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)
```
Kling Video v3 Pro
Best for: text/image-to-video with native audio generation.
```
generate(
  model_name: "fal-ai/kling-video/v3/pro",
  input: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)
```
Veo 3 (Google DeepMind)
Best for: video with generated sound, high visual quality.
```
generate(
  model_name: "fal-ai/veo-3",
  input: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)
```
Image-to-Video
Start from an existing image:
```
generate(
  model_name: "fal-ai/seedance-1-0-pro",
  input: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)
```
Video Parameters
| Param | Type | Options | Notes |
|---|---|---|---|
| prompt | string | required | Describe the video |
| duration | string | "5s", "10s" | Video length |
| aspect_ratio | string | "16:9", "9:16", "1:1" | Frame ratio |
| seed | number | any integer | Reproducibility |
| image_url | string | URL | Source image for image-to-video |
Audio Generation
CSM-1B (Conversational Speech)
Text-to-speech with natural, conversational quality.
```
generate(
  model_name: "fal-ai/csm-1b",
  input: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)
```
ThinkSound (Video-to-Audio)
Generate matching audio from video content.
```
generate(
  model_name: "fal-ai/thinksound",
  input: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)
```
ElevenLabs (via API, no MCP)
For professional voice synthesis, use ElevenLabs directly:
```python
import os
import requests

resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,  # don't hang forever on a slow synthesis
)
resp.raise_for_status()  # surface auth/quota errors instead of writing them to disk

with open("output.mp3", "wb") as f:
    f.write(resp.content)
```
VideoDB Generative Audio
If VideoDB is configured, use its generative audio:
```python
# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")
```
Cost Estimation
Before generating, check estimated cost:
```
estimate_cost(model_name: "fal-ai/nano-banana-pro", input: {...})
```
Model Discovery
Find models for specific tasks:
```
search(query: "text to video")
find(model_name: "fal-ai/seedance-1-0-pro")
models()
```
Tips
- Use `seed` for reproducible results when iterating on prompts
- Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
- For video, keep prompts descriptive but concise — focus on motion and scene
- Image-to-video produces more controlled results than pure text-to-video
- Check `estimate_cost` before running expensive video generations
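The cost-check tip can be sketched as a small budget guard. Here `estimate_fn` and `generate_fn` are stand-ins for the `estimate_cost` and `generate` MCP tools, and the prices are made up purely for illustration:

```python
def run_if_affordable(estimate_fn, generate_fn, model, params, budget_usd=1.00):
    """Only run a generation when its estimated cost fits the budget."""
    cost = estimate_fn(model, params)
    if cost > budget_usd:
        return {"skipped": True, "estimated_cost": cost}
    return {"skipped": False, "output": generate_fn(model, params)}

# Hypothetical estimator/generator: cheap draft model vs. pricey video model.
est = lambda m, p: 0.25 if "nano" in m else 2.50
gen = lambda m, p: f"{m}-output"

cheap = run_if_affordable(est, gen, "fal-ai/nano-banana-2", {})
pricey = run_if_affordable(est, gen, "fal-ai/seedance-1-0-pro", {})
```

Wiring the estimate into the call path, rather than checking it by hand, makes the "estimate before expensive generations" tip hard to forget.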
Related Skills
- `videodb` — Video processing, editing, and streaming
- `video-editing` — AI-powered video editing workflows
- `content-engine` — Content creation for social platforms
FAQ
What does fal-ai-media do?
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
When should I use fal-ai-media?
Use it when you need a repeatable workflow that produces a text report and downloadable media files.
What does fal-ai-media output?
In the evaluated run it produced a text report and downloadable files.
How do I install or invoke fal-ai-media?
npx skills add https://github.com/affaan-m/everything-claude-code --skill fal-ai-media
Which agents does fal-ai-media support?
Claude Code
What tools, channels, or permissions does fal-ai-media need?
It uses the fal-ai MCP server; channels commonly include text and file; permissions include network:outbound.
Is fal-ai-media safe to install?
Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.
How is fal-ai-media different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
About fal-ai-media
When to use fal-ai-media
When a user wants AI-generated images from text prompts. When creating short videos from text or source images. When generating speech or audio that matches video content.
When fal-ai-media is not the right choice
When you need fully local/offline media generation without external APIs. When the task is general media editing rather than generating new assets with AI.
What it produces
Produces a text report and downloadable media files.
Install
`npx skills add https://github.com/affaan-m/everything-claude-code --skill fal-ai-media`

Invoke: Ask Claude Code to use fal-ai-media for the task.