Media Generation: Images, Music & Video - OpenClaw Guide

OpenClaw includes media generation tools that let your agents create images, music, and video on demand. Whether you need product images, background music for videos, or promotional clips, the generation tools run inside your existing agent workflow.

Image Generation

Generate images from text prompts using state-of-the-art image models. Create illustrations, product photos, concept art, social media graphics, and more.

Basic image generation

Provide a detailed text prompt describing the image you want. Be specific about subject, style, lighting, and composition for best results.

Edit mode with reference images

Pass a reference image to guide the generation. The model uses your reference as a foundation while applying your prompt. Useful for style transfer, product variations, or conceptual refinements.

Aspect ratios and resolutions

Control output format with aspect ratio hints: 1:1 (square), 2:3 (portrait), 3:2 (landscape), 4:5, 16:9, and more. Resolution options include 1K, 2K, and 4K depending on provider capabilities.
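To make the ratio and resolution options concrete, here is a minimal Python sketch that converts an aspect ratio hint into pixel dimensions. It assumes that 1K, 2K, and 4K mean a long edge of 1024, 2048, and 4096 pixels. That mapping and the function name target_size are illustrative assumptions, not documented OpenClaw behavior, and providers may size their outputs differently.

```python
from fractions import Fraction

# Assumed mapping from resolution label to long-edge pixels (not from OpenClaw docs).
LONG_EDGE = {"1K": 1024, "2K": 2048, "4K": 4096}


def target_size(aspect_ratio: str, resolution: str = "1K") -> tuple[int, int]:
    """Return (width, height) in pixels for a ratio hint such as '16:9'."""
    w_part, h_part = (int(p) for p in aspect_ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError(f"invalid aspect ratio: {aspect_ratio!r}")
    long_edge = LONG_EDGE[resolution]
    ratio = Fraction(w_part, h_part)
    if ratio >= 1:
        # Landscape or square: the width is the long edge.
        return long_edge, round(long_edge / ratio)
    # Portrait: the height is the long edge.
    return round(long_edge * ratio), long_edge
```

Under these assumptions, target_size("16:9", "2K") gives (2048, 1152) and target_size("1:1") gives (1024, 1024).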

Music Generation

Compose original music tracks from text descriptions. Generate background music for podcasts, videos, presentations, or games without needing a composer.

Instrumental generation

Set the instrumental flag to generate purely instrumental tracks without vocals. Specify genre, mood, tempo, and instrumentation in your prompt for tailored results.

Lyrics and vocal generation

Include lyrics in your prompt to generate songs with vocals. Describe the genre, mood, and vocal style to guide the output.

Duration control

Specify duration in seconds. Common outputs range from 15 seconds to several minutes, depending on provider limits and your prompt complexity.
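The options above (prompt, instrumental flag, duration) could be assembled along these lines. This is a sketch only: build_music_request, duration_seconds, and max_seconds are hypothetical names chosen for illustration, and the 180-second cap is a placeholder for whatever limit your provider enforces.

```python
def build_music_request(prompt: str, instrumental: bool = False,
                        duration_seconds: int = 30, max_seconds: int = 180) -> dict:
    """Assemble a request dict, clamping the duration to an assumed provider limit."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return {
        "prompt": prompt.strip(),
        "instrumental": instrumental,
        # max_seconds is a placeholder for the real provider limit.
        "duration_seconds": min(duration_seconds, max_seconds),
    }
```

Clamping rather than rejecting an over-long request is a design choice here; a stricter caller might prefer to raise an error so the limit is visible.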

Video Generation

Generate video clips from text prompts or transform existing video content. Create promotional videos, social media clips, animated content, and more.

Text-to-video

Describe the scene, action, and mood you want. The model generates a video matching your description. Include details about subjects, environment, camera movement, and timing.

Video-to-video transformation

Pass an existing video as a reference. The model transforms it based on your prompt: extend duration, change style, add effects, or modify content.

Audio generation

Enable the audio flag to include sound with your generated videos. This creates complete multimedia content ready for social media or presentations.

Practical use cases

Social media content

Generate eye-catching images and short videos for social media posts. Create variations for A/B testing or platform-specific formats.

Product visualization

Generate product images, lifestyle shots, or concept visualizations without expensive photo shoots. Use reference images for product-specific generation.

Video production

Create B-roll footage, transitions, or full clips for video projects. Combine with audio generation for complete video production.

Creative projects

Generate artwork, music, and video for creative projects, presentations, or personal use. The tools support iterative refinement through prompt adjustments.

Configuration options

Provider selection

OpenClaw supports multiple generation providers. Check your gateway configuration for available options. Each provider has different strengths, limits, and pricing.

Quality and cost tradeoffs

Higher resolutions and longer durations consume more tokens and usually cost more. Use lower settings for drafts and previews, and reserve maximum quality for final outputs.
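One way to keep draft and final settings apart is a small preset table. The sketch below is an assumption-laden illustration: the stage names, the parameter names resolution and duration_seconds, and the values are all hypothetical and should be adapted to what your provider accepts.

```python
# Hypothetical presets; names and values are illustrative, not OpenClaw defaults.
PRESETS = {
    "draft": {"resolution": "1K", "duration_seconds": 5},
    "final": {"resolution": "4K", "duration_seconds": 10},
}


def settings_for(stage: str, **overrides) -> dict:
    """Return the preset for a stage, with explicit overrides applied on top."""
    if stage not in PRESETS:
        raise ValueError(f"unknown stage: {stage!r}")
    return {**PRESETS[stage], **overrides}
```

For example, settings_for("final", duration_seconds=8) keeps the 4K resolution but shortens the clip.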

Output handling

Generated media is saved to OpenClaw's managed storage and delivered automatically. Use the filename parameter to provide naming hints for organized output.
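A consistent naming scheme makes filename hints easier to sort later. The helper below is a hypothetical sketch that turns a free-form label into a lowercase, hyphenated hint; the function name, the slug format, and the length cap are assumptions, and how OpenClaw actually uses the hint is not specified here.

```python
import re


def filename_hint(label: str, kind: str, max_len: int = 48) -> str:
    """Build a filesystem-friendly hint such as 'spring-sale-hero-banner-image'."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    # Truncate long labels and fall back to a default when nothing usable remains.
    slug = slug[:max_len].rstrip("-") or "untitled"
    return f"{slug}-{kind}"
```

With this scheme, filename_hint("Spring Sale: Hero Banner!", "image") returns "spring-sale-hero-banner-image".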

Tips for better results

Be specific in prompts: Describe subject, style, mood, lighting, and composition in detail
Use reference media: Pass reference images or videos to guide generation
Iterate and refine: Generate variations and refine prompts based on results
Combine tools: Generate an image, use it as a video reference, add music
Check usage rights: Verify provider terms for commercial use if needed

Troubleshooting

Poor generation quality

Refine your prompt with more detail
Try different providers if available
Use reference media to guide the model

Provider limitations

Check file size limits (images, videos, audio)
Verify duration limits per provider
Some content types may not be supported

Generation failures

Check prompt for restricted content
Verify API keys and provider configuration
Reduce complexity or duration if limits are exceeded
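Reducing the duration after a limit error can be automated with a simple retry loop. The following is a sketch under stated assumptions: LimitExceeded is a made-up exception type, generate is any callable you supply that takes a duration in seconds, and halving the duration is just one possible reduction strategy. None of these names come from OpenClaw.

```python
from typing import Callable


class LimitExceeded(Exception):
    """Hypothetical error for a request that exceeds a provider limit."""


def generate_with_backoff(generate: Callable[[int], str], duration: int,
                          min_duration: int = 3) -> str:
    """Retry with half the duration each time the provider reports a limit error."""
    while True:
        try:
            return generate(duration)
        except LimitExceeded:
            if duration <= min_duration:
                raise  # Already at the floor; surface the error to the caller.
            duration = max(min_duration, duration // 2)
```

The loop always terminates: each failure either shrinks the duration or, once the floor is reached, re-raises the error.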

FAQ

Which models support image generation?

OpenClaw supports multiple providers, including OpenAI (DALL-E), Google's image generation models, and other backends. The default model is configured in your gateway settings.

Can I generate music without lyrics?

Yes. Set instrumental: true to generate purely instrumental tracks without vocals or lyrics.

What video lengths are supported?

Supported durations depend on the provider; clips of 3 to 10 seconds are common. OpenClaw rounds your requested duration to the nearest provider-supported duration.
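Rounding to the nearest supported duration can be pictured with a short Python sketch. The function name snap_duration and the example duration list are hypothetical, and the tie-breaking rule (prefer the shorter clip) is an assumption; this answer only states that rounding goes to the nearest supported value.

```python
from typing import Sequence


def snap_duration(requested: float, supported: Sequence[int]) -> int:
    """Pick the supported duration closest to the requested one."""
    if not supported:
        raise ValueError("supported durations must not be empty")
    # Sort key: distance first, then the duration itself so ties go to the shorter clip.
    return min(supported, key=lambda d: (abs(d - requested), d))
```

With an assumed supported set of 3, 5, and 10 seconds, a request for 7 seconds snaps to 5, and a request for 30 seconds snaps to 10.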

Can I use reference images or videos?

Yes. The image generation tool supports reference images for edit mode. Video generation supports reference videos as input for transformation or extension.

Are generated assets free to use?

Usage rights depend on the model provider. Many providers allow commercial and personal use of generated content, but the terms differ, so check the specific provider's terms before you publish or sell generated assets.
