OpenClaw includes powerful media generation tools that let your agents create visual and audio content on demand. Whether you need product images, background music for videos, or promotional clips, the generation tools integrate seamlessly into your workflow.
Image Generation
Generate images from text prompts using state-of-the-art image models. Create illustrations, product photos, concept art, social media graphics, and more.
Basic image generation
Provide a detailed text prompt describing the image you want. Be specific about subject, style, lighting, and composition for best results.
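One way to keep prompts consistently specific is to assemble them from named parts. The helper below is a minimal sketch; `build_image_prompt` and its parameter names are hypothetical and not part of OpenClaw. It only produces the prompt string you would then pass to the image tool.

```python
def build_image_prompt(subject, style=None, lighting=None, composition=None):
    """Join the non-empty prompt components into one comma-separated prompt.

    Hypothetical helper for illustration; OpenClaw only needs the final string.
    """
    parts = [subject, style, lighting, composition]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_image_prompt(
    subject="ceramic coffee mug on a walnut desk",
    style="product photo",
    lighting="soft morning window light",
    composition="shallow depth of field, centered",
)
print(prompt)
```

Leaving a component out simply drops it from the prompt, which makes it easy to compare results with and without a given detail.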
Edit mode with reference images
Pass a reference image to guide the generation. The model uses your reference as a foundation while applying your prompt. Useful for style transfer, product variations, or conceptual refinements.
Aspect ratios and resolutions
Control output format with aspect ratio hints: 1:1 (square), 2:3 (portrait), 3:2 (landscape), 4:5, 16:9, and more. Resolution options include 1K, 2K, and 4K depending on provider capabilities.
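To get a feel for how an aspect ratio and a resolution tier combine, the sketch below estimates pixel dimensions. It assumes that 1K, 2K, and 4K refer to a long edge of 1024, 2048, and 4096 pixels; that mapping is an assumption for illustration, and actual output sizes depend on the provider.

```python
# Assumed mapping from resolution tier to long-edge pixels (illustrative only).
LONG_EDGE_PX = {"1K": 1024, "2K": 2048, "4K": 4096}


def target_size(aspect_ratio, resolution):
    """Estimate (width, height) in pixels for a ratio like "16:9" and a tier like "2K"."""
    w_part, h_part = (int(x) for x in aspect_ratio.split(":"))
    long_edge = LONG_EDGE_PX[resolution]
    if w_part >= h_part:
        # Landscape or square: the width is the long edge.
        return long_edge, round(long_edge * h_part / w_part)
    # Portrait: the height is the long edge.
    return round(long_edge * w_part / h_part), long_edge


print(target_size("16:9", "2K"))  # (2048, 1152)
print(target_size("2:3", "1K"))   # (683, 1024)
```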
Music Generation
Compose original music tracks from text descriptions. Generate background music for podcasts, videos, presentations, or games without needing a composer.
Instrumental generation
Set the instrumental flag to generate purely instrumental tracks without vocals. Specify genre, mood, tempo, and instrumentation in your prompt for tailored results.
Lyrics and vocal generation
Include lyrics in your prompt to generate songs with vocals. Describe the genre, mood, and vocal style to guide the output.
Duration control
Specify duration in seconds. Common outputs range from 15 seconds to several minutes, depending on provider limits and your prompt complexity.
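The music options above can be collected into a single request before calling the tool. The sketch below is illustrative only: `build_music_request`, the `duration_seconds` key, and the rule that an instrumental track cannot also carry lyrics are assumptions, not the documented OpenClaw schema. Only the `instrumental` flag, lyrics in the prompt, and a duration in seconds come from the text above.

```python
def build_music_request(prompt, duration_seconds, instrumental=False, lyrics=None):
    """Assemble a music generation request as a plain dict (hypothetical shape)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if instrumental and lyrics:
        # Assumed rule: instrumental tracks have no vocals, so lyrics make no sense.
        raise ValueError("an instrumental track cannot also carry lyrics")
    # Lyrics go into the prompt itself, as described in the section above.
    full_prompt = prompt if not lyrics else f"{prompt}\n\nLyrics:\n{lyrics}"
    return {
        "prompt": full_prompt,
        "instrumental": instrumental,
        "duration_seconds": duration_seconds,
    }


request = build_music_request(
    "lo-fi hip hop, relaxed, 80 bpm, warm electric piano",
    duration_seconds=30,
    instrumental=True,
)
print(request)
```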
Video Generation
Generate video clips from text prompts or transform existing video content. Create promotional videos, social media clips, animated content, and more.
Text-to-video
Describe the scene, action, and mood you want. The model generates a video matching your description. Include details about subjects, environment, camera movement, and timing.
Video-to-video transformation
Pass an existing video as a reference. The model transforms it according to your prompt, for example by extending its duration, changing its style, adding effects, or modifying its content.
Audio generation
Enable the audio flag to include sound with your generated videos. This creates complete multimedia content ready for social media or presentations.
Practical use cases
Social media content
Generate eye-catching images and short videos for social media posts. Create variations for A/B testing or platform-specific formats.
Product visualization
Generate product images, lifestyle shots, or concept visualizations without expensive photo shoots. Use reference images for product-specific generation.
Video production
Create B-roll footage, transitions, or full clips for video projects. Combine with audio generation for complete video production.
Creative projects
Generate artwork, music, and video for creative projects, presentations, or personal use. The tools support iterative refinement through prompt adjustments.
Configuration options
Provider selection
OpenClaw supports multiple generation providers. Check your gateway configuration for available options. Each provider has different strengths, limits, and pricing.
Quality and cost tradeoffs
Higher resolutions and longer durations consume more tokens and may cost more. Use lower settings for drafts and previews, and reserve maximum quality for final outputs.
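A simple way to apply this tradeoff is to keep named presets for draft and final runs. The sketch below is hypothetical: the preset names, setting keys, and values are placeholders chosen for illustration, so adjust them to whatever your provider actually supports.

```python
# Hypothetical presets: the setting names and values are illustrative only.
PRESETS = {
    "draft": {"resolution": "1K", "duration_seconds": 4},
    "final": {"resolution": "4K", "duration_seconds": 10},
}


def settings_for(stage, **overrides):
    """Return a copy of the preset for `stage`, with any overrides applied."""
    if stage not in PRESETS:
        raise ValueError(f"unknown stage: {stage!r}")
    return {**PRESETS[stage], **overrides}


print(settings_for("draft"))
print(settings_for("final", duration_seconds=6))
```

Iterating on the draft preset and switching to the final preset only once the prompt is settled keeps most of the spend on outputs you intend to keep.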
Output handling
Generated media is saved to OpenClaw's managed storage and delivered automatically. Use the filename parameter to provide naming hints for organized output.
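Consistent naming hints make generated files easier to find later. The helper below is a sketch of one possible convention; `filename_hint` and the project/description/variant scheme are assumptions, and how OpenClaw uses the hint for the final filename is not specified here.

```python
import re


def filename_hint(project, description, variant):
    """Build a lowercase, hyphen-separated naming hint with a zero-padded variant number."""
    slug = re.sub(r"[^a-z0-9]+", "-", f"{project} {description}".lower()).strip("-")
    return f"{slug}-v{variant:02d}"


print(filename_hint("Spring Launch", "Hero Image", 3))  # spring-launch-hero-image-v03
```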
Tips for better results
- Be specific in prompts: Describe subject, style, mood, lighting, and composition in detail
- Use reference media: Pass reference images or videos to guide generation
- Iterate and refine: Generate variations and refine prompts based on results
- Combine tools: Generate an image, use it as a video reference, add music
- Check usage rights: Verify provider terms for commercial use if needed
Troubleshooting
Poor generation quality
- Refine your prompt with more detail
- Try different providers if available
- Use reference media to guide the model
Provider limitations
- Check file size limits (images, videos, audio)
- Verify duration limits per provider
- Some content types may not be supported
Generation failures
- Check the prompt for restricted content
- Verify API keys and provider configuration
- Reduce complexity or duration if limits are exceeded
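The last step, reducing duration when a limit is exceeded, can be automated with a small retry loop. The sketch below is hypothetical throughout: `LimitExceededError`, the `generate` callable, and the halving strategy are placeholders, since the errors OpenClaw actually surfaces are not described here.

```python
class LimitExceededError(Exception):
    """Placeholder for whatever error signals that a provider limit was exceeded."""


def generate_with_shorter_fallback(generate, prompt, duration_seconds, min_seconds=3):
    """Call generate(prompt, seconds), halving the duration each time a limit is hit."""
    current = duration_seconds
    while True:
        try:
            return generate(prompt, current)
        except LimitExceededError:
            if current <= min_seconds:
                raise  # Already at the floor; give up and surface the error.
            current = max(min_seconds, current // 2)


def fake_generate(prompt, seconds):
    """Stand-in generator that rejects anything longer than 5 seconds."""
    if seconds > 5:
        raise LimitExceededError(f"{seconds}s exceeds the limit")
    return f"{seconds}s clip for: {prompt}"


print(generate_with_shorter_fallback(fake_generate, "city timelapse", 20))
```

With the stand-in generator, the request is retried at 10 seconds and then succeeds at 5 seconds.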
Questions about media generation?
Get help with creative workflows, provider selection, and prompt optimization in the OpenClaw community.
FAQ
Which models support image generation?
OpenClaw supports multiple providers, including OpenAI (DALL-E), Google's image generation models, and other backends. The default model is set in your gateway configuration.
Can I generate music without lyrics?
Yes. Set instrumental: true to generate purely instrumental tracks without vocals or lyrics.
What video lengths are supported?
Supported durations depend on the provider; clips of 3 to 10 seconds are typical. OpenClaw rounds the requested duration to the nearest duration the provider supports.
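Rounding to the nearest supported duration can be pictured as follows. This is a sketch of the idea, not OpenClaw's implementation: the list of supported durations is a made-up example, and resolving ties toward the shorter clip is an assumption.

```python
def nearest_supported_duration(requested_seconds, supported_seconds):
    """Pick the supported duration closest to the request; ties go to the shorter one."""
    return min(supported_seconds, key=lambda d: (abs(d - requested_seconds), d))


supported = [4, 6, 8]  # hypothetical provider durations, in seconds
print(nearest_supported_duration(9, supported))  # 8
print(nearest_supported_duration(7, supported))  # 6 (tie between 6 and 8)
```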
Can I use reference images or videos?
Yes. The image generation tool supports reference images for edit mode. Video generation supports reference videos as input for transformation or extension.
Are generated assets free to use?
Usage rights depend on the model provider. Many providers allow both commercial and personal use of generated content, but terms differ, so check the specific provider's terms before publishing.