HiVideo turns two simple inputs—a motion video and a character image—into a finished AI-generated video in about 15 minutes. No video editing skills required.
This guide walks through exactly how the process works, what happens behind the scenes, and how to get the best possible results from your generations.
The Two Inputs You Need
Every HiVideo generation starts with two things: a motion reference and a character.
Motion Reference Video
This is a video showing the movement you want your character to perform. It could be you recording yourself, a stock video clip, or any footage with clear human movement.
What makes a good reference:
- Clear, well-lit footage
- Subject visible from shoulders up (minimum)
- Stable camera (tripod or steady hands)
- Simple background (not required, but helps)
- 10 seconds or less (longer videos can be trimmed)
Supported formats: MP4, MOV (max 100MB)
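The limits above can be checked before you upload. The following is a minimal sketch, not HiVideo's own validation: the function name, the messages, and the reading of "100MB" as 100 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Limits taken from this guide. Names and messages are illustrative only.
ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB
MAX_DURATION_SECONDS = 10.0


def check_reference_video(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the clip fits the limits."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format: use MP4 or MOV")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 100MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("longer than 10 seconds: will need trimming")
    return problems
```

A clip that is over 10 seconds is not rejected by HiVideo, so the last message is a reminder that you will be asked to trim, not an error.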
Character Image
This is who will perform the motion. You have three options:
- AI Characters (Built-in): HiVideo includes ready-to-use AI-generated characters. Professional-looking, diverse options, no upload needed.
- Upload Your Own: Use any image—a photo, illustration, AI-generated art, or digital avatar. PNG, JPG, or WebP formats.
- Face Swap: Start with your reference video's person, but swap their face with a different character. Useful when you like a reference video but want a different presenter.
The Generation Process (Step by Step)
Here's exactly what you do in HiVideo:

Step 1: Upload Your Motion Video
Drag and drop or click to upload. If your video is longer than 10 seconds, you'll be prompted to trim it down to the 10-second segment you want to use.
Step 2: Choose Your Character
Pick from AI characters, upload your own image, or use face swap. The character selection shows you exactly what your output will look like.
Step 3: Add a Scene Description (Optional)
Add a text prompt that describes the scene, not the motion (the motion comes from your video). Examples:
- "A professional studio with soft lighting"
- "Outdoor setting with natural sunlight"
- "Minimalist white background"
Step 4: Generate
Click the button. Your video enters the processing queue.
Step 5: Wait (~15 minutes)
AI generation takes time. You'll receive an email when your video is ready, so you don't need to keep the page open.
Step 6: Configure Audio
Once your video is generated, choose one of three audio options: keep the original audio from your reference video, add a new voice, or leave the video silent.

Step 7: Download
Your finished video is ready to download. Use it directly or bring it into your video editor for further work.
What Happens Behind the Scenes
When you click generate, here's what our AI systems do:
Pose Estimation
First, we analyze your reference video frame by frame. AI identifies body keypoints—head, shoulders, elbows, wrists, hips—tracking their position throughout the video. This creates a motion "skeleton."
Facial Landmark Tracking
Separately, we track 68+ facial landmarks: eyes, eyebrows, nose, mouth, jawline. This captures the subtle expressions that make video feel human.
Character Analysis
The AI studies your character image, understanding its proportions, style, lighting, and structure. This ensures the output matches your character's appearance.
Motion Mapping
The extracted motion data is translated to your character's proportions. A tall character and a short character move differently even when performing the same action—the AI accounts for this.
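To make the idea of proportion-aware mapping concrete, here is a deliberately simple sketch. It is not HiVideo's actual method, which this guide does not document. It assumes a pose is a dictionary of 2D keypoints, that a single body-height ratio is enough to rescale it, and that a joint named "hips" serves as the anchor. All of these names are hypothetical.

```python
Point = tuple[float, float]


def retarget_pose(
    source: dict[str, Point],
    source_height: float,
    target_height: float,
    root: str = "hips",
) -> dict[str, Point]:
    """Scale each keypoint's offset from the root joint by target/source height."""
    scale = target_height / source_height
    root_x, root_y = source[root]
    return {
        name: (root_x + (x - root_x) * scale, root_y + (y - root_y) * scale)
        for name, (x, y) in source.items()
    }


# Example: map a pose onto a character half as tall as the reference subject.
pose = {"hips": (100.0, 200.0), "head": (100.0, 100.0), "wrist": (150.0, 180.0)}
smaller = retarget_pose(pose, source_height=100.0, target_height=50.0)
```

The root joint stays where it is and every other keypoint moves proportionally closer to it, so the gesture keeps its shape while fitting the smaller body. A production system would likely scale each limb separately instead of using one global ratio.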
Frame-by-Frame Generation
Each video frame is generated individually. The AI creates your character in each pose while maintaining visual consistency.
Temporal Coherence
Raw frame-by-frame generation can look jittery. A smoothing pass ensures natural movement transitions and consistent appearance across frames.
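One common way to reduce jitter in any per-frame signal is an exponential moving average. The sketch below applies it to a single keypoint coordinate over time. It only illustrates what a smoothing pass does. The guide does not say which technique HiVideo uses, and the function name and the `alpha` parameter are assumptions.

```python
def smooth_trajectory(values: list[float], alpha: float = 0.5) -> list[float]:
    """Exponential moving average: lower alpha means heavier smoothing."""
    if not values:
        return []
    smoothed = [values[0]]
    for value in values[1:]:
        # Blend the new sample with the previous smoothed value.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# A coordinate that jumps between 0 and 10 on every frame.
jittery = [0.0, 10.0, 0.0, 10.0]
steady = smooth_trajectory(jittery, alpha=0.5)  # [0.0, 5.0, 2.5, 6.25]
```

The frame-to-frame jumps shrink from 10 units to at most 5, at the cost of the smoothed value lagging slightly behind the raw motion.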
Audio Processing
If you keep the original audio, it is synchronized with the generated video. The frame timing matches the original, so audio stays in sync.
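The reasoning behind the sync claim is simple arithmetic: if the generated video has the same frame count and frame rate as the reference, its duration equals the audio's duration. The helper below is a hypothetical illustration of that check, not part of HiVideo.

```python
def av_drift_seconds(frame_count: int, fps: float, audio_seconds: float) -> float:
    """Video duration minus audio duration; 0.0 means the two line up exactly."""
    return frame_count / fps - audio_seconds


# 300 frames at 30 fps is exactly 10 seconds, matching a 10-second audio track.
drift = av_drift_seconds(frame_count=300, fps=30.0, audio_seconds=10.0)
```

A non-zero result means frames were dropped or the frame rate changed somewhere along the way.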
Final Composition
Everything comes together into your downloadable MP4 file.
Tips for Best Results
Get more out of every generation with these practical tips:
Reference Video Tips
- Film in good lighting (natural light or well-lit room)
- Keep the camera steady (tripod recommended)
- Face the camera directly for best face tracking
- Wear solid colors (patterns can confuse the AI)
- Move at a moderate pace (very fast motion may not track well)
- Keep movements within frame (don't go off-screen)
Character Image Tips
- Use high-resolution images (1024px+ on the short side)
- Choose images with clear, visible faces
- Front-facing or slight angles work best
- Consistent lighting in the image helps
- AI-generated characters often work better than photos
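The resolution tip can be turned into a quick check. This sketch assumes you already know the image's pixel dimensions, for example from your image viewer. The function name is hypothetical, and the 1024px threshold comes from the tip above.

```python
MIN_SHORT_SIDE = 1024  # from the guide: 1024px+ on the short side


def short_side_ok(width: int, height: int) -> bool:
    """True when the shorter edge of the image meets the recommended minimum."""
    return min(width, height) >= MIN_SHORT_SIDE
```

For example, a 1920×1080 image passes, while a 1920×720 image does not, because the check looks at the shorter edge.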
Prompt Tips
- Describe the environment, not the action
- Keep it simple: "studio lighting" beats a paragraph
- Don't contradict your reference (indoor motion + "outdoor beach" = confusion)
When to Use Face Swap
- You have a great reference video but want a different face
- You want a specific person's likeness on motion from stock footage
- You're building a consistent character across many videos
Troubleshooting Common Issues
Something not right? Here's how to fix common problems:
Character looks distorted
- Check your character image resolution (too small = poor results)
- Try a different character image with clearer features
- Ensure the character image has good lighting
Motion doesn't match reference
- Your reference video may be too dark or blurry
- Try a reference with the subject more centered
- Avoid references with multiple people (AI may get confused)
Output looks jittery
- Reference video may have inconsistent lighting
- Try a reference with smoother, slower movements
- Check if reference has frame drops or compression artifacts
Generation failed
- File may be too large (100MB limit)
- Format may not be supported (use MP4 or MOV)
- Try re-uploading the file
Audio out of sync
- This is rare but can happen with variable-frame-rate (VFR) videos
- Try converting your reference to constant frame rate before upload
- Or generate without audio and add it in post
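One way to convert a clip to constant frame rate is with the free ffmpeg tool, which is separate from HiVideo. The sketch below builds a command that uses ffmpeg's `fps` video filter. The 30 fps target, the file names, and the function name are assumptions, so pick the rate closest to your source footage.

```python
import subprocess


def build_cfr_command(src: str, dst: str, fps: int = 30) -> list[str]:
    """Build an ffmpeg command that re-encodes video at a constant frame rate."""
    return [
        "ffmpeg",
        "-i", src,            # input file
        "-vf", f"fps={fps}",  # fps filter: duplicate or drop frames to hit a fixed rate
        "-c:a", "copy",       # keep the audio stream unchanged
        dst,
    ]


command = build_cfr_command("reference.mov", "reference_cfr.mp4")
# To run the conversion (requires ffmpeg to be installed):
# subprocess.run(command, check=True)
```

Upload the converted file as your reference video instead of the original.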
Conclusion
HiVideo's process is designed to be simple: upload two files, click a button, get a video. The AI handles the complex work of motion extraction, character mapping, and video generation.
The best way to learn is to try it. Your first generation will teach you more than any guide.