VideoMimic Reproduction (1): Environment Setup (real2sim + simulation)
Download the VideoMimic code into the VideoMimic directory (the actual code directory will be VideoMimic/VideoMimic).
git clone https://github.com/hongsukchoi/VideoMimic.git
Real2Sim
1. Main Environment (vm1rs)
Create the Docker container and install Anaconda.
sudo docker run -itd --name videomimic-vm1rs -v /:/host -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE --gpus=all --network host --shm-size 32g -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
sudo docker exec -it videomimic-vm1rs /bin/bash
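Before going further, it is worth confirming that the GPUs are visible inside the container (a quick optional check; with --gpus=all and the utility driver capability, the NVIDIA container runtime injects nvidia-smi):
# Should list the host GPUs; if it fails, recheck --gpus=all and the NVIDIA container toolkit on the host
nvidia-smi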
apt update
apt install vim git mesa-utils-extra libglib2.0-0
# In the directory containing the Anaconda installer (the filename below is just an example; use your actual file)
./Anaconda3-2024.10-1-Linux-x86_64.sh
Open ~/.bashrc, add the following content, then run source ~/.bashrc (replace /root/anaconda3 with your actual anaconda3 install path):
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/root/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/root/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/root/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/root/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
Create and activate the virtual environment.
conda create -n vm1rs python=3.12
conda activate vm1rs
Install the remaining dependencies.
cd VideoMimic/VideoMimic/real2sim
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
Human Detection & Pose Estimation
cd ../../
mkdir third_party
# 1. Grounded-SAM-2 (bounding boxes and segmentation)
cd third_party/
git clone https://github.com/hongsukchoi/Grounded-SAM-2.git
cd Grounded-SAM-2
# Add the following to ~/.bashrc, then save and exit
export CUDA_HOME=/usr/local/cuda-12.4 # Adjust to your CUDA version
source ~/.bashrc
conda activate vm1rs
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install -e . # Segment Anything 2
pip install --no-build-isolation -e grounding_dino # Grounding DINO
pip install transformers -i https://pypi.tuna.tsinghua.edu.cn/simple/
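Optionally run a quick sanity check that the CUDA-enabled PyTorch build and the two packages installed above import correctly (a minimal sketch; sam2 and groundingdino are the package names these repos normally install, adjust if your checkout differs):
# Should print torch 2.5.1, True for CUDA availability, and raise no ImportError
python -c "import torch, sam2, groundingdino; print(torch.__version__, torch.cuda.is_available())"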
cd ../../
# 2. ViTPose (2D pose estimation)
pip install -U openmim -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install --upgrade setuptools -i https://pypi.tuna.tsinghua.edu.cn/simple/
mim install mmcv==1.3.9 -i https://pypi.tuna.tsinghua.edu.cn/simple/ # If error, try: pip install setuptools --upgrade
cd third_party/
git clone https://github.com/ViTAE-Transformer/ViTPose.git
cd ViTPose
pip install -v -e .
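A quick import check for the ViTPose install (a sketch under the assumption that installing the ViTPose repo provides the mmpose package, as the upstream repo does):
# Both imports should succeed; mmcv should report 1.3.9
python -c "import mmcv, mmpose; print(mmcv.__version__)"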
cd ../../
# 3. VIMO (3D human mesh - primary method)
pip install git+https://github.com/hongsukchoi/VIMO.git -i https://pypi.tuna.tsinghua.edu.cn/simple/
# 4D Humans (deprecated)
# pip install git+https://github.com/hongsukchoi/4D-Humans.git
# 4. BSTRO (contact detection)
cd third_party/
git clone --recursive https://github.com/hongsukchoi/bstro.git
cd bstro
python setup.py build develop
cd ../..
- Troubleshooting: g++-11 errors
If you encounter g++-11 related errors:
# Install g++-11
sudo apt update
sudo apt install g++-11
# Set environment variables
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
# Retry the installation
pip install --no-build-isolation -e grounding_dino
MegaHunter + PyRoki
# Second order optimization for MegaHunter and PyRoki
pip install -U "jax[cuda12]" -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install "git+https://github.com/brentyi/jaxls.git" -i https://pypi.tuna.tsinghua.edu.cn/simple/# PyRoki for robot motion retargeting
git clone https://github.com/chungmin99/pyroki.git
cd pyroki
# pyroki might have updated some variable names;
git checkout 70b30a56b1e1ea83fb4c2cac8fe2c63a0624b9ce
pip install -e .
cd ../..
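Optionally verify that JAX sees the GPU and that jaxls and pyroki import (a minimal sketch; jax.devices() returning only CPU devices indicates a broken CUDA JAX install):
# Expect at least one CUDA device in the printed list
python -c "import jax, jaxls, pyroki; print(jax.devices())"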
Core Dependencies
# PyTorch (avoid 2.6 - it's unstable)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Viser for visualization
cd third_party/
git clone https://github.com/nerfstudio-project/viser
cd viser
pip install -e .
cd ../..
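Optionally check the viser install by starting a throwaway server (a minimal sketch; viser serves its web viewer on port 8080 by default):
# Open http://localhost:8080 in a browser while this runs, then Ctrl+C
python -c "import time, viser; server = viser.ViserServer(); time.sleep(30)"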
Optional: World Reconstruction (Align3r)
# Monst3r/Align3r (skip if only using MegaSam)
cd third_party/
git clone https://github.com/Junyi42/monst3r-depth-package.git
cd monst3r-depth-package
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ../..
pip install git+https://github.com/Junyi42/croco_package.git -i https://pypi.tuna.tsinghua.edu.cn/simple/
Optional: Neural Meshification (NDC)
# NDC (skip if only using NKSR)
pip install trimesh h5py cython opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd third_party
git clone https://github.com/czq142857/NDC.git
cd NDC
python setup.py build_ext --inplace
cd ../..
Optional: Hand Pose Estimation
# WiLor (3D hand mesh)
pip install git+https://github.com/warmshao/WiLoR-mini -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
After configuration
- Change the argument box_threshold to threshold at line 164 of stage0_preprocessing/sam2_segmentation.py (a sketch of this edit follows below).
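One way to apply that edit from the command line (a hypothetical one-liner; it assumes the keyword appears literally as box_threshold on line 164, so double-check the file afterwards):
# Run from the real2sim directory; verify with: sed -n '164p' stage0_preprocessing/sam2_segmentation.py
sed -i '164s/box_threshold/threshold/' stage0_preprocessing/sam2_segmentation.py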
2. Reconstruction Environment (vm1recon)
This environment handles MegaSam reconstruction, NKSR meshification, and GeoCalib operations.
Create the Docker container and install Anaconda.
sudo docker run -itd --name videomimic-vm1recon -v /:/host -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE --gpus=all --network host --shm-size 32g -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
sudo docker exec -it videomimic-vm1recon /bin/bash
apt update
apt install vim git wget
# In the directory containing the Anaconda installer (the filename below is just an example; use your actual file)
./Anaconda3-2024.10-1-Linux-x86_64.sh
Open ~/.bashrc, add the following content, then run source ~/.bashrc (replace /root/anaconda3 with your actual anaconda3 install path):
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/root/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/root/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/root/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/root/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
cd third_party/
git clone --recursive https://github.com/Junyi42/megasam-package
cd megasam-package
# Create environment from yaml
conda env create -f environment.yml
conda activate vm1recon
# Other dependencies
# pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Additional dependencies
# Add the following to ~/.bashrc, then save and exit
export CUDA_HOME=/usr/local/cuda-11.8 # Adjust to your CUDA version
source ~/.bashrc
conda activate vm1recon
# Install g++-11 if not already installed
# sudo apt update
# sudo apt install g++-11
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
pip install torch-scatter==2.1.2 -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Install specific xformers version (required for MegaSam)
cd ../
wget https://anaconda.org/xformers/xformers/0.0.22.post7/download/linux-64/xformers-0.0.22.post7-py310_cu11.8.0_pyt2.0.1.tar.bz2
conda install xformers-0.0.22.post7-py310_cu11.8.0_pyt2.0.1.tar.bz2
rm xformers-0.0.22.post7-py310_cu11.8.0_pyt2.0.1.tar.bz2
# Compile DROID-SLAM components
cd megasam-package/base/
python setup.py install
cd ../..
# NKSR for fast meshification
conda install -c pyg -c nvidia -c conda-forge pytorch-lightning=1.9.4 tensorboard pybind11 pyg rich pandas omegaconf
# pip install -f https://pycg.huangjh.tech/packages/index.html python-pycg[full]==0.5.2 randomname pykdtree plyfile flatten-dict pyntcloud -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install pykdtree plyfile flatten-dict pyntcloud -i https://pypi.tuna.tsinghua.edu.cn/simple/
# pip install nksr -f https://nksr.huangjh.tech/whl/torch-2.0.0+cu118.html -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install nksr -f https://nksr.s3.ap-northeast-1.amazonaws.com/whl/torch-2.0.0%2Bcu118.html -i https://pypi.tuna.tsinghua.edu.cn/simple/
wget -P /root/.cache/torch/hub/checkpoints/ https://nksr.s3.ap-northeast-1.amazonaws.com/ks.pth
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install trimesh tyro h5py rtree -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ..
# GeoCalib for gravity calibration
git clone https://github.com/hongsukchoi/GeoCalib.git third_party/GeoCalib
cd third_party/GeoCalib
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ../..
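With the vm1recon environment assembled, a quick import check can save debugging later (a minimal sketch; it only verifies that the CUDA 11.8 PyTorch build, xformers, and nksr import in the same interpreter):
# Expect torch 2.0.0+cu118, True for CUDA availability, and no ImportError from xformers/nksr
python -c "import torch, xformers, nksr; print(torch.__version__, torch.cuda.is_available())"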
After configuration
- Create a symlink to third_party inside real2sim:
cd VideoMimic/VideoMimic/real2sim
# Replace VideoMimic/third_party below with the actual path to your third_party directory
ln -s VideoMimic/third_party ./third_party
pip install smplx chumpy open3d numpy==1.26 -i https://pypi.tuna.tsinghua.edu.cn/simple/
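To confirm the symlink and the extra packages are usable (a small sketch; run inside the real2sim directory with vm1rs active):
# The symlink should resolve to your actual third_party directory, and the imports should succeed
ls -l third_party
python -c "import smplx, open3d, numpy; print(numpy.__version__)"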
3. Environment Quick Reference
Activate the conda environments:
# Most operations
conda activate vm1rs
# For MegaSam reconstruction and postprocessing
conda activate vm1recon
4. Downloading Necessary Files
cd VideoMimic/VideoMimic/real2sim
./download_gdrive_data.sh
# If the script above doesn't work, download assets.zip from my personal NAS link, then:
unzip assets.zip -d ./assets
rm -rf assets.zip
The assets folder contains the following files:
assets/body_models/
assets/robot_asset/
assets/checkpoints/
assets/ckpt_raft/
assets/ckpt_sam2/
assets/configs/
Output Root
Create a symlink for demo_data. The /host/mnt/sda/Datasets/VideoMimic/demo_data path below is my own directory; replace it with a directory that has enough storage space.
cp -r demo_data/* /host/mnt/sda/Datasets/VideoMimic/demo_data
rm -rf demo_data
ln -s /host/mnt/sda/Datasets/VideoMimic/demo_data ./demo_data
Output Files
MegaHunter Output (output_smpl_and_points/):
- megahunter_{method}_reconstruction_results_{video}_cam01_frame_{start}_{end}_subsample_{factor}.h5
  - our_pred_world_cameras_and_structure: Reconstructed world environment
  - our_pred_humans_smplx_params: SMPL parameters for each person
  - person_frame_info_list: Frame information for each person
Final Results (output_calib_mesh/):
- gravity_calibrated_megahunter.h5: Gravity-aligned human poses
- gravity_calibrated_keypoints.h5: 3D keypoints for all persons
- background_mesh.obj: Reconstructed environment mesh
- background_less_filtered_colored_pointcloud.ply: Less filtered point cloud
- background_more_filtered_colored_pointcloud.ply: Spatiotemporally filtered point cloud
- retarget_poses_{robot_name}.h5: Robot motion data
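To check what actually landed in one of these HDF5 files, a small h5py-based listing works (a sketch; h5py is installed in both environments above, and the path below is only a placeholder for a real output file):
# Replace the path with an actual .h5 file produced under demo_data/output_smpl_and_points/
python -c "import h5py, sys; f = h5py.File(sys.argv[1], 'r'); f.visit(print)" demo_data/output_smpl_and_points/your_output_file.h5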
Directory Structure
${PROJECT_ROOT}/
├── demo_data/ # Demo data for testing and visualization
│ ├── input_megasam/ # MegaSam reconstruction outputs
│ ├── input_align3r/ # Align3r reconstruction outputs
│ ├── input_images/
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── 00001.jpg
│ │ │ │ ├── 00002.jpg
│ │ │ │ └── ...
│ ├── input_masks/ # SAM2 segmentation results
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── mask_data/ # Binary masks
│ │ │ │ ├── json_data/ # Bounding boxes and metadata
│ │ │ │ └── meta_data.json # Multi-human tracking info
│ ├── input_2d_poses/ # ViTPose 2D pose results
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── pose_00001.json # 2D keypoints for each person
│ │ │ │ └── ...
│ ├── input_3d_meshes/ # VIMO/HMR2 3D mesh results
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── smpl_params_00001.pkl # SMPL parameters for each person
│ │ │ │ ├── known_betas.json # Optimized shape parameters
│ │ │ │ └── ...
│ ├── input_contacts/ # BSTRO contact detection
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── 00001.pkl
│ │ │ │ └── ...
│ ├── output_smpl_and_points/ # MegaHunter optimization results
│ └── output_calib_mesh/ # Final processed results
├── assets/
│ ├── checkpoints/
│ │ ├── align3r_depthpro.pth
│ │ ├── depth_pro.pt
│ │ ├── vitpose_huge_wholebody.pth
│ │ ├── vitpose_huge_wholebody_256x192.py
│ │ ├── hsi_hrnet_3dpw_b32_checkpoint_15.bin # bstro checkpoint for contact prediction
│ │ └── ...
│ ├── configs/
│ │ ├── bstro_hrnet_w64.yaml
│ │ ├── config_vimo.yaml
│ │ └── vitpose/
│ ├── body_models/ # SMPL/SMPLX models
│ ├── robot_asset/ # Robot URDF and assets
│ ├── ckpt_raft/ # RAFT optical flow checkpoints
│ └── ckpt_sam2/ # SAM2 model checkpoints
├── third_party/
│ ├── GeoCalib/ # Gravity calibration library
│ ├── Grounded-SAM-2/ # SAM2 segmentation
│ ├── ViTPose/ # 2D pose estimation
│ ├── megasam-package/ # MegaSam reconstruction
│ ├── monst3r-depth-package/ # Monst3r depth prior reconstruction
│ ├── NDC/ # Neural meshification (deprecated, replaced by NKSR)
│ ├── bstro/ # Contact detection
│ ├── VIMO/ # 3D human mesh estimation
├── stage0_preprocessing/
│ ├── sam2_segmentation.py
│ ├── vitpose_2d_poses.py
│ ├── vimo_3d_mesh.py
│ ├── bstro_contact_detection.py
│ ├── wilor_hand_poses.py
│ └── smpl_to_smplx_conversion.py
├── stage1_reconstruction/
│ ├── megasam_reconstruction.py
│ └── monst3r_depth_prior_reconstruction.py
├── stage2_optimization/
│ ├── optimize_smpl_shape_for_height.py
│ ├── optimize_smpl_shape_for_robot.py
│ ├── megahunter_optimization.py
│ ├── megahunter_costs.py # JAX optimization costs
│ ├── megahunter_utils.py # Utility functions
│ ├── megahunter_utils_robust.py # Robust multi-human utilities
│ └── README_robust_handling.md # Multi-human handling docs
├── stage3_postprocessing/
│ ├── postprocessing_pipeline.py # Main postprocessing script
│ ├── mesh_generation.py
│ ├── gravity_calibration.py
│ └── meshification.py
├── stage4_retargeting/
│ └── robot_motion_retargeting.py
├── sequential_processing/
│ ├── stage0_sequential_sam2_segmentation.py
│ ├── stage0_sequential_vitpose_2d_poses.py
│ ├── stage0_sequential_vimo_3d_mesh.py
│ ├── stage0_sequential_bstro_contact_detection.py
│ ├── stage1_sequential_megasam_reconstruction.py
│ ├── stage1_sequential_monst3r_depth_prior_reconstruction.py
│ ├── stage2_sequential_megahunter_optimization.py
│ └── stage3_sequential_mesh_generation_and_geocalib.py
├── visualization/
│ ├── sequential_visualization.py
│ ├── complete_results_egoview_visualization.py
│ ├── environment_only_visualization.py
│ ├── gravity_calibration_visualization.py
│ ├── mesh_generation_visualization.py
│ ├── optimization_results_visualization.py
│ ├── retargeting_visualization.py
│ ├── viser_camera_util.py
│ └── colors.txt # Color palette for multi-human visualization
├── utilities/
│ ├── extract_frames_from_video.py
│ ├── smpl_jax_layer.py
│ ├── one_euro_filter.py
│ ├── egoview_rendering.py
│ └── viser_camera_utilities.py
├── docs/ # Documentation
│ ├── setup.md
│ ├── commands.md
│ ├── directory.md
│ └── multihuman.md
├── video_scraping/ # Video collection and filtering tools
├── unit_tests/ # Unit tests for components
├── sloper4d_eval_script/ # Benchmark evaluation scripts
├── megahunter_models/ # Precomputed SMPL models
├── process_video.sh # Main pipeline orchestration script
├── preprocess_human.sh # Human preprocessing script
├── requirements.txt # Python dependencies
└── README.md # Project overview
Simulation
Creating a Virtual Environment
Create and activate the virtual environment.
conda create -n rlgpu python=3.8
conda activate rlgpu
Installing Dependencies
Install PyTorch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
Install Isaac Gym
Download the Isaac Gym package into VideoMimic/third_party and install it with the commands below. If you run into problems, consult the documentation at isaacgym/docs/index.html.
cd VideoMimic/third_party
tar -xzf IsaacGym_Preview_4_Package.tar.gz
cd isaacgym/python
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Verify installation (adjust the anaconda3 path below to your own)
cd examples
export LD_LIBRARY_PATH=/home/markshi/anaconda3/envs/rlgpu/lib:$LD_LIBRARY_PATH
python 1080_balls_of_solitude.py
Install videomimic_rl
cd VideoMimic/VideoMimic/simulation/videomimic_rl
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ..
Install videomimic_gym
cd VideoMimic/VideoMimic/simulation/videomimic_gym
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ..
Download data
Download the model checkpoints and the preprocessed video data.
cd VideoMimic/VideoMimic/simulation
cp -r data/* /mnt/sda/Datasets/VideoMimic/data
rm -rf data
ln -s /mnt/sda/Datasets/VideoMimic/data ./data
cd data
bash download_videomimic_data.sh
Change Scripts
- Modify the following in play_flat_policy.sh and play_mcpt_policy.sh, otherwise running them will error out (a sed sketch follows after this list):
cd VideoMimic/VideoMimic/simulation/videomimic_gym/legged_gym/scripts
# Change play_flat_policy.sh and play_mcpt_policy.sh as follows:
# 'lafan_single_walk' to 'lafan_replay_data' or 'lafan_walk_and_dance'
- When testing result data generated from your own videos, if you get an error related to joint_names, change the following in VideoMimic/VideoMimic/simulation/videomimic_gym/legged_gym/tensor_utils/replay_data.py:
'joint_names': data.attrs['/joint_names'].tolist(),
'link_names': data.attrs['/link_names'].tolist(),
'fps': data.attrs['/fps'] if '/fps' in data.attrs else self.default_data_fps,
to:
'joint_names': data.attrs['/joint_names'].tolist() if '/joint_names' in data.attrs else data.attrs['joint_names'].tolist(),
'link_names': data.attrs['/link_names'].tolist() if '/link_names' in data.attrs else data.attrs['link_names'].tolist(),
'fps': data.attrs['/fps'] if '/fps' in data.attrs else (data.attrs['fps'] if 'fps' in data.attrs else self.default_data_fps),
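For the first item above, the change can also be applied with sed (a hypothetical one-liner; it assumes 'lafan_single_walk' appears literally in both scripts and that 'lafan_replay_data' is the config you want):
cd VideoMimic/VideoMimic/simulation/videomimic_gym/legged_gym/scripts
# Review the result afterwards with: grep lafan play_flat_policy.sh play_mcpt_policy.sh
sed -i "s/lafan_single_walk/lafan_replay_data/g" play_flat_policy.sh play_mcpt_policy.sh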