VideoMimic Reproduction (1): Environment Setup (real2sim + simulation)
Download the VideoMimic code into the VideoMimic directory (the actual code directory will be VideoMimic/VideoMimic).
git clone https://github.com/hongsukchoi/VideoMimic.git
Real2Sim
1. Main Environment (vm1rs)
Create the Docker container and install Anaconda.
sudo docker run -itd --name videomimic-vm1rs -v /:/host -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE --gpus=all --network host --shm-size 32g -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
sudo docker exec -it videomimic-vm1rs /bin/bash
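Before going further, it is worth confirming that the GPUs are visible inside the container (a quick optional check; with --gpus=all and the utility driver capability, the NVIDIA container runtime injects nvidia-smi):
# Should list the host GPUs; if it fails, recheck --gpus=all and the NVIDIA container toolkit on the host
nvidia-smi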
apt update
apt install vim git mesa-utils-extra libglib2.0-0
# In the directory containing the Anaconda installer (the filename below is just an example; use your actual file)
./Anaconda3-2024.10-1-Linux-x86_64.sh
Open ~/.bashrc, add the following content, then run source ~/.bashrc (replace /root/anaconda3 with your actual anaconda3 install path):
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/root/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/root/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/root/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/root/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
Create and activate the virtual environment.
conda create -n vm1rs python=3.12
conda activate vm1rs
Install the remaining dependencies.
cd VideoMimic/VideoMimic/real2sim
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
Human Detection & Pose Estimation
cd ../../
mkdir third_party
# 1. Grounded-SAM-2 (bounding boxes and segmentation)
cd third_party/
git clone https://github.com/hongsukchoi/Grounded-SAM-2.git
cd Grounded-SAM-2
# Add the following to ~/.bashrc, then save and exit
export CUDA_HOME=/usr/local/cuda-12.4 # Adjust to your CUDA version
source ~/.bashrc
conda activate vm1rs
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install -e . # Segment Anything 2
pip install --no-build-isolation -e grounding_dino # Grounding DINO
pip install transformers -i https://pypi.tuna.tsinghua.edu.cn/simple/
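Optionally run a quick sanity check that the CUDA-enabled PyTorch build and the two packages installed above import correctly (a minimal sketch; sam2 and groundingdino are the package names these repos normally install, adjust if your checkout differs):
# Should print torch 2.5.1, True for CUDA availability, and raise no ImportError
python -c "import torch, sam2, groundingdino; print(torch.__version__, torch.cuda.is_available())"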
cd ../../
# 2. ViTPose (2D pose estimation)
pip install -U openmim -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install --upgrade setuptools -i https://pypi.tuna.tsinghua.edu.cn/simple/
mim install mmcv==1.3.9 -i https://pypi.tuna.tsinghua.edu.cn/simple/ # If error, try: pip install setuptools --upgrade
cd third_party/
git clone https://github.com/ViTAE-Transformer/ViTPose.git
cd ViTPose
pip install -v -e .
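A quick import check for the ViTPose install (a sketch under the assumption that installing the ViTPose repo provides the mmpose package, as the upstream repo does):
# Both imports should succeed; mmcv should report 1.3.9
python -c "import mmcv, mmpose; print(mmcv.__version__)"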
cd ../../
# 3. VIMO (3D human mesh - primary method)
pip install git+https://github.com/hongsukchoi/VIMO.git -i https://pypi.tuna.tsinghua.edu.cn/simple/
# 4D Humans (deprecated)
# pip install git+https://github.com/hongsukchoi/4D-Humans.git
# 4. BSTRO (contact detection)
cd third_party/
git clone --recursive https://github.com/hongsukchoi/bstro.git
cd bstro
python setup.py build develop
cd ../..
- Troubleshooting: g++-11 errors
If you encounter g++-11 related errors:
# Install g++-11
sudo apt update
sudo apt install g++-11
# Set environment variables
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
# Retry the installation
pip install --no-build-isolation -e grounding_dino
MegaHunter + PyRoki
# Second order optimization for MegaHunter and PyRoki
pip install -U "jax[cuda12]" -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install "git+https://github.com/brentyi/jaxls.git" -i https://pypi.tuna.tsinghua.edu.cn/simple/# PyRoki for robot motion retargeting
git clone https://github.com/chungmin99/pyroki.git
cd pyroki
# pyroki might have updated some variable names;
git checkout 70b30a56b1e1ea83fb4c2cac8fe2c63a0624b9ce
pip install -e .
cd ../..
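Optionally verify that JAX sees the GPU and that jaxls and pyroki import (a minimal sketch; jax.devices() returning only CPU devices indicates a broken CUDA JAX install):
# Expect at least one CUDA device in the printed list
python -c "import jax, jaxls, pyroki; print(jax.devices())"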
Core Dependencies
# PyTorch (avoid 2.6 - it's unstable)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Viser for visualization
cd third_party/
git clone https://github.com/nerfstudio-project/viser
cd viser
pip install -e .
cd ../..
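Optionally check the viser install by starting a throwaway server (a minimal sketch; viser serves its web viewer on port 8080 by default):
# Open http://localhost:8080 in a browser while this runs, then Ctrl+C
python -c "import time, viser; server = viser.ViserServer(); time.sleep(30)"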
Optional: World Reconstruction (Align3r)
# Monst3r/Align3r (skip if only using MegaSam)
cd third_party/
git clone https://github.com/Junyi42/monst3r-depth-package.git
cd monst3r-depth-package
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ../..
pip install git+https://github.com/Junyi42/croco_package.git -i https://pypi.tuna.tsinghua.edu.cn/simple/
Optional: Neural Meshification (NDC)
# NDC (skip if only using NKSR)
pip install trimesh h5py cython opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd third_party
git clone https://github.com/czq142857/NDC.git
cd NDC
python setup.py build_ext --inplace
cd ../..
Optional: Hand Pose Estimation
# WiLor (3D hand mesh)
pip install git+https://github.com/warmshao/WiLoR-mini -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
After configuration
- Change the argument box_threshold to threshold at line 164 of stage0_preprocessing/sam2_segmentation.py (a sketch of this edit follows below).
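One way to apply that edit from the command line (a hypothetical one-liner; it assumes the keyword appears literally as box_threshold on line 164, so double-check the file afterwards):
# Run from the real2sim directory; verify with: sed -n '164p' stage0_preprocessing/sam2_segmentation.py
sed -i '164s/box_threshold/threshold/' stage0_preprocessing/sam2_segmentation.py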
2. Reconstruction Environment (vm1recon)
This environment handles MegaSam reconstruction, NKSR meshification, and GeoCalib operations.
Create the Docker container and install Anaconda.
sudo docker run -itd --name videomimic-vm1recon -v /:/host -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE --gpus=all --network host --shm-size 32g -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
sudo docker exec -it videomimic-vm1recon /bin/bash
apt update
apt install vim git wget
# In the directory containing the Anaconda installer (the filename below is just an example; use your actual file)
./Anaconda3-2024.10-1-Linux-x86_64.sh
Open ~/.bashrc, add the following content, then run source ~/.bashrc (replace /root/anaconda3 with your actual anaconda3 install path):
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/root/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/root/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/root/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/root/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
cd third_party/
git clone --recursive https://github.com/Junyi42/megasam-package
cd megasam-package
# Create environment from yaml
conda env create -f environment.yml
conda activate vm1recon
# Other dependencies
# pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Additional dependencies
# Add the following to ~/.bashrc, then save and exit
export CUDA_HOME=/usr/local/cuda-11.8 # Adjust to your CUDA version
source ~/.bashrc
conda activate vm1recon
# Install g++-11 if not already installed
# sudo apt update
# sudo apt install g++-11
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
pip install torch-scatter==2.1.2 -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Install specific xformers version (required for MegaSam)
cd ../
wget https://anaconda.org/xformers/xformers/0.0.22.post7/download/linux-64/xformers-0.0.22.post7-py310_cu11.8.0_pyt2.0.1.tar.bz2
conda install xformers-0.0.22.post7-py310_cu11.8.0_pyt2.0.1.tar.bz2
rm xformers-0.0.22.post7-py310_cu11.8.0_pyt2.0.1.tar.bz2
# Compile DROID-SLAM components
cd megasam-package/base/
python setup.py install
cd ../..
# NKSR for fast meshification
conda install -c pyg -c nvidia -c conda-forge pytorch-lightning=1.9.4 tensorboard pybind11 pyg rich pandas omegaconf
# pip install -f https://pycg.huangjh.tech/packages/index.html python-pycg[full]==0.5.2 randomname pykdtree plyfile flatten-dict pyntcloud -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install pykdtree plyfile flatten-dict pyntcloud -i https://pypi.tuna.tsinghua.edu.cn/simple/
# pip install nksr -f https://nksr.huangjh.tech/whl/torch-2.0.0+cu118.html -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install nksr -f https://nksr.s3.ap-northeast-1.amazonaws.com/whl/torch-2.0.0%2Bcu118.html -i https://pypi.tuna.tsinghua.edu.cn/simple/
wget -P /root/.cache/torch/hub/checkpoints/ https://nksr.s3.ap-northeast-1.amazonaws.com/ks.pth
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install trimesh tyro h5py rtree -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ..
# GeoCalib for gravity calibration
git clone https://github.com/hongsukchoi/GeoCalib.git third_party/GeoCalib
cd third_party/GeoCalib
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ../..
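With the vm1recon environment assembled, a quick import check can save debugging later (a minimal sketch; it only verifies that the CUDA 11.8 PyTorch build, xformers, and nksr import in the same interpreter):
# Expect torch 2.0.0+cu118, True for CUDA availability, and no ImportError from xformers/nksr
python -c "import torch, xformers, nksr; print(torch.__version__, torch.cuda.is_available())"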
After configuration
- Create a symlink to third_party inside real2sim:
cd VideoMimic/VideoMimic/real2sim
# Replace VideoMimic/third_party below with the actual path to your third_party directory
ln -s VideoMimic/third_party ./third_party
pip install smplx chumpy open3d numpy==1.26 -i https://pypi.tuna.tsinghua.edu.cn/simple/
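To confirm the symlink and the extra packages are usable (a small sketch; run inside the real2sim directory with vm1rs active):
# The symlink should resolve to your actual third_party directory, and the imports should succeed
ls -l third_party
python -c "import smplx, open3d, numpy; print(numpy.__version__)"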
3. Environment Quick Reference
Activate the conda environments:
# Most operations
conda activate vm1rs
# For MegaSam reconstruction and postprocessing
conda activate vm1recon
4. Downloading Necessary Files
cd VideoMimic/VideoMimic/real2sim
./download_gdrive_data.sh
# If the script above doesn't work, download assets.zip from my personal NAS link, then:
unzip assets.zip -d ./assets
rm -rf assets.zip
The assets folder contains the following files:
assets/body_models/
assets/robot_asset/
assets/checkpoints/
assets/ckpt_raft/
assets/ckpt_sam2/
assets/configs/
Output Root
Create a symlink for demo_data. The /host/mnt/sda/Datasets/VideoMimic/demo_data path below is my own directory; replace it with a directory that has enough storage space.
cp -r demo_data/* /host/mnt/sda/Datasets/VideoMimic/demo_data
rm -rf demo_data
ln -s /host/mnt/sda/Datasets/VideoMimic/demo_data ./demo_data
Output Files
MegaHunter Output (output_smpl_and_points/):
- megahunter_{method}_reconstruction_results_{video}_cam01_frame_{start}_{end}_subsample_{factor}.h5
  - our_pred_world_cameras_and_structure: Reconstructed world environment
  - our_pred_humans_smplx_params: SMPL parameters for each person
  - person_frame_info_list: Frame information for each person
Final Results (output_calib_mesh/):
- gravity_calibrated_megahunter.h5: Gravity-aligned human poses
- gravity_calibrated_keypoints.h5: 3D keypoints for all persons
- background_mesh.obj: Reconstructed environment mesh
- background_less_filtered_colored_pointcloud.ply: Less filtered point cloud
- background_more_filtered_colored_pointcloud.ply: Spatiotemporally filtered point cloud
- retarget_poses_{robot_name}.h5: Robot motion data
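To check what actually landed in one of these HDF5 files, a small h5py-based listing works (a sketch; h5py is installed in both environments above, and the path below is only a placeholder for a real output file):
# Replace the path with an actual .h5 file produced under demo_data/output_smpl_and_points/
python -c "import h5py, sys; f = h5py.File(sys.argv[1], 'r'); f.visit(print)" demo_data/output_smpl_and_points/your_output_file.h5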
Directory Structure
${PROJECT_ROOT}/
├── demo_data/ # Demo data for testing and visualization
│ ├── input_megasam/ # MegaSam reconstruction outputs
│ ├── input_align3r/ # Align3r reconstruction outputs
│ ├── input_images/
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── 00001.jpg
│ │ │ │ ├── 00002.jpg
│ │ │ │ └── ...
│ ├── input_masks/ # SAM2 segmentation results
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── mask_data/ # Binary masks
│ │ │ │ ├── json_data/ # Bounding boxes and metadata
│ │ │ │ └── meta_data.json # Multi-human tracking info
│ ├── input_2d_poses/ # ViTPose 2D pose results
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── pose_00001.json # 2D keypoints for each person
│ │ │ │ └── ...
│ ├── input_3d_meshes/ # VIMO/HMR2 3D mesh results
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── smpl_params_00001.pkl # SMPL parameters for each person
│ │ │ │ ├── known_betas.json # Optimized shape parameters
│ │ │ │ └── ...
│ ├── input_contacts/ # BSTRO contact detection
│ │ ├── people_jumping_nov20/
│ │ │ ├── cam01/
│ │ │ │ ├── 00001.pkl
│ │ │ │ └── ...
│ ├── output_smpl_and_points/ # MegaHunter optimization results
│ └── output_calib_mesh/ # Final processed results
├── assets/
│ ├── checkpoints/
│ │ ├── align3r_depthpro.pth
│ │ ├── depth_pro.pt
│ │ ├── vitpose_huge_wholebody.pth
│ │ ├── vitpose_huge_wholebody_256x192.py
│ │ ├── hsi_hrnet_3dpw_b32_checkpoint_15.bin # bstro checkpoint for contact prediction
│ │ └── ...
│ ├── configs/
│ │ ├── bstro_hrnet_w64.yaml
│ │ ├── config_vimo.yaml
│ │ └── vitpose/
│ ├── body_models/ # SMPL/SMPLX models
│ ├── robot_asset/ # Robot URDF and assets
│ ├── ckpt_raft/ # RAFT optical flow checkpoints
│ └── ckpt_sam2/ # SAM2 model checkpoints
├── third_party/
│ ├── GeoCalib/ # Gravity calibration library
│ ├── Grounded-SAM-2/ # SAM2 segmentation
│ ├── ViTPose/ # 2D pose estimation
│ ├── megasam-package/ # MegaSam reconstruction
│ ├── monst3r-depth-package/ # Monst3r depth prior reconstruction
│ ├── NDC/ # Neural meshification (deprecated, replaced by NKSR)
│ ├── bstro/ # Contact detection
│ ├── VIMO/ # 3D human mesh estimation
├── stage0_preprocessing/
│ ├── sam2_segmentation.py
│ ├── vitpose_2d_poses.py
│ ├── vimo_3d_mesh.py
│ ├── bstro_contact_detection.py
│ ├── wilor_hand_poses.py
│ └── smpl_to_smplx_conversion.py
├── stage1_reconstruction/
│ ├── megasam_reconstruction.py
│ └── monst3r_depth_prior_reconstruction.py
├── stage2_optimization/
│ ├── optimize_smpl_shape_for_height.py
│ ├── optimize_smpl_shape_for_robot.py
│ ├── megahunter_optimization.py
│ ├── megahunter_costs.py # JAX optimization costs
│ ├── megahunter_utils.py # Utility functions
│ ├── megahunter_utils_robust.py # Robust multi-human utilities
│ └── README_robust_handling.md # Multi-human handling docs
├── stage3_postprocessing/
│ ├── postprocessing_pipeline.py # Main postprocessing script
│ ├── mesh_generation.py
│ ├── gravity_calibration.py
│ └── meshification.py
├── stage4_retargeting/
│ └── robot_motion_retargeting.py
├── sequential_processing/
│ ├── stage0_sequential_sam2_segmentation.py
│ ├── stage0_sequential_vitpose_2d_poses.py
│ ├── stage0_sequential_vimo_3d_mesh.py
│ ├── stage0_sequential_bstro_contact_detection.py
│ ├── stage1_sequential_megasam_reconstruction.py
│ ├── stage1_sequential_monst3r_depth_prior_reconstruction.py
│ ├── stage2_sequential_megahunter_optimization.py
│ └── stage3_sequential_mesh_generation_and_geocalib.py
├── visualization/
│ ├── sequential_visualization.py
│ ├── complete_results_egoview_visualization.py
│ ├── environment_only_visualization.py
│ ├── gravity_calibration_visualization.py
│ ├── mesh_generation_visualization.py
│ ├── optimization_results_visualization.py
│ ├── retargeting_visualization.py
│ ├── viser_camera_util.py
│ └── colors.txt # Color palette for multi-human visualization
├── utilities/
│ ├── extract_frames_from_video.py
│ ├── smpl_jax_layer.py
│ ├── one_euro_filter.py
│ ├── egoview_rendering.py
│ └── viser_camera_utilities.py
├── docs/ # Documentation
│ ├── setup.md
│ ├── commands.md
│ ├── directory.md
│ └── multihuman.md
├── video_scraping/ # Video collection and filtering tools
├── unit_tests/ # Unit tests for components
├── sloper4d_eval_script/ # Benchmark evaluation scripts
├── megahunter_models/ # Precomputed SMPL models
├── process_video.sh # Main pipeline orchestration script
├── preprocess_human.sh # Human preprocessing script
├── requirements.txt # Python dependencies
└── README.md # Project overview
Simulation
Creating a Virtual Environment
Create and activate the virtual environment.
conda create -n rlgpu python=3.8
conda activate rlgpu
Installing Dependencies
Install PyTorch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124 -i https://pypi.tuna.tsinghua.edu.cn/simple/
Install Isaac Gym
Download the Isaac Gym package into VideoMimic/third_party and install it with the commands below. If you run into problems, consult the documentation at isaacgym/docs/index.html.
cd VideoMimic/third_party
tar -xzf IsaacGym_Preview_4_Package.tar.gz
cd isaacgym/python
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Verify installation (adjust the anaconda3 path below to your own)
cd examples
export LD_LIBRARY_PATH=/home/markshi/anaconda3/envs/rlgpu/lib:$LD_LIBRARY_PATH
python 1080_balls_of_solitude.py
Install videomimic_rl
cd VideoMimic/VideoMimic/simulation/videomimic_rl
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ..
Install videomimic_gym
cd VideoMimic/VideoMimic/simulation/videomimic_gym
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
cd ..
Download data
Download the model checkpoints and the preprocessed video data.
cd VideoMimic/VideoMimic/simulation
cp -r data/* /mnt/sda/Datasets/VideoMimic/data
rm -rf data
ln -s /mnt/sda/Datasets/VideoMimic/data ./data
cd data
bash download_videomimic_data.sh
Change Scripts
- Modify the following in play_flat_policy.sh and play_mcpt_policy.sh, otherwise running them will error out (a sed sketch follows after this list):
cd VideoMimic/VideoMimic/simulation/videomimic_gym/legged_gym/scripts
# Change play_flat_policy.sh and play_mcpt_policy.sh as follows:
# 'lafan_single_walk' to 'lafan_replay_data' or 'lafan_walk_and_dance'
- When testing result data generated from your own videos, if you get an error related to joint_names, change the following in VideoMimic/VideoMimic/simulation/videomimic_gym/legged_gym/tensor_utils/replay_data.py:
'joint_names': data.attrs['/joint_names'].tolist(),
'link_names': data.attrs['/link_names'].tolist(),
'fps': data.attrs['/fps'] if '/fps' in data.attrs else self.default_data_fps,
to:
'joint_names': data.attrs['/joint_names'].tolist() if '/joint_names' in data.attrs else data.attrs['joint_names'].tolist(),
'link_names': data.attrs['/link_names'].tolist() if '/link_names' in data.attrs else data.attrs['link_names'].tolist(),
'fps': data.attrs['/fps'] if '/fps' in data.attrs else (data.attrs['fps'] if 'fps' in data.attrs else self.default_data_fps),
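For the first item above, the change can also be applied with sed (a hypothetical one-liner; it assumes 'lafan_single_walk' appears literally in both scripts and that 'lafan_replay_data' is the config you want):
cd VideoMimic/VideoMimic/simulation/videomimic_gym/legged_gym/scripts
# Review the result afterwards with: grep lafan play_flat_policy.sh play_mcpt_policy.sh
sed -i "s/lafan_single_walk/lafan_replay_data/g" play_flat_policy.sh play_mcpt_policy.sh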