AF3 create_alignment_db_sharded script: a walkthrough of the process_chunk function
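The AlphaFold3 create_alignment_db_sharded script lives in the scripts/alignment_db_scripts folder of the source tree. Its process_chunk function calls read_chain_dir to read the multiple sequence alignment (MSA) files of each chain, collects them into a uniformly structured dictionary, chunk_data, and returns it.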
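What the functions do:
- read_chain_dir: reads all alignment files in a single chain directory and returns a dict of the form {chain_name: [(filename, bytes)]}
- process_chunk: reads multiple chain directories concurrently and merges the results into one large dictionary, which it returns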
Source code:
def read_chain_dir(chain_dir: Path) -> dict:
    """
    Read all alignment files in a single chain directory and return a dict
    mapping chain name to file names and bytes.
    """
    if not chain_dir.is_dir():
        raise ValueError(f"chain_dir must be a directory, but is {chain_dir}")

    # ensure that PDB IDs are all lowercase
    pdb_id, chain = chain_dir.name.split("_")
    pdb_id = pdb_id.lower()
    chain_name = f"{pdb_id}_{chain}"

    file_data = []

    for file_path in sorted(chain_dir.iterdir()):
        file_name = file_path.name

        with open(file_path, "rb") as file:
            file_bytes = file.read()

        file_data.append((file_name, file_bytes))

    return {chain_name: file_data}
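To make the "read concurrently, then merge" behavior of process_chunk concrete, here is a minimal sketch of one way it could look. It assumes a thread pool (ThreadPoolExecutor) does the concurrent reads and is not necessarily the script's exact code. The helper build_demo_chunk, the directory names 1ABC_A and 2XYZ_B, and the file names and contents are made up purely for illustration. read_chain_dir is repeated in condensed form so the block runs on its own.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_chain_dir(chain_dir: Path) -> dict:
    """Condensed copy of the function above, so this sketch is self-contained."""
    if not chain_dir.is_dir():
        raise ValueError(f"chain_dir must be a directory, but is {chain_dir}")
    pdb_id, chain = chain_dir.name.split("_")
    chain_name = f"{pdb_id.lower()}_{chain}"
    file_data = []
    for file_path in sorted(chain_dir.iterdir()):
        with open(file_path, "rb") as file:
            file_data.append((file_path.name, file.read()))
    return {chain_name: file_data}


def process_chunk(chain_dirs: list[Path]) -> dict:
    """Assumed sketch: read each chain directory in a thread pool, merge the dicts."""
    chunk_data = {}
    with ThreadPoolExecutor() as executor:
        # executor.map yields one {chain_name: file_data} dict per directory
        for chain_data in executor.map(read_chain_dir, chain_dirs):
            chunk_data.update(chain_data)
    return chunk_data


def build_demo_chunk() -> dict:
    """Hypothetical demo: create two fake chain directories and read them back."""
    demo_layout = {
        "1ABC_A": {
            "uniref90_hits.sto": b"# STOCKHOLM 1.0\n",
            "bfd_hits.a3m": b">query\nMKV\n",
        },
        "2XYZ_B": {"mgnify_hits.sto": b"# STOCKHOLM 1.0\n"},
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for dir_name, files in demo_layout.items():
            chain_dir = root / dir_name
            chain_dir.mkdir()
            for file_name, content in files.items():
                (chain_dir / file_name).write_bytes(content)
        # Read while the temporary directory still exists
        return process_chunk(sorted(root.iterdir()))


if __name__ == "__main__":
    for chain_name, files in build_demo_chunk().items():
        print(chain_name, [(name, len(data)) for name, data in files])
```

Two details are worth noticing. The PDB ID part of each key is lowercased while the chain ID keeps its case, so the directory 1ABC_A ends up under the key 1abc_A. Also, because read_chain_dir unpacks the result of split("_") into exactly two names, a directory name with no underscore or with more than one raises a ValueError.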