
AF3: A Walkthrough of the from_pdb_string and from_mmcif_string Functions

news 2025/7/19 7:47:00

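AlphaFold3's from_pdb_string and from_mmcif_string functions parse protein structure data in PDB and mmCIF format, respectively, and convert it into a Protein dataclass. They use Biopython's PDBParser and MMCIFParser to parse the PDB/mmCIF input, then call _from_bio_structure to extract atomic coordinates, residue types, B-factors, and related information from the resulting Biopython Structure, and finally return a Protein object.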
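
To make concrete which fields end up in the Protein object, here is a minimal stdlib-only sketch that slices a single PDB ATOM record by its fixed-column layout. It is not how Biopython or AlphaFold is implemented: `AtomRecord` and `parse_atom_line` are hypothetical names introduced only for illustration, and the example line is synthetic.

```python
from typing import NamedTuple, Tuple


class AtomRecord(NamedTuple):
    """Fields of one PDB ATOM record that matter when building a Protein."""
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    insertion_code: str
    coord: Tuple[float, float, float]
    b_factor: float


def parse_atom_line(line: str) -> AtomRecord:
    """Hypothetical helper: slice one ATOM line using the PDB column layout."""
    return AtomRecord(
        atom_name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),       # columns 18-20
        chain_id=line[21],                  # column 22
        res_seq=int(line[22:26]),           # columns 23-26
        insertion_code=line[26],            # column 27, ' ' when absent
        coord=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        b_factor=float(line[60:66]),        # columns 61-66
    )


# Synthetic example line, built with explicit widths so the columns are exact.
EXAMPLE_LINE = (
    "ATOM  "                                   # 1-6: record name
    f"{1:5d} "                                 # 7-11: serial, 12: blank
    f"{' CA ':4s} "                            # 13-16: atom name, 17: alt. location
    f"{'ALA':3s} "                             # 18-20: residue name, 21: blank
    f"{'A'}{5:4d}{' '}   "                     # 22: chain, 23-26: number, 27: insertion code
    f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}"  # 31-54: x, y, z
    f"{1.00:6.2f}{35.88:6.2f}"                 # 55-60: occupancy, 61-66: B-factor
)

record = parse_atom_line(EXAMPLE_LINE)
print(record)
```

In the real functions this bookkeeping is delegated to Biopython; the sketch only shows where the coordinates, residue name, chain ID, insertion code, and B-factor come from in the raw text.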

Source code:

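from typing import Optional

import numpy as np
from Bio.PDB.Structure import Structure

# `residue_constants` and `Protein` are defined elsewhere in the project.


def _from_bio_structure(
        structure: Structure, chain_id: Optional[str] = None
) -> Protein: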
    """Takes a Biopython structure and creates a `Protein` instance.

    WARNING: All non-standard residue types will be converted into UNK. All
      non-standard atoms will be ignored.

    Args:
      structure: Structure from the Biopython library.
      chain_id: If chain_id is specified (e.g. A), then only that chain is parsed.
        Otherwise all chains are parsed.

    Returns:
      A new `Protein` created from the structure contents.

    Raises:
      ValueError: If the number of models included in the structure is not 1.
      ValueError: If insertion code is detected at a residue.
    """
    models = list(structure.get_models())
    if len(models) != 1:
        raise ValueError(
            'Only single model PDBs/mmCIFs are supported. Found'
            f' {len(models)} models.'
        )
    model = models[0]

    atom_positions = []
    aatype = []
    atom_mask = []
    residue_index = []
    chain_ids = []
    b_factors = []

    for chain in model:
        if chain_id is not None and chain.id != chain_id:
            continue
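        for res in chain:
            # Biopython residue ids are (hetero flag, sequence number, insertion code).
            if res.id[2] != ' ':
                raise ValueError(
                    f'PDB/mmCIF contains an insertion code at chain {chain.id} and'
                    f' residue index {res.id[1]}. These are not supported.'
                )
            # Map the 3-letter residue name to a 1-letter code; unknown names become 'X'.
            res_shortname = residue_constants.restype_3to1.get(res.resname, 'X')
            # Codes missing from restype_order fall back to restype_num (the unknown class).
            restype_idx = residue_constants.restype_order.get(
                res_shortname, residue_constants.restype_num)
            # Fixed-size per-residue arrays with one slot per known atom type.
            pos = np.zeros((residue_constants.atom_type_num, 3))
            mask = np.zeros((residue_constants.atom_type_num,))
            res_b_factors = np.zeros((residue_constants.atom_type_num,))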
            for atom in res:
                if atom.name not in residue_constants.atom_types:
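
The residue-type lookup and the fixed-size per-atom arrays in the listing can be illustrated with a small stdlib-only sketch. Everything here is an assumption made for illustration: the tables are toy stand-ins for `residue_constants` (the real ones are larger), `residue_type_index` and `fill_atom_slots` are hypothetical helpers, plain lists replace the NumPy arrays, and the slot-filling step follows the docstring's statement that non-standard atoms are ignored rather than any code shown in the listing.

```python
# Toy stand-ins for the residue_constants tables (assumed, heavily reduced).
RESTYPES = ["A", "R", "N", "D"]
RESTYPE_ORDER = {aa: i for i, aa in enumerate(RESTYPES)}
RESTYPE_NUM = len(RESTYPES)  # index used for unknown residue types
RESTYPE_3TO1 = {"ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D"}

ATOM_TYPES = ["N", "CA", "C", "CB", "O"]
ATOM_ORDER = {name: i for i, name in enumerate(ATOM_TYPES)}


def residue_type_index(resname: str) -> int:
    """Maps a 3-letter residue name to its index, or RESTYPE_NUM if unknown."""
    short_name = RESTYPE_3TO1.get(resname, "X")
    return RESTYPE_ORDER.get(short_name, RESTYPE_NUM)


def fill_atom_slots(atoms):
    """Places atoms into fixed slots; atoms is a list of (name, xyz, b_factor)."""
    pos = [[0.0, 0.0, 0.0] for _ in ATOM_TYPES]
    mask = [0.0] * len(ATOM_TYPES)
    b_factors = [0.0] * len(ATOM_TYPES)
    for name, coord, b_factor in atoms:
        if name not in ATOM_ORDER:
            continue  # atom names outside the vocabulary are ignored
        slot = ATOM_ORDER[name]
        pos[slot] = list(coord)
        mask[slot] = 1.0  # marks this slot as observed
        b_factors[slot] = b_factor
    return pos, mask, b_factors


print(residue_type_index("ALA"), residue_type_index("MSE"))
pos, mask, b_factors = fill_atom_slots(
    [("CA", (1.0, 2.0, 3.0), 20.0), ("H1", (0.0, 0.0, 0.0), 5.0)]
)
print(mask)
```

Reserving one slot per atom type gives every residue arrays of the same shape, and the mask records which slots actually hold an observed atom.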
