AF3 _make_msa_df function explained
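The _make_msa_df function in AlphaFold3's msa_pairing module converts a chain's multiple sequence alignment (MSA) features into a Pandas DataFrame. Each row describes one MSA sequence and holds the features MSA pairing needs: its species identifier, its row index, its similarity to the query, and its gap fraction.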
Source code:
from typing import Mapping

import numpy as np
import pandas as pd


def _make_msa_df(chain_features: Mapping[str, np.ndarray]) -> pd.DataFrame:
  """Makes dataframe with msa features needed for msa pairing."""
  chain_msa = chain_features['msa_all_seq']
  query_seq = chain_msa[0]  # row 0 of the MSA is the query sequence
  # Fraction of positions where each MSA row matches the query exactly.
  per_seq_similarity = np.sum(
      query_seq[None] == chain_msa, axis=-1) / float(len(query_seq))
  # Fraction of positions in each row that hold the gap token (id 21).
  per_seq_gap = np.sum(chain_msa == 21, axis=-1) / float(len(query_seq))
  msa_df = pd.DataFrame({
      'msa_species_identifiers':
          chain_features['msa_species_identifiers_all_seq'],
      'msa_row':
          np.arange(len(
              chain_features['msa_species_identifiers_all_seq'])),
      'msa_similarity': per_seq_similarity,
      'gap': per_seq_gap
  })
  return msa_df
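To see what each column ends up holding, here is a toy run of the same computations, step by step. The 3 x 4 integer MSA and the species identifiers are made up for illustration; the only assumption carried over from the code above is that token 21 marks a gap. The key step is `query_seq[None] == chain_msa`: adding a leading axis turns the query into shape (1, L), which NumPy broadcasts against every row of the (N, L) MSA.

```python
import numpy as np
import pandas as pd

# Hypothetical toy chain: 3 aligned sequences x 4 residues.
# Row 0 is the query; 21 is the gap token counted in _make_msa_df.
chain_msa = np.array([
    [0, 1, 2, 3],     # query
    [0, 1, 21, 21],   # matches the query at 2 positions, 2 gaps
    [4, 1, 2, 21],    # matches the query at 2 positions, 1 gap
])
# Made-up species identifiers, one per MSA row.
species_ids = np.array([b'', b'HUMAN', b'MOUSE'], dtype=object)

query_seq = chain_msa[0]
# (1, 4) == (3, 4) broadcasts to a (3, 4) boolean match mask.
match_mask = query_seq[None] == chain_msa
per_seq_similarity = match_mask.sum(axis=-1) / float(len(query_seq))
per_seq_gap = (chain_msa == 21).sum(axis=-1) / float(len(query_seq))

msa_df = pd.DataFrame({
    'msa_species_identifiers': species_ids,
    'msa_row': np.arange(len(species_ids)),
    'msa_similarity': per_seq_similarity,
    'gap': per_seq_gap,
})
print(msa_df)
```

Because the query is compared with itself, row 0 always gets msa_similarity 1.0, and its gap value is 0.0 whenever the query contains no gap tokens. Note that similarity is normalized by the full query length, so gap positions in a row count as mismatches.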
Code walkthrough:
Function input
def _make_msa_df(chain_features: Mapping[str, np.ndarray]) -> pd.DataFrame:
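chain_features: a dictionary holding the MSA information for a single chain (e.g. chain A or B).
- Main keys used:
  - 'msa_all_seq': a 2D array containing the target (query) sequence and its MSA; row 0 is the query.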