Understanding the get_anchor_ind method of the AF3 ProteinDataset class
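The get_anchor_ind method of the ProteinDataset class in the AlphaFold3 protein_dataset module is a static method (decorated with @staticmethod) that returns the indices of the "anchor residues". For each contiguous masked region in a protein sequence, it looks at the residue immediately before and the residue immediately after the region and picks out the ones that are "known", so that they can later serve as context.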
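To make the idea of an anchor residue concrete, here is a minimal sketch that works on plain Python lists instead of tensors. The helper name find_anchor_indices and the toy inputs are assumptions made for illustration; this is not the library's implementation, only one way to express the behavior described in the docstring.

```python
from itertools import groupby
from operator import itemgetter


def find_anchor_indices(masked, known):
    """Toy, list-based illustration (hypothetical helper, not the library code)."""
    masked_idx = [i for i, m in enumerate(masked) if m]
    known_idx = {i for i, k in enumerate(known) if k}
    anchors = []
    # Consecutive indices share the same (position - value) difference,
    # so groupby splits masked_idx into contiguous runs.
    for _, run in groupby(enumerate(masked_idx), lambda pair: pair[0] - pair[1]):
        indices = list(map(itemgetter(1), run))
        # Candidate anchors: the residue just before and just after the run.
        before, after = indices[0] - 1, indices[-1] + 1
        for candidate in (before, after):
            if candidate in known_idx:
                anchors.append(candidate)
    return anchors


# Residues 2-3 and 6 are masked; residue 5 is not known (for example, unresolved).
masked = [0, 0, 1, 1, 0, 0, 1, 0]
known = [1, 1, 1, 1, 1, 0, 1, 1]
print(find_anchor_indices(masked, known))  # [1, 4, 7]
```

In this toy case the first masked run (2-3) contributes anchors 1 and 4, while the second run (6) contributes only 7, because residue 5 is not marked as known.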
Source code:
@staticmethod
def get_anchor_ind(masked_res, mask):
    """Get the indices of the anchor residues.

    Anchor residues are defined as the first and last known residues before and
    after each continuous masked region.

    Parameters
    ----------
    masked_res : torch.Tensor
        A boolean tensor indicating which residues should be predicted
    mask : torch.Tensor
        A boolean tensor indicating which residues are known

    Returns
    -------
    list
        A list of indices of the anchor residues

    """
    anchor_ind = []
    masked_ind = torch.where(masked_res.bool())[0]
    known_ind = torch.where(mask.bool())[0]
    for _, g in groupby(enumerate(masked_ind), lambda x: x[0] - x[1]):
        group = map(itemgetter(1), g)
        group = list(map(int, group))
        start, end = group[0], group[-1]