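Understanding the __getitem__ Method of the AF3 ProteinDataset Class
The __getitem__ method of the ProteinDataset class in the AlphaFold3 protein_dataset module
retrieves one entry from the dataset and processes the data according to the dataset's configuration.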
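Before reading the full method, it may help to see the two indexing modes in isolation: without a clusters file the index points at a data entry, and with one it points at a cluster from which a member is drawn. The following is a minimal sketch, not the library's implementation. The function name pick_chain, the toy files and clusters dictionaries, and the assumption that each cluster holds (entry ID, chain ID) pairs are all hypothetical and chosen only for illustration.

```python
import random


def pick_chain(files, data, idx, clusters=None, rng=random):
    """Toy version of the two selection modes (all names are illustrative)."""
    if clusters is None:
        # No clusters file: data[idx] is an entry ID, so pick any of its chains.
        entry_id = data[idx]
        chain_id = rng.choice(list(files[entry_id].keys()))
        return entry_id, chain_id
    # Clusters file given: data[idx] is a cluster name.
    # Assumption: each cluster is a list of (entry ID, chain ID) pairs.
    cluster = data[idx]
    entry_id, chain_id = rng.choice(clusters[cluster])
    return entry_id, chain_id


# Hypothetical toy data: entry ID -> chain ID -> list of file paths.
files = {
    "1abc": {"A": ["1abc-A.pickle"], "B": ["1abc-B.pickle"]},
    "2xyz": {"H": ["2xyz-H.pickle"]},
}
clusters = {"c0": [("1abc", "A"), ("2xyz", "H")]}

# Entry mode: index 1 selects "2xyz", which has a single chain "H".
print(pick_chain(files, ["1abc", "2xyz"], 1))  # ('2xyz', 'H')

# Cluster mode: index 0 selects cluster "c0"; the member is drawn at random.
print(pick_chain(files, ["c0"], 0, clusters=clusters))
```

Passing rng explicitly (for example random.Random(0)) makes the draw reproducible in tests, which the real method does not do since it calls the random module directly.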
Source code:
def __getitem__(self, idx):
    """Return an entry from the dataset.
    If a clusters file is provided, then the idx is the index of the cluster
    and the chain is randomly selected from the cluster. Otherwise, the idx
    is the index of the data entry and the chain is randomly selected from
    the data entry.
    """
    chain_id = None
    cdr = None
    idx = self.indices[idx]
    if self.clusters is None:
        id = self.data[idx]  # data is already filtered by length
        chain_id = random.choice(list(self.files[id].keys()))
        if self.cdr is not None:
            while chain_id.split("__")[1] not in self.cdr:
                chain_id = random.choice(list(self.files[id].keys()))
    else:
        cluster = self.data[idx]
        id = None
        chain_n = -1
        while (
            id is None or len(self.files[id][chain_id]) == 0
        ):  # some IDs can be filtered out by length
            if self.shuffle_clusters:
                chain_n = random.randint(0, len(self.clusters[cluster]) - 1)
            else:
                chain_n += 1