Language Models are Few-Shot Learners: GPT-3 Out of the Box (Part 2)
Continued from the previous post.
Approach
The abstract and Introduction gave a high-level overview. In Chapter 2, Approach, the paper describes the model design and the zero-shot, one-shot, and few-shot evaluation settings.
The chapter opens by stating that GPT-3 keeps the same architecture as GPT-2, simply scaling up model size, dataset size, training time, and so on: "Our basic pre-training approach, including model, data, and training, is similar to the process described in [RWC+19], with relatively straightforward scaling up of the model size, dataset size and diversity, and length of training."
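To make "same architecture, just scaled up" concrete, here is a minimal sketch comparing the published configurations of the largest GPT-2 model and the largest GPT-3 model. The numbers are taken from the two papers; the TransformerConfig dataclass and its field names are purely my own illustration:

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    n_params: str   # total trainable parameters
    n_layers: int   # number of transformer layers
    d_model: int    # hidden (embedding) dimension
    n_heads: int    # attention heads per layer
    n_ctx: int      # context window, in tokens

# Largest GPT-2 model (Radford et al., 2019, i.e. [RWC+19])
gpt2_xl = TransformerConfig(
    n_params="1.5B", n_layers=48, d_model=1600, n_heads=25, n_ctx=1024
)

# Largest GPT-3 model (Brown et al., 2020): the same decoder-only
# architecture, with roughly 100x the parameters and 2x the context.
gpt3_175b = TransformerConfig(
    n_params="175B", n_layers=96, d_model=12288, n_heads=96, n_ctx=2048
)
```

One caveat the paper itself notes: GPT-3 additionally alternates dense and locally banded sparse attention patterns across layers, similar to the Sparse Transformer, so "identical to GPT-2" should be read as "identical modulo that change and the scaling".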
In-context learning is likewise handled as in GPT-2: "Our use of in-context learning is also similar to [RWC+19], but in this work we systematically explore different settings for learning within the context."
So the paper's intent is to evaluate GPT-3 from several angles, namely, as raised in Chapter 1, to measure how little GPT-3 depends on task-specific data for any particular NLP task.
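To make the zero-shot / one-shot / few-shot distinction concrete, here is a minimal sketch of how the three kinds of prompts could be assembled for the English-to-French translation example the paper uses. The build_prompt helper and its formatting are my own illustration, not the paper's code; the key point is that the demonstrations are plain conditioning text and no gradient updates occur:

```python
def build_prompt(task_description: str,
                 examples: list[tuple[str, str]],
                 query: str) -> str:
    """Assemble an in-context learning prompt.

    Zero-shot: examples is empty -- only the task description and query.
    One-shot:  examples holds a single demonstration.
    Few-shot:  examples holds K demonstrations (in the paper, as many
               as fit in GPT-3's 2048-token context window).
    """
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is asked to continue from here
    return "\n".join(lines)

demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

zero_shot = build_prompt("Translate English to French:", [], "peppermint")
one_shot  = build_prompt("Translate English to French:", demos[:1], "peppermint")
few_shot  = build_prompt("Translate English to French:", demos, "peppermint")

print(few_shot)
# Translate English to French:
# sea otter => loutre de mer
# cheese => fromage
# peppermint =>
```

Whichever setting is used, the model's weights stay frozen; the only thing that changes across zero-, one-, and few-shot is how much conditioning text precedes the query.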