1. Prompt
Could you please write a documentation entry for the `next_obs` variable in Markdown, following the format and style of the PyTorch documentation?
2. Evaluation using 2080Ti
```shell
python submit_eval_jobs.py --n-gpus 1
```
3. Scripts
3.1 Infer/run_cot_eval.py
3.1.1 Arguments
Required Arguments
| Argument | Type | Description |
|---|---|---|
| `--answer_extraction_fn` | str | Function name for extracting answers from model outputs |
| `--eval_fn` | str | Function name for evaluating predictions |
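Both arguments name callables rather than passing them directly, so the script has to resolve each name to a function before running. A minimal sketch of that pattern (the registry and function names below are illustrative, not the script's actual module layout):

```python
def resolve_fn(name, registry):
    """Look up a function by its string name in a registry of callables."""
    try:
        return registry[name]
    except KeyError:
        raise ValueError(f"Unknown function: {name!r}")

# Illustrative registry; the real script imports these from its own modules.
REGISTRY = {
    "extract_last_number": lambda text: text.strip().split()[-1],
    "exact_match": lambda pred, gold: pred == gold,
}

extract = resolve_fn("extract_last_number", REGISTRY)
evaluate = resolve_fn("exact_match", REGISTRY)
pred = extract("The answer is 42")
print(evaluate(pred, "42"))  # True
```

Keeping a registry (instead of `eval`-ing the string) makes the set of valid `--answer_extraction_fn` / `--eval_fn` values explicit and easy to validate.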
Model Configuration
| Argument | Type | Default | Description |
|---|---|---|---|
| `--model_name_or_path` | str | None | Path or HuggingFace model identifier |
| `--tokenizer_name_or_path` | str | None | Tokenizer path (defaults to the model path) |
| `--load_in_8bit` | bool | False | Load the model in 8-bit quantization mode |
| `--load_in_half` | bool | False | Load the model in half precision (float16) |
| `--gptq` | bool | False | Use GPTQ 4-bit quantization |
| `--use_vllm` | bool | False | Use vLLM for inference acceleration |
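The model flags above could be declared roughly as follows (a hedged `argparse` sketch mirroring the defaults in the table; the model name is a placeholder and the actual script's declarations may differ):

```python
import argparse

def build_model_args(parser):
    """Declare the model-configuration flags with the defaults listed above."""
    parser.add_argument("--model_name_or_path", type=str, default=None)
    parser.add_argument("--tokenizer_name_or_path", type=str, default=None)
    # bool flags with default False map naturally to store_true actions.
    parser.add_argument("--load_in_8bit", action="store_true")
    parser.add_argument("--load_in_half", action="store_true")
    parser.add_argument("--gptq", action="store_true")
    parser.add_argument("--use_vllm", action="store_true")
    return parser

args = build_model_args(argparse.ArgumentParser()).parse_args(
    ["--model_name_or_path", "my-org/my-model", "--load_in_half"]
)
# Fall back to the model path when no tokenizer path is given.
if args.tokenizer_name_or_path is None:
    args.tokenizer_name_or_path = args.model_name_or_path
print(args.tokenizer_name_or_path)  # my-org/my-model
```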
Data Configuration
| Argument | Type | Default | Description |
|---|---|---|---|
| `--data_dir` | str | `"data/mgsm"` | Directory containing test data |
| `--max_num_examples` | int | None | Maximum number of examples to evaluate |
| `--infer_train_set` | bool | False | Evaluate on the training set instead of the test set |
| `--prompt_format` | str | `"sft"` | Prompt format: `sft` or `few_shot` |
| `--few_shot_prompt` | str | None | Few-shot prompt class name |
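The `--prompt_format` choice determines how each question is turned into model input. A minimal sketch of the two modes (the exact `Q:`/`A:` template is an assumption, not the script's real few-shot prompt class):

```python
def build_prompt(question, prompt_format="sft", few_shot_examples=None):
    """Assemble the model input according to --prompt_format."""
    if prompt_format == "sft":
        # Instruction-tuned (SFT) models receive the bare question.
        return question
    if prompt_format == "few_shot":
        # Prepend worked examples before the actual question.
        shots = "\n\n".join(
            f"Q: {q}\nA: {a}" for q, a in (few_shot_examples or [])
        )
        return f"{shots}\n\nQ: {question}\nA:"
    raise ValueError(f"Unknown prompt_format: {prompt_format!r}")

print(build_prompt("What is 2 + 3?", "few_shot", [("What is 1 + 1?", "2")]))
```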
Inference Configuration
| Argument | Type | Default | Description |
|---|---|---|---|
| `--eval_batch_size` | int | 1 | Batch size for evaluation |
| `--temperature` | float | 0.0 | Sampling temperature (0.0 means greedy decoding) |
| `--max_tokens` | int | 1024 | Maximum number of tokens to generate |
| `--gpus` | str | None | Comma-separated GPU IDs |
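`--eval_batch_size` controls how many prompts are sent to the model per forward pass. Conceptually the evaluation loop chunks the examples like this (a sketch of the idea, not the script's actual loop):

```python
def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

prompts = [f"question {i}" for i in range(5)]
print([len(b) for b in batched(prompts, 2)])  # [2, 2, 1]
```

Larger batches improve GPU utilization at the cost of memory; on a single 2080Ti a small batch size is the conservative choice.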
Parallel Processing
| Argument | Type | Default | Description |
|---|---|---|---|
| `--n_subsets` | int | 1 | Number of data subsets for parallel processing |
| `--subset_id` | int | 0 | Subset ID handled by this process |
| `--n_repeat_sampling` | int | 1 | Number of repeated samplings |
| `--repeat_id_start` | int | 0 | Starting repeat ID |
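`--n_subsets` and `--subset_id` let you shard the data across independent processes: launch one process per `subset_id` in `0 .. n_subsets-1` and each example is handled exactly once. Whether the script splits round-robin or in contiguous blocks is an assumption here; a round-robin sketch:

```python
def select_subset(examples, n_subsets, subset_id):
    """Keep the examples assigned to this process under a round-robin split."""
    return [ex for i, ex in enumerate(examples) if i % n_subsets == subset_id]

data = list(range(10))
print(select_subset(data, 4, 1))  # [1, 5, 9]
```

Round-robin assignment keeps the per-process workload balanced even when example difficulty varies along the dataset.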
Output Configuration
| Argument | Type | Default | Description |
|---|---|---|---|
| `--save_dir` | str | `"results/mgsm"` | Directory to save evaluation results |
| `--complete_partial_output` | bool | False | Resume and complete partially generated model outputs |