reflection-tuning dataset generation (#349)

This commit is contained in:
Sebastian Raschka
2024-09-10 21:42:12 -05:00
committed by GitHub
parent 8ad50a3315
commit 835ed29dbf
7 changed files with 1077 additions and 4 deletions


@@ -1,6 +1,7 @@
-# Generating a Dataset for Instruction Finetuning
+# Generating Datasets for Instruction Finetuning
 
 This folder contains utility code that can be used for generating a dataset for instruction finetuning.
 
 - [llama3-ollama.ipynb](llama3-ollama.ipynb): A notebook that creates a synthetic instruction finetuning dataset using Llama 3 and Ollama
+- [reflection-gpt4.ipynb](reflection-gpt4.ipynb): A notebook that implements an instruction dataset refinement step based on reflection-tuning
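The reflection-tuning refinement step added here boils down to prompting a stronger model to critique and rewrite each dataset entry. A minimal sketch of such a prompt builder follows; the function name and prompt wording are illustrative and not taken from the notebook itself:

```python
def build_refinement_prompt(instruction: str) -> str:
    # Ask the model to critique the instruction and return an improved
    # version; the reply would replace the original dataset entry.
    return (
        "Below is an instruction from an instruction-finetuning dataset.\n\n"
        f"Instruction: {instruction}\n\n"
        "First critique the instruction, then rewrite it so it is clearer "
        "and more specific. Return only the rewritten instruction."
    )

prompt = build_refinement_prompt("Fix the grammar in this sentence.")
```

In the reflection-tuning setup, a prompt like this would be sent to a model such as GPT-4 via the OpenAI API, and the refined instruction written back into the dataset.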


@@ -0,0 +1,4 @@
+{
+    "OPENAI_API_KEY": "sk-...",
+    "_comment": "Enter your API key from https://platform.openai.com/api-keys"
+}
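A notebook would typically read the key out of this JSON config before constructing the OpenAI client. A minimal sketch, assuming the file format shown above (the helper name is illustrative):

```python
import json

def load_api_key(config_text: str) -> str:
    # Parse the JSON config and return the OpenAI API key;
    # the "_comment" field is ignored.
    cfg = json.loads(config_text)
    return cfg["OPENAI_API_KEY"]

example_config = '{"OPENAI_API_KEY": "sk-...", "_comment": "Enter your API key"}'
key = load_api_key(example_config)  # "sk-..."
```

Keeping the key in an untracked config file rather than hard-coding it in the notebook avoids accidentally committing credentials.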


@@ -498,7 +498,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.11.4"
   }
  },
  "nbformat": 4,

File diff suppressed because it is too large


@@ -0,0 +1,2 @@
+openai>=1.30.3
+tqdm>=4.65.0


@@ -12,4 +12,4 @@
 - [04_preference-tuning-with-dpo](04_preference-tuning-with-dpo) implements code for preference finetuning with Direct Preference Optimization (DPO)
-- [05_dataset-generation](05_dataset-generation) contains code to generate synthetic datasets for instruction finetuning
+- [05_dataset-generation](05_dataset-generation) contains code to generate and improve synthetic datasets for instruction finetuning