diff --git a/ch03/01_main-chapter-code/ch03.ipynb b/ch03/01_main-chapter-code/ch03.ipynb index 3635802..edae74f 100644 --- a/ch03/01_main-chapter-code/ch03.ipynb +++ b/ch03/01_main-chapter-code/ch03.ipynb @@ -204,9 +204,7 @@ "id": "ff856c58-8382-44c7-827f-798040e6e697", "metadata": {}, "source": [ - "- By convention, the unnormalized attention weights are referred to as **\"attention scores\"** whereas the normalized attention scores, which sum to 1, are referred to as **\"attention weights\"**\n", - "\n", - "- The attention weights and context vector calculation are summarized in the figure below:" + "- By convention, the unnormalized attention weights are referred to as **\"attention scores\"** whereas the normalized attention scores, which sum to 1, are referred to as **\"attention weights\"**\n" ] }, { @@ -2002,7 +2000,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/ch05/03_bonus_pretraining_on_gutenberg/README.md b/ch05/03_bonus_pretraining_on_gutenberg/README.md index ba82c6f..36f9c96 100644 --- a/ch05/03_bonus_pretraining_on_gutenberg/README.md +++ b/ch05/03_bonus_pretraining_on_gutenberg/README.md @@ -7,7 +7,7 @@ As the Project Gutenberg website states, "the vast majority of Project Gutenberg Please read the [Project Gutenberg Permissions, Licensing and other Common Requests](https://www.gutenberg.org/policy/permission.html) page for more information about using the resources provided by Project Gutenberg.   -## How to use this code +## How to Use This Code   @@ -17,7 +17,11 @@ In this section, we download books from Project Gutenberg using code from the [` As of this writing, this will require approximately 50 GB of disk space, but it may be more depending on how much Project Gutenberg grew since then. -Follow these steps to download the dataset: +  +#### Download instructions for Linux and macOS users + + +Linux and macOS users can follow these steps to download the dataset (if you are a Windows user, please see the note below): 1. `git clone https://github.com/pgcorpus/gutenberg.git` @@ -31,8 +35,26 @@ Follow these steps to download the dataset: 5. `cd ..`   +#### Special instructions for Windows users + +The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. However, Windows users would have to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`. + +Alternatively, an easier way to run this code on Windows is by using the "Windows Subsystem for Linux" (WSL) feature, which allows users to run a Linux environment using Ubuntu in Windows. For more information, please read [Microsoft's official installation instruction](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/). + +When using WSL, please make sure you have Python 3 installed (check via `python3 --version`, or install it for instance with `sudo apt-get install -y python3.10` for Python 3.10) and install following packages there: + +```bash +sudo apt-get update && \ +sudo apt-get upgrade -y && \ +sudo apt-get install -y python3-pip && \ +sudo apt-get install -y python-is-python3 && \ +sudo apt-get install -y rsync && \ +``` + > [!NOTE] -> The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. However, Windows users would have to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`. Alternatively, an easier way to run this code on Windows is by using the "Windows Subsystem for Linux" feature, which allows users to run a Linux environment in Windows. For more information, please read [Microsoft's official installation instruction](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/). +> Instructions about how to set up Python and installing packages can be found in [Appendix A: Optional Python Setup Preferences](../../appendix-A/01_optional-python-setup-preferences/README.md) and [Appendix A: Installing Python Libraries](../../appendix-A/02_installing-python-libraries/README.md). +> +> Optionally, a Docker image running Ubuntu is provided with this repository. When having cloned the [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) GitHub repository, copy the *.devcontainer* folder of this `LLMs-from-scratch` repository and paste it to the locally cloned `gutenberg` repository. Instructions about how to run a container with the provided Docker image can be found in [Appendix A: Optional Docker Environment](../../appendix-A/04_optional-docker-environment/README.md).   ### 2) Prepare the dataset