Organized setup instructions (#115)
* Organized setup instructions * update tets * link checker action * raise error upon broken link * fix links * fix links * delete duplicated paragraph
@@ -1,118 +0,0 @@
|
||||
# Python Setup Tips
|
||||
|
||||
|
||||
|
||||
There are several different ways you can install Python and set up your computing environment. Here, I am illustrating my personal preference.
|
||||
|
||||
(I am using computers running macOS, but this workflow is similar for Linux machines and may work for other operating systems as well.)
|
||||
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
|
||||
## 1. Download and install Miniforge
|
||||
|
||||
Download miniforge from the GitHub repository [here](https://github.com/conda-forge/miniforge).
|
||||
|
||||
<img src="figures/download.png" alt="download" width="600px">
|
||||
|
||||
Depending on your operating system, this should download either an `.sh` (macOS, Linux) or `.exe` file (Windows).
|
||||
|
||||
For the `.sh` file, open your command line terminal and execute the following command:
|
||||
|
||||
```bash
sh ~/Desktop/Miniforge3-MacOSX-arm64.sh
```
|
||||
|
||||
where `Desktop/` is the folder where the Miniforge installer was downloaded to. On your computer, you may have to replace it with `Downloads/`.
|
||||
|
||||
<img src="figures/miniforge-install.png" alt="miniforge-install" width="600px">
|
||||
|
||||
Next, step through the installation instructions, confirming with "Enter".
|
||||
|
||||
|
||||
|
||||
If you work with many packages, Conda can be slow because of its thorough but complex dependency resolution process and the handling of large package indexes and metadata. To speed up Conda, you can use the following setting, which switches to libmamba, a more efficient C++ reimplementation of the dependency solver:
|
||||
|
||||
```bash
conda config --set solver libmamba
```
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
|
||||
## 2. Create a new virtual environment
|
||||
|
||||
After the installation has completed successfully, I recommend creating a new virtual environment called `LLMs`, which you can do by executing
|
||||
|
||||
```bash
conda create -n LLMs python=3.10
```
|
||||
|
||||
<img src="figures/new-env.png" alt="new-env" width="600px">
|
||||
|
||||
> Many scientific computing libraries do not immediately support the newest version of Python. Therefore, when installing PyTorch, it's advisable to use a version of Python that is one or two releases older. For instance, if the latest version of Python is 3.12, using Python 3.10 or 3.11 is recommended.
|
||||
|
||||
Next, activate your new virtual environment (you have to do it every time you open a new terminal window or tab):
|
||||
|
||||
```bash
conda activate LLMs
```
|
||||
|
||||
<img src="figures/activate-env.png" alt="activate-env" width="600px">
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
## Optional: styling your terminal
|
||||
|
||||
If you want to style your terminal similar to mine so that you can see which virtual environment is active, check out the [Oh My Zsh](https://github.com/ohmyzsh/ohmyzsh) project.
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
## 3. Install new Python libraries
|
||||
|
||||
|
||||
|
||||
To install new Python libraries, you can now use the `conda` package installer. For example, you can install [JupyterLab](https://jupyter.org/install) and [watermark](https://github.com/rasbt/watermark) as follows:
|
||||
|
||||
```bash
conda install jupyterlab watermark
```
|
||||
|
||||
<img src="figures/conda-install.png" alt="conda-install" width="600px">
|
||||
|
||||
|
||||
|
||||
You can also still use `pip` to install libraries. By default, `pip` should be linked to your new `LLMs` conda environment:
|
||||
|
||||
<img src="figures/check-pip.png" alt="check-pip" width="600px">
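If you would rather check this from Python than from the shell, the following is a minimal sketch. It assumes the macOS/Linux layout, where `pip` and `python` sit in the same `bin/` folder of the environment; the helper name `pip_matches_interpreter` is hypothetical and not part of this repository.

```python
import shutil
import sys
from pathlib import Path


def pip_matches_interpreter():
    """Return True if the `pip` found on PATH sits next to the running Python.

    Returns None when no `pip` executable is on PATH at all.
    Assumption: macOS/Linux layout, where both live in the same bin/ folder.
    """
    pip_path = shutil.which("pip")
    if pip_path is None:
        return None
    # Compare the folders without resolving symlinks, so that the
    # environment's own bin/ folder is what gets compared.
    return Path(pip_path).parent == Path(sys.executable).parent


if __name__ == "__main__":
    print("python:", sys.executable)
    print("pip:   ", shutil.which("pip"))
    print("same environment:", pip_matches_interpreter())
```

If the two do not match, running `python -m pip install ...` instead of `pip install ...` makes sure the package lands in the environment of the `python` you are calling.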
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
## 4. Install PyTorch
|
||||
|
||||
PyTorch can be installed just like any other Python library or package using pip. For example:
|
||||
|
||||
```bash
pip install torch==2.0.1
```
|
||||
|
||||
However, since PyTorch is a comprehensive library featuring CPU- and GPU-compatible code, the installation may require additional settings and explanation (see section *A.1.3 Installing PyTorch* in the book for more information).
|
||||
|
||||
It's also highly recommended to consult the installation guide menu on the official PyTorch website at [https://pytorch.org](https://pytorch.org).
|
||||
|
||||
<img src="figures/pytorch-installer.jpg" width="600px">
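After installing, a quick sanity check can confirm which PyTorch version was picked up and whether a GPU backend is visible. The snippet below is a small sketch, not an official check; the function name `describe_torch_install` is hypothetical, and the MPS lookup is guarded because that backend only exists in newer PyTorch builds.

```python
import importlib.util


def describe_torch_install():
    """Summarize the local PyTorch installation, if there is one."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch  # imported lazily so the function also works without PyTorch

    info = {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),  # True if a usable NVIDIA GPU is found
    }
    # The MPS backend (Apple silicon GPUs) is missing in older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    info["mps"] = bool(mps is not None and mps.is_available())
    return info


if __name__ == "__main__":
    print(describe_torch_install())
```

If both `cuda` and `mps` come back as `False`, PyTorch will still work, but it will run on the CPU.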
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
|
||||
Any questions? Please feel free to reach out in the [Discussion Forum](https://github.com/rasbt/LLMs-from-scratch/discussions).
|
||||
|
@@ -1,60 +0,0 @@
|
||||
# Installing Libraries Used In This Book
|
||||
|
||||
This document provides more information on double-checking your installed Python version and packages. (Please see the [../01_optional-python-setup-preferences](../01_optional-python-setup-preferences) folder for more information on installing Python and Python packages.)
|
||||
|
||||
I used the libraries listed [here](https://github.com/rasbt/LLMs-from-scratch/blob/main/requirements.txt) for this book. Newer versions of these libraries are likely compatible as well. However, if you experience any problems with the code, you can try these library versions as a fallback.
|
||||
|
||||
To install these requirements most conveniently, you can use the `requirements.txt` file in the root directory for this code repository and execute the following command:
|
||||
|
||||
```bash
pip install -r requirements.txt
```
|
||||
|
||||
|
||||
Then, after completing the installation, please check that all the packages are installed and up to date using:
|
||||
|
||||
```bash
python python_environment_check.py
```
|
||||
|
||||
<img src="figures/check_1.jpg" width="600px">
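To give an idea of what such a check does, here is a simplified sketch of comparing an installed version against a minimum version. It is not the script itself: `python_environment_check.py` relies on `packaging.version.parse`, whereas this sketch uses a plain tuple comparison to stay within the standard library. The helper names `as_tuple` and `check`, as well as the "not installed" message, are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def as_tuple(ver):
    """Turn '2.0.1' into (2, 0, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part that does not start with a number
        parts.append(int(match.group()))
    return tuple(parts)


def check(name, minimum):
    """Return an '[OK]' or '[FAIL]' line for one installed distribution."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"[FAIL] {name} is not installed, please install >= {minimum}"
    if as_tuple(installed) < as_tuple(minimum):
        return f"[FAIL] {name} {installed}, please upgrade to >= {minimum}"
    return f"[OK] {name} {installed}"


if __name__ == "__main__":
    # The minimum versions here are placeholders, not the book's requirements.
    for pkg, minimum in [("pip", "20.0"), ("torch", "2.0.1")]:
        print(check(pkg, minimum))
```

Note that `importlib.metadata.version` expects the name a package was installed under, which can differ from the name used in `import` statements.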
|
||||
|
||||
It's also recommended to check the versions in JupyterLab by running the `jupyter_environment_check.ipynb` in this directory, which should ideally give you the same results as above.
|
||||
|
||||
<img src="figures/check_2.jpg" width="500px">
|
||||
|
||||
If you see the following issues, it's likely that your JupyterLab instance is connected to the wrong conda environment:
|
||||
|
||||
<img src="figures/jupyter-issues.jpg" width="450px">
|
||||
|
||||
In this case, you may want to use `watermark` to check if you opened the JupyterLab instance in the right conda environment using the `--conda` flag:
|
||||
|
||||
<img src="figures/watermark.jpg" width="350px">
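If `watermark` is not installed in the kernel you are debugging, a few lines of standard-library Python give similar information when run in a notebook cell. This is a rough sketch; the helper names `environment_summary` and `runs_in_env` are hypothetical, and the folder-name comparison assumes a named conda environment (for example, one whose path ends in `envs/LLMs`).

```python
import os
import sys


def environment_summary():
    """Collect details that identify the environment behind this Python process."""
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # root folder of the active environment
        # Set by `conda activate`; may be inherited from the launching shell
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def runs_in_env(expected_name):
    """Return True if the environment's root folder is named `expected_name`."""
    return os.path.basename(os.path.normpath(sys.prefix)) == expected_name


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
    print("running in LLMs:", runs_in_env("LLMs"))
```

The `prefix` entry is usually the more reliable of the two, since the environment variable reflects the shell that started JupyterLab rather than the kernel itself.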
|
||||
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
|
||||
## Installing PyTorch
|
||||
|
||||
PyTorch can be installed just like any other Python library or package using pip. For example:
|
||||
|
||||
```bash
pip install torch==2.0.1
```
|
||||
|
||||
However, since PyTorch is a comprehensive library featuring CPU- and GPU-compatible code, the installation may require additional settings and explanation (see section *A.1.3 Installing PyTorch* in the book for more information).
|
||||
|
||||
It's also highly recommended to consult the installation guide menu on the official PyTorch website at [https://pytorch.org](https://pytorch.org).
|
||||
|
||||
<img src="figures/pytorch-installer.jpg" width="600px">
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
|
||||
Any questions? Please feel free to reach out in the [Discussion Forum](https://github.com/rasbt/LLMs-from-scratch/discussions).
|
||||
|
@@ -1,64 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c31e08b0-f551-4d67-b95e-41f49de3b392",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<font size=\"1\">\n",
|
||||
"Supplementary code for \"Build a Large Language Model From Scratch\": <a href=\"https://www.manning.com/books/build-a-large-language-model-from-scratch\">https://www.manning.com/books/build-a-large-language-model-from-scratch</a> by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
|
||||
"Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
|
||||
"</font>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "67f6f7ed-b67d-465b-bf6f-a99b0d996930",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[OK] Your Python version is 3.10.12\n",
|
||||
"[OK] numpy 1.26.0\n",
|
||||
"[OK] matplotlib 3.8.2\n",
|
||||
"[OK] jupyterlab 4.0.6\n",
|
||||
"[OK] tensorflow 2.15.0\n",
|
||||
"[OK] torch 2.2.1\n",
|
||||
"[OK] tqdm 4.66.1\n",
|
||||
"[OK] tiktoken 0.5.1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from python_environment_check import check_packages, get_requirements_dict\n",
|
||||
"\n",
|
||||
"d = get_requirements_dict()\n",
|
||||
"check_packages(d)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,77 +0,0 @@
|
||||
# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
|
||||
# Source for "Build a Large Language Model From Scratch"
|
||||
# - https://www.manning.com/books/build-a-large-language-model-from-scratch
|
||||
# Code: https://github.com/rasbt/LLMs-from-scratch
|
||||
|
||||
from importlib import import_module
from importlib.metadata import PackageNotFoundError
import importlib.metadata
from os.path import dirname, join, realpath
from packaging.version import parse as version_parse
import platform
import sys

if version_parse(platform.python_version()) < version_parse('3.9'):
    print('[FAIL] We recommend Python 3.9 or newer but'
          ' found version %s' % (sys.version))
else:
    print('[OK] Your Python version is %s' % (platform.python_version()))


def get_packages(pkgs):
    """Return the installed version of each package ('0.0' if unavailable)."""
    versions = []
    for p in pkgs:
        try:
            imported = import_module(p)
            try:
                version = (getattr(imported, '__version__', None) or
                           getattr(imported, 'version', None) or
                           getattr(imported, 'version_info', None))
                if version is None:
                    # If common attributes don't exist, use importlib.metadata
                    version = importlib.metadata.version(p)
                versions.append(version)
            except PackageNotFoundError:
                # Handle case where package is not installed
                versions.append('0.0')
        except ImportError:
            # Fallback if importlib.import_module fails for unexpected reasons
            versions.append('0.0')
    return versions


def get_requirements_dict():
    """Map each package in requirements.txt to the last token on its line."""
    PROJECT_ROOT = dirname(realpath(__file__))
    PROJECT_ROOT_UP_TWO = dirname(dirname(PROJECT_ROOT))
    REQUIREMENTS_FILE = join(PROJECT_ROOT_UP_TWO, "requirements.txt")
    d = {}
    with open(REQUIREMENTS_FILE) as f:
        for line in f:
            # Drop trailing comments, then skip blank and comment-only lines
            line = line.split("#")[0].strip()
            if not line:
                continue
            line = line.split(" ")
            line = [l.strip() for l in line]
            d[line[0]] = line[-1]
    return d


def check_packages(d):
    """Print an [OK] or [FAIL] line for every package in the requirements dict."""
    versions = get_packages(d.keys())

    for (pkg_name, suggested_ver), actual_ver in zip(d.items(), versions):
        if actual_ver == 'N/A':
            continue
        actual_ver, suggested_ver = version_parse(actual_ver), version_parse(suggested_ver)
        if actual_ver < suggested_ver:
            print(f'[FAIL] {pkg_name} {actual_ver}, please upgrade to >= {suggested_ver}')
        else:
            print(f'[OK] {pkg_name} {actual_ver}')


def main():
    d = get_requirements_dict()
    check_packages(d)


if __name__ == '__main__':
    main()
|
||||
@@ -1,14 +0,0 @@
|
||||
# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
|
||||
# Source for "Build a Large Language Model From Scratch"
|
||||
# - https://www.manning.com/books/build-a-large-language-model-from-scratch
|
||||
# Code: https://github.com/rasbt/LLMs-from-scratch
|
||||
|
||||
# File for internal use (unit tests)
|
||||
|
||||
from python_environment_check import main
|
||||
|
||||
|
||||
def test_main(capsys):
    main()
    captured = capsys.readouterr()
    assert "FAIL" not in captured.out
|
||||
@@ -1,11 +0,0 @@
|
||||
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
|
||||
|
||||
RUN apt-get update && \
|
||||
apt-get upgrade -y && \
|
||||
apt-get install -y rsync && \
|
||||
apt-get install -y git && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
|
||||
COPY requirements.txt requirements.txt
|
||||
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
@@ -1,3 +0,0 @@
|
||||
# Optional Docker Environment
|
||||
|
||||
This is an optional Docker environment for those users who prefer Docker. For more instructions, see the *Docker Environment Setup Guide* in [appendix-A/04_optional-docker-environment](../).
|
||||
@@ -1,19 +0,0 @@
|
||||
{
|
||||
"name": "LLMs From Scratch",
|
||||
"build": {
|
||||
"context": "..",
|
||||
"dockerfile": "Dockerfile"
|
||||
},
|
||||
"runArgs": ["--runtime=nvidia", "--gpus=all"],
|
||||
"customizations": {
|
||||
"vscode": {
|
||||
"extensions": [
|
||||
"ms-python.python",
|
||||
"ms-azuretools.vscode-docker",
|
||||
"ms-toolsai.jupyter",
|
||||
"yahyabatulu.vscode-markdown-alert",
|
||||
"tomoki1207.pdf"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,115 +0,0 @@
|
||||
# Docker Environment Setup Guide
|
||||
|
||||
If you prefer a development setup that isolates a project's dependencies and configurations, using Docker is a highly effective solution. This approach eliminates the need to manually install software packages and libraries and ensures a consistent development environment.
|
||||
|
||||
This guide will walk you through the process of setting up an optional Docker environment for this book if you prefer it over using the conda approach explained in [../01_optional-python-setup-preferences](../01_optional-python-setup-preferences) and [../02_installing-python-libraries](../02_installing-python-libraries).
|
||||
|
||||
<br>
|
||||
|
||||
## Downloading and installing Docker
|
||||
|
||||
The easiest way to get started with Docker is by installing [Docker Desktop](https://docs.docker.com/desktop/) for your relevant platform.
|
||||
|
||||
Linux (Ubuntu) users may prefer to install the [Docker Engine](https://docs.docker.com/engine/install/ubuntu/) instead and follow the [post-installation](https://docs.docker.com/engine/install/linux-postinstall/) steps.
|
||||
|
||||
<br>
|
||||
|
||||
## Using a Docker DevContainer in Visual Studio Code
|
||||
|
||||
A Docker DevContainer, or Development Container, is a tool that allows developers to use Docker containers as a fully-fledged development environment. This approach ensures that users can quickly get up and running with a consistent development environment, regardless of their local machine setup.
|
||||
|
||||
While DevContainers also work with other IDEs, a commonly used IDE/editor for working with DevContainers is Visual Studio Code (VS Code). The guide below explains how to use the DevContainer for this book within a VS Code context, but a similar process should also apply to PyCharm. [Install](https://code.visualstudio.com/download) VS Code if you don't have it and want to use it.
|
||||
|
||||
1. Clone this GitHub repository and `cd` into the project root directory.
|
||||
|
||||
```bash
git clone https://github.com/rasbt/LLMs-from-scratch.git
cd LLMs-from-scratch
```
|
||||
|
||||
2. Move the `.devcontainer` folder to the main `LLMs-from-scratch` project directory.
|
||||
|
||||
```bash
mv appendix-A/04_optional-docker-environment/.devcontainer ./
```
|
||||
|
||||
3. In Docker Desktop, make sure that ***desktop-linux* builder** is running and will be used to build the Docker container (see *Docker Desktop* -> *Change settings* -> *Builders* -> *desktop-linux* -> *...* -> *Use*)
|
||||
|
||||
4. If you have a [CUDA-supported GPU](https://developer.nvidia.com/cuda-gpus), you can speed up the training and inference:
|
||||
|
||||
4.1 Install the **NVIDIA Container Toolkit** as described [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt). On Windows, WSL 2 support for the toolkit is documented [here](https://docs.nvidia.com/cuda/wsl-user-guide/index.html#nvidia-compute-software-support-on-wsl-2).

4.2 Add *nvidia* as a runtime in the Docker Engine daemon config (see *Docker Desktop* -> *Change settings* -> *Docker Engine*). Add these lines to your config:
```json
"runtimes": {
    "nvidia": {
        "path": "nvidia-container-runtime",
        "runtimeArgs": []
    }
}
```
|
||||
|
||||
For example, the full Docker Engine daemon config JSON should look like this:
```json
{
  "builder": {
    "gc": {
      "defaultKeepStorage": "20GB",
      "enabled": true
    }
  },
  "experimental": false,
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
Then restart Docker Desktop.
|
||||
|
||||
5. Type `code .` in the terminal to open the project in VS Code. Alternatively, you can launch VS Code and select the project to open from the UI.
|
||||
|
||||
6. Install the **Remote Development** extension from the VS Code *Extensions* menu on the left-hand side.
|
||||
|
||||
7. Open the DevContainer.
|
||||
|
||||
Since the `.devcontainer` folder is present in the main `LLMs-from-scratch` directory (folders starting with `.` may be invisible in your OS depending on your settings), VS Code should automatically detect it and ask whether you would like to open the project in a devcontainer. If it doesn't, simply press `Ctrl + Shift + P` to open the command palette and start typing `dev containers` to see a list of all DevContainer-specific options.
|
||||
|
||||
8. Select **Reopen in Container**.
|
||||
|
||||
Docker will now begin the process of building the Docker image specified in the `.devcontainer` configuration if it hasn't been built before, or pull the image if it's available from a registry.
|
||||
|
||||
The entire process is automated and might take a few minutes, depending on your system and internet speed. Optionally, click on "Starting Dev Container (show log)" in the lower right corner of VS Code to see the current build progress.
|
||||
|
||||
Once completed, VS Code will automatically connect to the container and reopen the project within the newly created Docker development environment. You will be able to write, execute, and debug code as if it were running on your local machine, but with the added benefits of Docker's isolation and consistency.
|
||||
|
||||
> [!WARNING]
|
||||
> If you encounter an error during the build process, this is likely because your machine does not support the NVIDIA Container Toolkit, for example because it doesn't have a compatible GPU. In this case, edit the `devcontainer.json` file to remove the `"runArgs": ["--runtime=nvidia", "--gpus=all"],` line and run the **Reopen in Container** step again.
|
||||
|
||||
9. Finished.
|
||||
|
||||
Once the image has been pulled and built, you should have your project mounted inside the container with all the packages installed, ready for development.
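If you hit the GPU-related build error mentioned in the warning above and prefer not to edit the file by hand, the change can also be scripted. The following is a minimal sketch that assumes `devcontainer.json` is plain JSON without comments; the function name `drop_gpu_run_args` is hypothetical.

```python
import json
from pathlib import Path


def drop_gpu_run_args(config_path):
    """Remove the "runArgs" entry from a devcontainer.json file, if present.

    Returns True when the file was changed, False when there was nothing to do.
    Assumption: the file is plain JSON (no comments or trailing commas).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "runArgs" not in config:
        return False
    del config["runArgs"]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    target = Path(".devcontainer/devcontainer.json")
    if target.exists():
        print("updated:", drop_gpu_run_args(target))
    else:
        print(f"{target} not found; run this from the project root")
```

Since this removes the whole `runArgs` entry, the container will then be built and started without GPU access.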
|
||||
|
||||
<br>
|
||||
|
||||
## Uninstalling the Docker Image
|
||||
|
||||
Below are instructions for uninstalling or removing a Docker container and image if you no longer plan to use it. This process does not remove Docker itself from your system but rather cleans up the project-specific Docker artifacts.
|
||||
|
||||
1. List all Docker images to find the one associated with your DevContainer:
|
||||
|
||||
```bash
docker image ls
```
|
||||
|
||||
2. Remove the Docker image using its image ID or name:
|
||||
|
||||
```bash
docker image rm [IMAGE_ID_OR_NAME]
```
|
||||
|
||||
<br>
|
||||
|
||||
## Uninstalling Docker
|
||||
|
||||
If you decide that Docker is not for you and wish to uninstall it, see the official documentation [here](https://docs.docker.com/desktop/uninstall/) that outlines the steps for your specific operating system.
|
||||