add reinforcement learning tutorial

Frank Xu
2025-03-16 16:15:10 -04:00
parent 3e4cb6e5b6
commit 36fbc1d570
2 changed files with 5 additions and 5 deletions


@@ -147,7 +147,7 @@
"\n",
"---\n",
"\n",
"## The Safe Agent (No AI, only one hardcoded rule)\n",
"# Trial 1: The Safe Agent (No AI, only one hardcoded rule)\n",
"We're going to implement a simple agent 'The Safe Agent' who will thrust upward if and only if the lander's `y` position is less than 0.5.\n",
"\n",
"In theory this agent shouldn't hit the ground as we have unlimited fuel, but let's see."
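
The Safe Agent rule described in this hunk can be sketched as a plain function. This is a minimal illustration, not the notebook's actual cell: the names `safe_agent`, `DO_NOTHING`, and `MAIN_ENGINE` are hypothetical, and it assumes the Gymnasium Lunar Lander conventions where the `y` position is the second observation entry and discrete action 2 fires the main engine.

```python
# Assumed Lunar Lander discrete actions: 0 = do nothing, 2 = fire main engine.
DO_NOTHING = 0
MAIN_ENGINE = 2

# The single hardcoded rule: thrust upward only below this height.
Y_THRESHOLD = 0.5


def safe_agent(observation):
    """Return the action for one step, given an 8-value observation.

    observation[1] is assumed to be the lander's y position.
    """
    y_position = observation[1]
    if y_position < Y_THRESHOLD:
        return MAIN_ENGINE
    return DO_NOTHING


if __name__ == "__main__":
    # Hand-written sample observations, not real environment output.
    low = [0.0, 0.3, 0.0, -0.2, 0.0, 0.0, 0.0, 0.0]
    high = [0.0, 1.2, 0.0, -0.2, 0.0, 0.0, 0.0, 0.0]
    print(safe_agent(low), safe_agent(high))
```

The rule looks only at height and ignores velocity and tilt, which is one plausible reason a lander following it could still reach the ground despite unlimited fuel.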
@@ -215,7 +215,7 @@
"\n",
"---\n",
"\n",
"## The Stable Agent (No AI, with a set of hardcoded rules)\n",
"# Trial 2: The Stable Agent (No AI, with a set of hardcoded rules)\n",
"Let's try to define an agent that can remain stable in the air.\n",
"\n",
"It will operate via the following rules:\n",
@@ -312,7 +312,7 @@
"\n",
"---\n",
"\n",
"# The AI Agent (AI agent with Deep Reinforcement Learning)\n",
"# Trial 3: The AI Agent (AI agent with Deep Reinforcement Learning)\n",
"To address this challenge, we'll use deep reinforcement learning techniques to train an agent to land the spacecraft.\n",
"\n",
"Simpler tabular methods are limited to discrete observation spaces, meaning there is a finite number of possible states. In `LunarLander-v3`, however, each observation is a vector of 8 values, most of them continuous, so the number of possible states is effectively infinite. We could try to bin similar values into groups, but because the game's controls are sensitive, even slight errors can lead to significant missteps.\n",
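
To make the binning idea in this hunk concrete, here is a rough sketch of discretizing an observation and counting the table entries a tabular method would then need. The helper names `discretize` and `table_size`, the bounds, and the choice of 10 bins are all assumptions for illustration; none of them come from the notebook.

```python
def discretize(observation, low, high, bins):
    """Map each value to an integer bin index in [0, bins - 1].

    low and high are assumed per-dimension bounds; values outside
    them are clipped to the nearest bound first.
    """
    indices = []
    for value, lo, hi in zip(observation, low, high):
        clipped = min(max(value, lo), hi)
        fraction = (clipped - lo) / (hi - lo)
        # fraction == 1.0 would index one past the last bin, so cap it.
        indices.append(min(int(fraction * bins), bins - 1))
    return tuple(indices)


def table_size(num_dims, bins, num_actions):
    """Number of (state, action) entries in a tabular value table."""
    return (bins ** num_dims) * num_actions


if __name__ == "__main__":
    print(discretize([0.0, 0.5], [-1.0, 0.0], [1.0, 1.0], bins=10))
    # 8 observation values, 10 bins each, 4 discrete actions.
    print(table_size(num_dims=8, bins=10, num_actions=4))
```

Even this coarse grid gives a very large table, and every value inside one bin is treated as the same state, which is the loss of precision the paragraph above warns about.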
@@ -1075,7 +1075,7 @@
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1187,7 +1187,7 @@
"\n",
" # Optional: Download the video\n",
" # from google.colab import files\n",
" # video_file = glob.glob('video/LunarLander-v3-rl-video-episode-*.mp4')[0] # Match the generated file\n",
" # video_file = glob.glob('video/LunarLander-v3-episode-*.mp4')[0] # Match the generated file\n",
" # files.download(video_file)"
]
},