revise RAG readme

Frank Xu
2025-04-07 11:40:15 -04:00
parent 59442dd546
commit fa99f9d99a

@@ -1,4 +1,17 @@
-# RAG-based Cyber Forensics Investigation Tool [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+# RAG-based Cyber Forensics Investigation Tool
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+## Author
+**Mohit Ajaykumar Dhabuwala**
+- M.S. in Cyber Forensics and Counterterrorism
+- Specialization: Digital Forensics & Incident Response (DFIR)
+- Proficient in:
+  - Memory, Windows, mobile, and network forensics
+  - Forensic tools: Magnet AXIOM, EnCase, Volatility, Wireshark
+  - Programming languages: Python, Bash, PowerShell for forensic data parsing and automation
 ## What is RAG?
@@ -72,7 +85,7 @@ The code comprises:
 RAG offers these advantages:
-- **Contextualized responses:** Answers are grounded in the provided cyber forensics document.
+- **Contextualized responses:** Answers are based on the provided cyber forensics document.
 - **Interactive interface:** User-friendly chat interaction.
 - **Efficiency:** FAISS enables fast retrieval.
 - **Cloud-based execution:** Google Colab provides a convenient environment.
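
Reviewer note on the hunk above: the "contextualized responses" and "fast retrieval" bullets describe a retrieve-then-answer loop. Below is a minimal sketch of the retrieval half only. It assumes plain bag-of-words cosine similarity as a stand-in for the embedding model and FAISS index the README mentions, and the helper names (`split_into_chunks`, `vectorize`, `cosine`, `retrieve`) are hypothetical, not taken from the repository. The sample text paraphrases the scenario described in the README.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str) -> list[str]:
    """Split the knowledge base into sentence-level chunks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question (brute force, no index)."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


# Hypothetical knowledge base, paraphrased from the README's scenario summary.
document_text = (
    "Detective Y investigated a ransomware attack. "
    "The victim was a robotics engineer. "
    "The attacker encrypted and stole research data."
)
chunks = split_into_chunks(document_text)
top_chunks = retrieve("What was the victim's profession?", chunks)
```

In a full pipeline the retrieved chunks would be passed to the language model as context, so that the answer draws on the document rather than on the model's general knowledge.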
@@ -82,7 +95,7 @@ RAG offers these advantages:
 _(Flowchart image included here)_
-![Flowchart](Colab_RAG.png)
+![Flowchart](image_f6fb04.png-8c6bf71b-bc0b-4179-93a2-cc646df542c9)
 ## Setup and Usage
@@ -97,38 +110,19 @@ _(Flowchart image included here)_
 4. **Install Python dependencies:** Execute these commands in a Colab cell:
 ```bash
-!pip install -U langchain langchain-core langchain-huggingface langchain_community faiss-cpu huggingface_hub
+!pip install transformers langchain langchain_community faiss-cpu huggingface_hub pypdf pymupdf -U langchain langchain-huggingface
 !pip install --upgrade langchain
 ```
-5. **Provide Hugging Face API Token:** Add a code cell to set the `HUGGINGFACEHUB_API_TOKEN` environment variable:
+5. **Provide Hugging Face API Token:** Add a code cell to set the `HUGGINGFACEHUB_API_TOKEN` environment variable with your token:
 ```python
-api_token = "ENTER THE API KEY" # Replace 'ENTER THE API KEY' with your actual token
+import os
+os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'hf_your_token' # Replace 'hf_your_token' with your actual token
 ```
-6. **Provide Your Knowledge Base:** Add a cell to define `scenario_text` (Any passage of your choice).
+6. **Provide your knowledge base:** Add a cell to define `document_text` (the scenario).
-7. **Run the Code:** Execute the cells in order to interact with the RAG system.
+7. **Run the code:** Execute the cells to interact with the RAG system.
-## Background Story Used
-This project utilizes a futuristic cyberpunk scenario to simulate a cybercrime investigation. Detective Y investigates a complex ransomware attack targeting robotics engineer Z by "The Serpent," who employs advanced techniques to encrypt and steal research data. This scenario serves as the knowledge base for the RAG system.
-## Story based Questions
-The RAG system answers questions based on the provided cyber forensics scenario. Examples:
-**In-Text Questions:**
-1. What type of cyberattack did Detective Y investigate?
-2. What was the victim's profession?
-3. Where was the remote server located that led to the perpetrator's arrest?
-**Out-of-Text Questions (Answers not in the text):**
-1. What specific encryption algorithm did The Serpent use?
-2. What was the name of the university where the security breach occurred?
-3. Did Detective Y's team collaborate with external experts?
 ## Features
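
Reviewer note on setup step 5 in the last hunk: the revised cell writes the token directly into the notebook. Here is a minimal sketch of one alternative, assuming a small helper that rejects the README's placeholder strings before exporting the variable. `configure_hf_token` and `PLACEHOLDERS` are hypothetical names for illustration, not part of the repository.

```python
import os

# Placeholder strings that appear in the old and new README cells.
PLACEHOLDERS = {"hf_your_token", "ENTER THE API KEY"}


def configure_hf_token(token: str) -> None:
    """Export the token as HUGGINGFACEHUB_API_TOKEN, rejecting empty or placeholder values."""
    token = token.strip()
    if not token or token in PLACEHOLDERS:
        raise ValueError("Replace the placeholder with a real Hugging Face token.")
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = token


# In Colab, prompting keeps the token out of the saved notebook:
#   from getpass import getpass
#   configure_hf_token(getpass("Hugging Face token: "))
```

Failing early on a placeholder value makes a forgotten substitution show up at setup time instead of as an authentication error later in the run.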