add CKIM2024 cyber incident case study

Frank Xu
2024-07-23 10:20:29 -04:00
parent 85386310f9
commit 7fa7f44e92
17 changed files with 1908 additions and 370 deletions

@@ -1,26 +1,63 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A tutorial to identify evidence entities from a cyber incident report\n",
"\n",
"The cyber incident report records a conversation between an IT Security Specialist and an Employee. The conversation describes an email phishing attack scenario.\n",
"\n",
"### Goal\n",
"- Familiar with [DSPy: Declarative Self-improving Language Programs, pythonically](https://github.com/stanfordnlp/dspy). \n",
" - DSPy is a framework for algorithmically optimizing LM prompts and weights.\n",
" - The framework for programming—not prompting—foundation models\n",
"- Identify a list of evidence entities\n",
"- Identify a list of relationships between entities"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Download libraries and files for the lab\n",
"- Make use you download necessary library and files. \n",
"- All downloaded and saved files can be located in the `content` folder if using google Colab"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# uncomment the commands to download libraries and files\n",
"#!pip install python-dotenv\n",
"#!pip install dspy-ai\n",
"#!pip install graphviz\n",
"# !wget https://raw.githubusercontent.com/frankwxu/digital-forensics-lab/main/AI4Forensics/CKIM2024/PhishingAttack/PhishingAttackScenarioDemo/conversation.txt\n",
"\n",
"import dspy\n",
"import os\n",
"import openai\n",
"import json\n",
"from dotenv import load_dotenv\n",
"\n",
"from graphviz import Digraph\n",
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Config DSPy with openAI \n",
"- You `MUST` have an openAI api key\n",
"- load an openAI api key from `openai_api_key.txt` file\n",
"- or, hard code your open api key"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -51,40 +88,22 @@
" dspy.settings.configure(lm=turbo)\n",
" return turbo\n",
"\n",
"# provide `openai_api_key.txt` with your openAI api key\n",
"turbo=set_dspy()\n",
"# comment out set_dspy() and use set_dspy_hardcode_openai_key is your option\n",
"# optionally, hard code your openAI api key at line 21 \n",
"# turbo=set_dspy_hardcode_openai_key()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"def load_text_file(file_path):\n",
" \"\"\"\n",
" Load a text file and return its contents as a string.\n",
"\n",
" Parameters:\n",
" file_path (str): The path to the text file.\n",
"\n",
" Returns:\n",
" str: The contents of the text file.\n",
" \"\"\"\n",
" try:\n",
" with open(file_path, \"r\") as file:\n",
" contents = file.read()\n",
" return contents\n",
" except FileNotFoundError:\n",
" return \"File not found.\"\n",
" except Exception as e:\n",
" return f\"An error occurred: {e}\"\n"
"### Step 3: Load the cyber incident repot (e.g., conversation)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -134,30 +153,101 @@
}
],
"source": [
"conversation=load_text_file(\"conversation.txt\")\n",
"def load_text_file(file_path):\n",
" \"\"\"\n",
" Load a text file and return its contents as a string.\n",
"\n",
" Parameters:\n",
" file_path (str): The path to the text file.\n",
"\n",
" Returns:\n",
" str: The contents of the text file.\n",
" \"\"\"\n",
" try:\n",
" with open(file_path, \"r\") as file:\n",
" contents = file.read()\n",
" return contents\n",
" except FileNotFoundError:\n",
" return \"File not found.\"\n",
" except Exception as e:\n",
" return f\"An error occurred: {e}\"\n",
"\n",
"conversation = load_text_file(\"conversation.txt\")\n",
"print(conversation)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"class EvidenceIdentifier(dspy.Signature):\n",
" \"\"\"Idenitfy evidence entities from a conversation between -Alex (IT Security Specialist) and Taylor (Employee).\"\"\"\n",
"### Step 4: Tell an LLM `WHAT` are the inputs/outputs by defining DSPy: Signature \n",
"\n",
" question = dspy.InputField(\n",
" desc=\"a conversation between -Alex (IT Security Specialist) and Bob (Employee).\"\n",
" )\n",
" answer = dspy.OutputField(\n",
" desc=\"a list of evidence, inlcuding but not limited to emaile, IP address, URL, File name, timestamps, etc, in the conversation as a Python dictionary. For example, {evidence type: evidence value, ...}\"\n",
" )"
"- A signature is one of the basic building blocks in DSPy's prompt programming\n",
"- It is a declarative specification of input/output behavior of a DSPy module\n",
" - Think about a function signature\n",
"- Allow you to tell the LLM what it needs to do. \n",
" - Don't need to specify how we should ask the LLM to do it.\n",
"- The following signature identifies a list of evidence based on the conversation\n",
" - Inherit from `dspy.Signature`\n",
" - Exact `ONE` input, e.g., the conversation \n",
" - Exact `ONE` output, e.g., a list of evidence entities\n",
"\n",
"### The following `EvidenceIdentifier` is equivalent to \n",
"\n",
"```\n",
"Identify evidence entities from a conversation ....\n",
"---\n",
"Follow the following format.\n",
"Question: a conversation between an IT Security Specialist and Employe\n",
"Answer: a list of evidence, inlcuding ...\n",
"---\n",
"Question: {a new unseen conversation}\n",
"Answer: write your answer here\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"class EvidenceIdentifier(dspy.Signature):\n",
" \"\"\"Identify evidence entities from a conversation between an IT Security Specialist and an Employee.\"\"\"\n",
"\n",
" question = dspy.InputField(\n",
" desc=\"a conversation between an IT Security Specialist and Employee.\"\n",
" )\n",
" answer = dspy.OutputField(\n",
" desc=\"a list of evidence, inlcuding but not limited to emails, IP addresses, URLs, File names, timestamps, etc, in the conversation as a Python dictionary. For example, {evidence type: evidence value, ...}\"\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5: Tell an LLM `HOW` to generate answer in a function: \n",
"\n",
"Generates and saves evidence from a conversation using a specified signature.\n",
"\n",
"#### Parameters:\n",
"- `signature` (dspy.Signature): The signature defining the input and output structure for evidence identification.\n",
"- `conversation` (str): The conversation text to analyze for evidence.\n",
"- `output_file` (str): The file path where the identified evidence will be saved as JSON.\n",
"\n",
"#### Returns:\n",
"None. The function saves the result to a file and prints a confirmation message.\n",
"\n",
"#### Notes:\n",
"- This function uses `dspy.Predict` to process the conversation and identify evidence.\n",
"- The result is saved as a formatted JSON file.\n",
"- The function prints the result to the console and saves it to the specified file."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@@ -172,17 +262,27 @@
" print(f\"The evidence has been saved to the file {output_file}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6: Execute above function and generate entities with three inputs\n",
"- Which signature: `EvidenceIdentifier`\n",
"- What input: conversation\n",
"- Where to save results: the name of output file"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Email From': 'support@banksecure.com', 'Email Subject': 'Urgent: Verify Your Account Now', 'IP Address': '192.168.10.45', 'Domain': 'banksecure.com', 'Actual Domain Registration': 'Russia', 'URL Clicked': 'http://banksecure-verification.com/login', 'URL Visited 1': 'http://banksecure-verification.com/login', 'URL Visited 2': 'http://banksecure-verification.com/account-details', 'File Downloaded': 'AccountDetails.exe', 'File Creation Time': '10:20 AM', 'MD5 Hash': 'e99a18c428cb38d5f260853678922e03', 'Network Logs Timestamp': '10:20 AM'}\n",
"The evidence has been saved to the file 01_output_evidence_entity.txt\n"
"{'Email': {'From': 'support@banksecure.com', 'Subject': 'Urgent: Verify Your Account Now', 'Content': 'strange email asking to verify account details urgently'}, 'IP Address': '192.168.10.45', 'Domain': 'banksecure.com', 'URLs': ['http://banksecure-verification.com/login', 'http://banksecure-verification.com/account-details'], 'File': {'Name': 'AccountDetails.exe', 'Creation Time': '10:20 AM', 'MD5 Hash': 'e99a18c428cb38d5f260853678922e03'}, 'Timestamps': {'Visited at 10:15 AM': 'http://banksecure-verification.com/login', 'Visited at 10:17 AM': 'http://banksecure-verification.com/account-details'}}\n",
"The evidence has been saved to the file 01_output_entity.txt\n"
]
}
],
@@ -194,9 +294,21 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7: Inspect the last prompt send to the LLM\n",
"\n",
"You want to check:\n",
"- Prompt Description Section: Description in the signature\n",
"- Format Section: `Following the following format.` \n",
"- Result Section: Question (scenario) and Answer (entities) section"
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -206,31 +318,39 @@
"\n",
"\n",
"\n",
"Idenitfy evidence entities from a conversation between -Alex (IT Security Specialist) and Taylor (Employee).\n",
"Identify evidence entities from a conversation between an IT Security Specialist and an Employee.\n",
"\n",
"---\n",
"\n",
"Follow the following format.\n",
"\n",
"Question: a conversation between -Alex (IT Security Specialist) and Bob (Employee).\n",
"Question: a conversation between an IT Security Specialist and Employee.\n",
"Answer: a list of evidence, inlcuding but not limited to emaile, IP address, URL, File name, timestamps, etc, in the conversation as a Python dictionary. For example, {evidence type: evidence value, ...}\n",
"\n",
"---\n",
"\n",
"Question: Alice: Hey Bob, I just got a strange email from support@banksecure.com. It says I need to verify my account details urgently. The subject line was \"Urgent: Verify Your Account Now\". The email looks suspicious to me. Bob: Hi Alice, that does sound fishy. Can you forward me the email? Ill take a look at the headers to see where it came from. Alice: Sure, forwarding it now. Bob: Got it. Lets see... The email came from IP address 192.168.10.45, but the domain banksecure.com is not their official domain. It's actually registered to someone in Russia. Alice: Thats definitely not right. Should I be worried? Bob: We should investigate further. Did you click on any links or download any attachments? Alice: I did click on a link that took me to a page asking for my login credentials. I didn't enter anything though. The URL was http://banksecure-verification.com/login. Bob: Good call on not entering your details. Lets check the URL. This domain was just registered two days ago. Its highly likely its a phishing site. Alice: What should I do next? Bob: First, clear your browser history and cache. Also, run a full antivirus scan on your computer. Can you also provide me with any browser history entries and cookies from that session? Alice: Ive cleared the history and started the antivirus scan. Here are the relevant entries from my browser history: Visited at 10:15 AM: http://banksecure-verification.com/login Visited at 10:17 AM: http://banksecure-verification.com/account-details Bob: Thanks. Ill analyze these URLs further. Also, check if there are any suspicious files downloaded or present in your downloads folder. Look for anything unusual. Alice: There's a file named \"AccountDetails.exe\" that I dont remember downloading. It was created at 10:20 AM. Bob: Definitely suspicious. Dont open it. Lets hash the file to verify its integrity. Can you run an MD5 hash on it? Alice: Done. The MD5 hash is e99a18c428cb38d5f260853678922e03. 
Bob: This hash matches known malware in our database. Well need to quarantine it and check if it has established any network connections. Ill look into our network logs for the IP 192.168.10.45 around 10:20 AM. Alice: Is there anything else I need to do? Bob: For now, avoid using your computer for sensitive tasks. Well also reset your passwords from a different device and enable two-factor authentication on your accounts. Alice: Thanks, Bob. Ill follow these steps immediately.\n",
"Answer: {\n",
" \"Email From\": \"support@banksecure.com\",\n",
" \"Email Subject\": \"Urgent: Verify Your Account Now\",\n",
" \"Email\": {\n",
" \"From\": \"support@banksecure.com\",\n",
" \"Subject\": \"Urgent: Verify Your Account Now\",\n",
" \"Content\": \"strange email asking to verify account details urgently\"\n",
" },\n",
" \"IP Address\": \"192.168.10.45\",\n",
" \"Domain\": \"banksecure.com\",\n",
" \"Actual Domain Registration\": \"Russia\",\n",
" \"URL Clicked\": \"http://banksecure-verification.com/login\",\n",
" \"URL Visited 1\": \"http://banksecure-verification.com/login\",\n",
" \"URL Visited 2\": \"http://banksecure-verification.com/account-details\",\n",
" \"File Downloaded\": \"AccountDetails.exe\",\n",
" \"File Creation Time\": \"10:20 AM\",\n",
" \"MD5 Hash\": \"e99a18c428cb38d5f260853678922e03\",\n",
" \"Network Logs Timestamp\": \"10:20 AM\"\n",
" \"URLs\": [\n",
" \"http://banksecure-verification.com/login\",\n",
" \"http://banksecure-verification.com/account-details\"\n",
" ],\n",
" \"File\": {\n",
" \"Name\": \"AccountDetails.exe\",\n",
" \"Creation Time\": \"10:20 AM\",\n",
" \"MD5 Hash\": \"e99a18c428cb38d5f260853678922e03\"\n",
" },\n",
" \"Timestamps\": {\n",
" \"Visited at 10:15 AM\": \"http://banksecure-verification.com/login\",\n",
" \"Visited at 10:17 AM\": \"http://banksecure-verification.com/account-details\"\n",
" }\n",
"}\n",
"\n",
"\n",
@@ -240,10 +360,10 @@
{
"data": {
"text/plain": [
"'\\n\\n\\nIdenitfy evidence entities from a conversation between -Alex (IT Security Specialist) and Taylor (Employee).\\n\\n---\\n\\nFollow the following format.\\n\\nQuestion: a conversation between -Alex (IT Security Specialist) and Bob (Employee).\\nAnswer: a list of evidence, inlcuding but not limited to emaile, IP address, URL, File name, timestamps, etc, in the conversation as a Python dictionary. For example, {evidence type: evidence value, ...}\\n\\n---\\n\\nQuestion: Alice: Hey Bob, I just got a strange email from support@banksecure.com. It says I need to verify my account details urgently. The subject line was \"Urgent: Verify Your Account Now\". The email looks suspicious to me. Bob: Hi Alice, that does sound fishy. Can you forward me the email? Ill take a look at the headers to see where it came from. Alice: Sure, forwarding it now. Bob: Got it. Lets see... The email came from IP address 192.168.10.45, but the domain banksecure.com is not their official domain. It\\'s actually registered to someone in Russia. Alice: Thats definitely not right. Should I be worried? Bob: We should investigate further. Did you click on any links or download any attachments? Alice: I did click on a link that took me to a page asking for my login credentials. I didn\\'t enter anything though. The URL was http://banksecure-verification.com/login. Bob: Good call on not entering your details. Lets check the URL. This domain was just registered two days ago. Its highly likely its a phishing site. Alice: What should I do next? Bob: First, clear your browser history and cache. Also, run a full antivirus scan on your computer. Can you also provide me with any browser history entries and cookies from that session? Alice: Ive cleared the history and started the antivirus scan. Here are the relevant entries from my browser history: Visited at 10:15 AM: http://banksecure-verification.com/login Visited at 10:17 AM: http://banksecure-verification.com/account-details Bob: Thanks. 
Ill analyze these URLs further. Also, check if there are any suspicious files downloaded or present in your downloads folder. Look for anything unusual. Alice: There\\'s a file named \"AccountDetails.exe\" that I dont remember downloading. It was created at 10:20 AM. Bob: Definitely suspicious. Dont open it. Lets hash the file to verify its integrity. Can you run an MD5 hash on it? Alice: Done. The MD5 hash is e99a18c428cb38d5f260853678922e03. Bob: This hash matches known malware in our database. Well need to quarantine it and check if it has established any network connections. Ill look into our network logs for the IP 192.168.10.45 around 10:20 AM. Alice: Is there anything else I need to do? Bob: For now, avoid using your computer for sensitive tasks. Well also reset your passwords from a different device and enable two-factor authentication on your accounts. Alice: Thanks, Bob. Ill follow these steps immediately.\\nAnswer:\\x1b[32m {\\n \"Email From\": \"support@banksecure.com\",\\n \"Email Subject\": \"Urgent: Verify Your Account Now\",\\n \"IP Address\": \"192.168.10.45\",\\n \"Domain\": \"banksecure.com\",\\n \"Actual Domain Registration\": \"Russia\",\\n \"URL Clicked\": \"http://banksecure-verification.com/login\",\\n \"URL Visited 1\": \"http://banksecure-verification.com/login\",\\n \"URL Visited 2\": \"http://banksecure-verification.com/account-details\",\\n \"File Downloaded\": \"AccountDetails.exe\",\\n \"File Creation Time\": \"10:20 AM\",\\n \"MD5 Hash\": \"e99a18c428cb38d5f260853678922e03\",\\n \"Network Logs Timestamp\": \"10:20 AM\"\\n}\\x1b[0m\\n\\n\\n'"
"'\\n\\n\\nIdentify evidence entities from a conversation between an IT Security Specialist and an Employee.\\n\\n---\\n\\nFollow the following format.\\n\\nQuestion: a conversation between an IT Security Specialist and Employee.\\nAnswer: a list of evidence, inlcuding but not limited to emaile, IP address, URL, File name, timestamps, etc, in the conversation as a Python dictionary. For example, {evidence type: evidence value, ...}\\n\\n---\\n\\nQuestion: Alice: Hey Bob, I just got a strange email from support@banksecure.com. It says I need to verify my account details urgently. The subject line was \"Urgent: Verify Your Account Now\". The email looks suspicious to me. Bob: Hi Alice, that does sound fishy. Can you forward me the email? Ill take a look at the headers to see where it came from. Alice: Sure, forwarding it now. Bob: Got it. Lets see... The email came from IP address 192.168.10.45, but the domain banksecure.com is not their official domain. It\\'s actually registered to someone in Russia. Alice: Thats definitely not right. Should I be worried? Bob: We should investigate further. Did you click on any links or download any attachments? Alice: I did click on a link that took me to a page asking for my login credentials. I didn\\'t enter anything though. The URL was http://banksecure-verification.com/login. Bob: Good call on not entering your details. Lets check the URL. This domain was just registered two days ago. Its highly likely its a phishing site. Alice: What should I do next? Bob: First, clear your browser history and cache. Also, run a full antivirus scan on your computer. Can you also provide me with any browser history entries and cookies from that session? Alice: Ive cleared the history and started the antivirus scan. Here are the relevant entries from my browser history: Visited at 10:15 AM: http://banksecure-verification.com/login Visited at 10:17 AM: http://banksecure-verification.com/account-details Bob: Thanks. 
Ill analyze these URLs further. Also, check if there are any suspicious files downloaded or present in your downloads folder. Look for anything unusual. Alice: There\\'s a file named \"AccountDetails.exe\" that I dont remember downloading. It was created at 10:20 AM. Bob: Definitely suspicious. Dont open it. Lets hash the file to verify its integrity. Can you run an MD5 hash on it? Alice: Done. The MD5 hash is e99a18c428cb38d5f260853678922e03. Bob: This hash matches known malware in our database. Well need to quarantine it and check if it has established any network connections. Ill look into our network logs for the IP 192.168.10.45 around 10:20 AM. Alice: Is there anything else I need to do? Bob: For now, avoid using your computer for sensitive tasks. Well also reset your passwords from a different device and enable two-factor authentication on your accounts. Alice: Thanks, Bob. Ill follow these steps immediately.\\nAnswer:\\x1b[32m {\\n \"Email\": {\\n \"From\": \"support@banksecure.com\",\\n \"Subject\": \"Urgent: Verify Your Account Now\",\\n \"Content\": \"strange email asking to verify account details urgently\"\\n },\\n \"IP Address\": \"192.168.10.45\",\\n \"Domain\": \"banksecure.com\",\\n \"URLs\": [\\n \"http://banksecure-verification.com/login\",\\n \"http://banksecure-verification.com/account-details\"\\n ],\\n \"File\": {\\n \"Name\": \"AccountDetails.exe\",\\n \"Creation Time\": \"10:20 AM\",\\n \"MD5 Hash\": \"e99a18c428cb38d5f260853678922e03\"\\n },\\n \"Timestamps\": {\\n \"Visited at 10:15 AM\": \"http://banksecure-verification.com/login\",\\n \"Visited at 10:17 AM\": \"http://banksecure-verification.com/account-details\"\\n }\\n}\\x1b[0m\\n\\n\\n'"
]
},
"execution_count": 19,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -252,9 +372,35 @@
"turbo.inspect_history(n=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A tutorial to identify `evidence relationship` from a cyber incident report\n",
"\n",
"The cyber incident report records a conversation between an IT Security Specialist and an Employee. The conversation describes an email phishing attack scenario.\n",
"\n",
"### Goal\n",
"- In addition to a list of evidence entities, we want to identify a list of `relationships` between entities"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Define a signature that identifies a list of `relationships` in the conversation\n",
"\n",
"It is important to note that:\n",
"- There is ONE input \n",
" - Cyber incident conversation\n",
"- There are `TWO` outputs:\n",
" - a list of entities\n",
" - a list of relationships"
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
@@ -274,9 +420,20 @@
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: A function that can receive two outputs\n",
"\n",
"We have to revise the function `generate_answer()` so that we can receive two outputs. The following function `generate_answers` can receive two outputs from a LLM (e.g, openAI)\n",
"- a list of entities\n",
"- a list of relationships"
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -306,9 +463,20 @@
" print(f\"The evidence has been saved to the file {output_file}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: Execute code to generate evidences and relations\n",
"- Input 1: Signature: E`videnceRelationIdentifier`\n",
"- Input 2: a conversation\n",
"- Output 1: a file that saves entities and relations\n",
"- Output 2: a list of entities and relations"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 11,
"metadata": {},
"outputs": [
{
@@ -319,7 +487,7 @@
" answer_relations='{\\n \"Email Header Analysis\": \"IP Address -> Domain\",\\n \"URL Analysis\": \"URL -> Domain\",\\n \"Browser History Analysis\": \"URL -> Timestamp\",\\n \"File Analysis\": \"File Name -> Timestamp, File Name -> MD5 Hash\",\\n \"Malware Analysis\": \"MD5 Hash -> Malware Database\"\\n}',\n",
" answer_evidence='{\\n \"Email Sender\": \"support@banksecure.com\",\\n \"Email Subject\": \"Urgent: Verify Your Account Now\",\\n \"IP Address\": \"192.168.10.45\",\\n \"Domain\": \"banksecure.com\",\\n \"Domain Registration\": \"Russia\",\\n \"URL\": \"http://banksecure-verification.com/login\",\\n \"URL Registration Date\": \"Two days ago\",\\n \"File Name\": \"AccountDetails.exe\",\\n \"File Creation Timestamp\": \"10:20 AM\",\\n \"MD5 Hash\": \"e99a18c428cb38d5f260853678922e03\"\\n}'\n",
")\n",
"The evidence has been saved to the file 01_output_evidence_entity_relation.txt\n"
"The evidence has been saved to the file 01_output_entity_relation.txt\n"
]
}
],
@@ -338,7 +506,9 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"source": [
"turbo.inspect_history(n=1)"
]
}
],
"metadata": {

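The `answer_relations` and `answer_evidence` fields in the output above come back from the LLM as strings, not dictionaries. A minimal sketch of turning such a string into a Python dictionary follows; `parse_llm_dict` is a hypothetical helper that is not part of the notebook, and it assumes the model returns either valid JSON or a Python dictionary literal.

```python
import ast
import json


def parse_llm_dict(text):
    """Parse a dictionary-like string returned by an LLM.

    Tries strict JSON first, then falls back to a Python literal
    (single-quoted keys, for example). Returns None if neither
    attempt yields a dictionary.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return None
    return parsed if isinstance(parsed, dict) else None


# Shortened copy of the `answer_relations` string shown in the output above.
answer_relations = (
    '{\n "Email Header Analysis": "IP Address -> Domain",'
    '\n "URL Analysis": "URL -> Domain"\n}'
)
relations = parse_llm_dict(answer_relations)
print(relations["URL Analysis"])
```

Returning `None` instead of raising keeps the caller in control of what to do when the model replies with free text; a stricter variant could raise an exception instead.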
@@ -1,14 +1,22 @@
{
"Email From": "support@banksecure.com",
"Email Subject": "Urgent: Verify Your Account Now",
"Email": {
"From": "support@banksecure.com",
"Subject": "Urgent: Verify Your Account Now",
"Content": "strange email asking to verify account details urgently"
},
"IP Address": "192.168.10.45",
"Domain": "banksecure.com",
"Actual Domain Registration": "Russia",
"URL Clicked": "http://banksecure-verification.com/login",
"URL Visited 1": "http://banksecure-verification.com/login",
"URL Visited 2": "http://banksecure-verification.com/account-details",
"File Downloaded": "AccountDetails.exe",
"File Creation Time": "10:20 AM",
"MD5 Hash": "e99a18c428cb38d5f260853678922e03",
"Network Logs Timestamp": "10:20 AM"
"URLs": [
"http://banksecure-verification.com/login",
"http://banksecure-verification.com/account-details"
],
"File": {
"Name": "AccountDetails.exe",
"Creation Time": "10:20 AM",
"MD5 Hash": "e99a18c428cb38d5f260853678922e03"
},
"Timestamps": {
"Visited at 10:15 AM": "http://banksecure-verification.com/login",
"Visited at 10:17 AM": "http://banksecure-verification.com/account-details"
}
}
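The new output nests related evidence (for example `Email` and `File`) instead of using flat keys. A minimal sketch of walking that nested structure is shown below; `flatten_evidence` is a hypothetical helper, not part of this commit, and the sample data is an abbreviated copy of the file above.

```python
import json


def flatten_evidence(node, prefix=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten_evidence(value, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten_evidence(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Abbreviated copy of the new output shown above.
evidence = json.loads("""
{
  "Email": {"From": "support@banksecure.com"},
  "IP Address": "192.168.10.45",
  "URLs": [
    "http://banksecure-verification.com/login",
    "http://banksecure-verification.com/account-details"
  ]
}
""")

flat = dict(flatten_evidence(evidence))
for path, value in flat.items():
    print(f"{path}: {value}")
```

A flat view like this can make it easier to compare two runs whose nesting differs, as the old and new versions of this file do.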

@@ -1,5 +1,17 @@
digraph file_not_found {
File [label="File" shape="rectangle" color="blue"]
NotFound [label="Not Found" shape="ellipse" color="red"]
File -> NotFound [label="Indicator"]
digraph phishing_incident {
"Email from support@banksecure.com" -> "IP address 192.168.10.45";
"Email from support@banksecure.com" -> "Domain banksecure.com";
"Domain banksecure.com" -> "Registered to someone in Russia";
"URL http://banksecure-verification.com/login" -> "Domain registered two days ago";
"URL http://banksecure-verification.com/account-details" -> "Domain registered two days ago";
"Browser history entries" -> "Visited at 10:15 AM: http://banksecure-verification.com/login";
"Browser history entries" -> "Visited at 10:17 AM: http://banksecure-verification.com/account-details";
"Downloaded file AccountDetails.exe" -> "Created at 10:20 AM";
"Downloaded file AccountDetails.exe" -> "MD5 hash e99a18c428cb38d5f260853678922e03";
"MD5 hash e99a18c428cb38d5f260853678922e03" -> "Matches known malware in database";
"IP address 192.168.10.45" -> "Network connections established";
"Security measures" -> "Clear browser history and cache";
"Security measures" -> "Run full antivirus scan";
"Security measures" -> "Reset passwords from different device";
"Security measures" -> "Enable two-factor authentication";
}
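A graph like the one above can also be produced from a plain list of relationships. The sketch below builds DOT source with the standard library only; `to_dot` is a hypothetical helper and the edge list is a two-item excerpt of the graph above, not the notebook's actual generation code.

```python
def to_dot(graph_name, edges):
    """Return DOT source for a directed graph given (source, target) pairs."""

    def quote(label):
        # In DOT quoted IDs, only embedded double quotes need escaping.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for source, target in edges:
        lines.append(f"    {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Email from support@banksecure.com", "IP address 192.168.10.45"),
    ("Downloaded file AccountDetails.exe", "MD5 hash e99a18c428cb38d5f260853678922e03"),
]
dot_source = to_dot("phishing_incident", edges)
print(dot_source)
```

The resulting string could then be handed to the `graphviz` package that the notebook already imports, or written to a `.dot` file for the `dot` command-line tool.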

Binary file not shown (image; 4.6 KiB before, 114 KiB after).

@@ -1,12 +1,75 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A tutorial to generate evidence in a standard and structured format. \n",
"\n",
"### Benefits of Using Standardized formats for forensics evidence\n",
"- Consistency: easier to compare and analyze different pieces of evidence\n",
"- Interoperability: exchange of evidence across different systems and platforms\n",
"- Accuracy: reduces the risk of errors and omissions\n",
"- Automation: facilitate the use of automated tools and technologies, such as machine learning algorithms, for evidence analysis.\n",
"\n",
"### Solution: Structured Threat Information eXpression (STIX)\n",
"- Share information about cyber threats\n",
" - think of it as a common language that everyone in the cybersecurity community can use to communicate effectively\n",
" - improve their threat intelligence capabilities\n",
"- Include basic predefined objects can be used as `digital forensics evidence`\n",
" - email, URL, indentity, etc.\n",
"- Community support: maintained by the Organization for the Advancement of Structured Information Standards (OASIS)\n",
" - open sourced\n",
" - tools and library support\n",
"- Adaptability: flexible and can be extended to accommodate new types of threat information as the cybersecurity landscape evolves.\n",
"\n",
"### Example of `email-message` \n",
"```\n",
"in STIX\n",
" {\n",
" \"type\": \"email-message\",\n",
" \"id\": \"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97\",\n",
" \"is_multipart\": false,\n",
" \"subject\": \"Urgent Benefits Package Update\",\n",
" \"from_ref\": \"email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798\",\n",
" \"body\": \"Please click the link to review the changes to your benefits package.\"\n",
" }\n",
"\n",
" vs.\n",
"without STIX\n",
"\n",
" \"Email\": {\n",
" \"From\": \"support@banksecure.com\",\n",
" \"Subject\": \"Urgent: Verify Your Account Now\",\n",
" \"Content\": \"strange email asking to verify account details urgently\"\n",
" }\n",
"```\n",
"\n",
"### Goal\n",
"- Capture threat information in STIX directly from the conversation\n",
"- Evidence entities and/or relationships are in the STIX"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Download libraries and files for the lab\n",
"- Make use you download necessary library and files. \n",
"- All downloaded and saved files can be located in the `content` folder if using google Colab"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# uncomment the commands to download libraries and files\n",
"#!pip install python-dotenv\n",
"#!pip install dspy-ai\n",
"#!pip install graphviz\n",
"# !wget https://raw.githubusercontent.com/frankwxu/digital-forensics-lab/main/AI4Forensics/CKIM2024/PhishingAttack/PhishingAttackScenarioDemo/conversation.txt\n",
"\n",
"import dspy\n",
"import os\n",
@@ -16,9 +79,19 @@
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Config DSPy with openAI \n",
"- You `MUST` have an openAI api key\n",
"- load an openAI api key from `openai_api_key.txt` file\n",
"- or, hard code your open api key"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
@@ -41,48 +114,29 @@
"\n",
"\n",
"def set_dspy_hardcode_openai_key():\n",
" os.environ[\"OPENAI_API_KEY\"] = (\n",
" \"sk-proj-yourapikeyhere\"\n",
" )\n",
" os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-yourapikeyhere\"\n",
" openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
" turbo = dspy.OpenAI(model=\"gpt-3.5-turbo\", temperature=0, max_tokens=2000)\n",
" turbo = dspy.OpenAI(model=\"gpt-3.5-turbo\", temperature=0, max_tokens=2000)\n",
" dspy.settings.configure(lm=turbo)\n",
" return turbo\n",
"\n",
"turbo=set_dspy()\n",
"# comment out set_dspy() and use set_dspy_hardcode_openai_key is your option\n",
"\n",
"# provide `openai_api_key.txt` with your openAI api key\n",
"turbo = set_dspy()\n",
"# optionally, hard code your openAI api key at line 21\n",
"# turbo=set_dspy_hardcode_openai_key()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"def load_text_file(file_path):\n",
" \"\"\"\n",
" Load a text file and return its contents as a string.\n",
"\n",
" Parameters:\n",
" file_path (str): The path to the text file.\n",
"\n",
" Returns:\n",
" str: The contents of the text file.\n",
" \"\"\"\n",
" try:\n",
" with open(file_path, \"r\") as file:\n",
" contents = file.read()\n",
" return contents\n",
" except FileNotFoundError:\n",
" return \"File not found.\"\n",
" except Exception as e:\n",
" return f\"An error occurred: {e}\"\n"
"### Step 3: Load the cyber incident repot (e.g., conversation)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -132,17 +186,54 @@
}
],
"source": [
"conversation=load_text_file(\"conversation.txt\")\n",
"def load_text_file(file_path):\n",
" \"\"\"\n",
" Load a text file and return its contents as a string.\n",
"\n",
" Parameters:\n",
" file_path (str): The path to the text file.\n",
"\n",
" Returns:\n",
" str: The contents of the text file.\n",
" \"\"\"\n",
" try:\n",
" with open(file_path, \"r\") as file:\n",
" contents = file.read()\n",
" return contents\n",
" except FileNotFoundError:\n",
" return \"File not found.\"\n",
" except Exception as e:\n",
" return f\"An error occurred: {e}\"\n",
"\n",
"\n",
"conversation = load_text_file(\"conversation.txt\")\n",
"print(conversation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4: Tell an LLM `WHAT` are the inputs/outputs by defining DSPy: Signature \n",
"\n",
"- A signature is one of the basic building blocks in DSPy's prompt programming\n",
"- It is a declarative specification of input/output behavior of a DSPy module\n",
" - Think about a function signature\n",
"- Allow you to tell the LLM what it needs to do. \n",
" - Don't need to specify how we should ask the LLM to do it.\n",
"- The following signature identifies a list of evidence based on the conversation\n",
" - Inherit from `dspy.Signature`\n",
" - Exact `ONE` input, e.g., the conversation \n",
" - Exact `ONE` output, e.g., cyber threat information in JSON"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"class SITXGenerator(dspy.Signature):\n",
"class STIXGenerator(dspy.Signature):\n",
" \"\"\"Describe a conversation in STIX, which stands for Structured Threat Information eXpression, is a standardized language for representing cyber threat information.\"\"\"\n",
"\n",
" question: str = dspy.InputField(\n",
@@ -154,9 +245,26 @@
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5: Tell an LLM `HOW` to generate answer: \n",
"\n",
"The following function generates and saves threat information from a conversation using a specified signature.\n",
"\n",
"#### Parameters:\n",
"- `signature` (dspy.Signature): The signature defining the input and output structure for evidence identification.\n",
"- `conversation` (str): The conversation text to analyze for threat information.\n",
"- `output_file` (str): The file path where the identified threat information will be saved as JSON.\n",
"\n",
"#### Returns:\n",
"None. The function saves the result to a file and prints a confirmation message."
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
@@ -171,9 +279,16 @@
" print(f\"The evidence has been saved to the file {output_file}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6: Generate entities using `STIXGenerator`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -228,15 +343,28 @@
"source": [
"output_file = \"03_output.json\"\n",
"generate_answer_CoT(\n",
" SITXGenerator,\n",
" STIXGenerator,\n",
" conversation,\n",
" output_file,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7: Inspect the last prompt send to the LLM\n",
"\n",
"You want to check:\n",
"- Prompt Description Section: Description in the signature\n",
"- Format Section: `Following the following format.` \n",
" - Pay attention to a new inserted field `REASONING: Let's think step by step ...`\n",
"- Result Section: a threat information in `.JSON`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"metadata": {},
"outputs": [
{
@@ -314,7 +442,7 @@
"'\\n\\n\\nDescribe a conversation in STIX, which stands for Structured Threat Information eXpression, is a standardized language for representing cyber threat information.\\n\\n---\\n\\nFollow the following format.\\n\\nQuestion: a conversation describing a cyber incident between an IT Security Specialist and an employee.\\nReasoning: Let\\'s think step by step in order to ${produce the answer}. We ...\\nAnswer: the formalized STIX in JSON representing cyber threat information based on the conversation, e.g., [{object 1}, {object 2}, ... {object n}]\\n\\n---\\n\\nQuestion: Alice: Hey Bob, I just got a strange email from support@banksecure.com. It says I need to verify my account details urgently. The subject line was \"Urgent: Verify Your Account Now\". The email looks suspicious to me. Bob: Hi Alice, that does sound fishy. Can you forward me the email? Ill take a look at the headers to see where it came from. Alice: Sure, forwarding it now. Bob: Got it. Lets see... The email came from IP address 192.168.10.45, but the domain banksecure.com is not their official domain. It\\'s actually registered to someone in Russia. Alice: Thats definitely not right. Should I be worried? Bob: We should investigate further. Did you click on any links or download any attachments? Alice: I did click on a link that took me to a page asking for my login credentials. I didn\\'t enter anything though. The URL was http://banksecure-verification.com/login. Bob: Good call on not entering your details. Lets check the URL. This domain was just registered two days ago. Its highly likely its a phishing site. Alice: What should I do next? Bob: First, clear your browser history and cache. Also, run a full antivirus scan on your computer. Can you also provide me with any browser history entries and cookies from that session? Alice: Ive cleared the history and started the antivirus scan. 
Here are the relevant entries from my browser history: Visited at 10:15 AM: http://banksecure-verification.com/login Visited at 10:17 AM: http://banksecure-verification.com/account-details Bob: Thanks. Ill analyze these URLs further. Also, check if there are any suspicious files downloaded or present in your downloads folder. Look for anything unusual. Alice: There\\'s a file named \"AccountDetails.exe\" that I dont remember downloading. It was created at 10:20 AM. Bob: Definitely suspicious. Dont open it. Lets hash the file to verify its integrity. Can you run an MD5 hash on it? Alice: Done. The MD5 hash is e99a18c428cb38d5f260853678922e03. Bob: This hash matches known malware in our database. Well need to quarantine it and check if it has established any network connections. Ill look into our network logs for the IP 192.168.10.45 around 10:20 AM. Alice: Is there anything else I need to do? Bob: For now, avoid using your computer for sensitive tasks. Well also reset your passwords from a different device and enable two-factor authentication on your accounts. Alice: Thanks, Bob. Ill follow these steps immediately.\\nReasoning: Let\\'s think step by step in order to\\x1b[32m produce the answer. 
We need to formalize the conversation into STIX objects that represent the cyber threat information discussed between Alice and Bob.\\n\\nAnswer: \\n[\\n {\\n \"email\": {\\n \"sender\": \"support@banksecure.com\",\\n \"subject\": \"Urgent: Verify Your Account Now\",\\n \"headers\": {\\n \"IP_address\": \"192.168.10.45\",\\n \"domain\": \"banksecure.com\",\\n \"registered_to\": \"Russia\"\\n },\\n \"links_clicked\": [\\n {\\n \"URL\": \"http://banksecure-verification.com/login\",\\n \"timestamp\": \"10:15 AM\"\\n },\\n {\\n \"URL\": \"http://banksecure-verification.com/account-details\",\\n \"timestamp\": \"10:17 AM\"\\n }\\n ],\\n \"attachments\": [\\n {\\n \"file_name\": \"AccountDetails.exe\",\\n \"created_at\": \"10:20 AM\",\\n \"MD5_hash\": \"e99a18c428cb38d5f260853678922e03\",\\n \"status\": \"known_malware\"\\n }\\n ]\\n }\\n },\\n {\\n \"actions_taken\": [\\n \"Clear browser history and cache\",\\n \"Run full antivirus scan\",\\n \"Provide browser history entries and cookies\",\\n \"Quarantine suspicious file\",\\n \"Check network connections\",\\n \"Reset passwords and enable two-factor authentication\"\\n ]\\n }\\n]\\x1b[0m\\n\\n\\n'"
]
},
"execution_count": 8,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -0,0 +1,85 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><!-- Generated by graphviz version 2.40.1 (20161225.0304)
--><!-- Title: STIXGenCoT Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="779pt" height="121pt" viewBox="0.00 0.00 779.34 121.00">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 117)">
<title>STIXGenCoT</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-117 775.3424,-117 775.3424,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster_forward</title>
<polygon fill="none" stroke="#000000" stroke-dasharray="5,2" points="8,-8 8,-105 678.4518,-105 678.4518,-8 8,-8"/>
<text text-anchor="middle" x="343.2259" y="-88.4" font-family="Times,serif" font-size="14.00" fill="#000000">forward method</text>
</g>
<!-- STIXGenCoT -->
<g id="node1" class="node">
<title>STIXGenCoT</title>
<polygon fill="#add8e6" stroke="#000000" points="771.3424,-52 717.3424,-52 717.3424,-16 771.3424,-16 771.3424,-52"/>
<text text-anchor="middle" x="744.3424" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">LLM</text>
</g>
<!-- Question -->
<g id="node2" class="node">
<title>Question</title>
<polygon fill="#add8e6" stroke="#000000" points="79.6621,-52 15.7783,-52 15.7783,-16 79.6621,-16 79.6621,-52"/>
<text text-anchor="middle" x="47.7202" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">question</text>
</g>
<!-- Retriever -->
<g id="node3" class="node">
<title>Retriever</title>
<polygon fill="#add8e6" stroke="#000000" points="241.6163,-52 154.3817,-52 154.3817,-16 241.6163,-16 241.6163,-52"/>
<text text-anchor="middle" x="197.999" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">self.retriever</text>
</g>
<!-- Question&#45;&gt;Retriever -->
<g id="edge1" class="edge">
<title>Question-&gt;Retriever</title>
<path fill="none" stroke="#000000" d="M79.5905,-34C98.3339,-34 122.5264,-34 144.1535,-34"/>
<polygon fill="#000000" stroke="#000000" points="144.204,-37.5001 154.204,-34 144.204,-30.5001 144.204,-37.5001"/>
</g>
<!-- Predictor -->
<g id="node5" class="node">
<title>Predictor</title>
<polygon fill="#add8e6" stroke="#000000" points="539.1313,-52 448.6467,-52 448.6467,-16 539.1313,-16 539.1313,-52"/>
<text text-anchor="middle" x="493.889" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">self.predictor</text>
</g>
<!-- Question&#45;&gt;Predictor -->
<g id="edge5" class="edge">
<title>Question-&gt;Predictor</title>
<path fill="none" stroke="#000000" d="M79.5481,-44.2567C100.5597,-50.4774 128.8375,-57.8115 154.4404,-61 251.1721,-73.0467 276.8841,-71.7543 373.768,-61 395.213,-58.6196 418.4229,-53.9183 438.6714,-49.0966"/>
<polygon fill="#000000" stroke="#000000" points="439.7764,-52.4296 448.658,-46.653 438.1127,-45.6301 439.7764,-52.4296"/>
</g>
<!-- Context -->
<g id="node4" class="node">
<title>Context</title>
<polygon fill="#add8e6" stroke="#000000" points="373.8736,-52 316.452,-52 316.452,-16 373.8736,-16 373.8736,-52"/>
<text text-anchor="middle" x="345.1628" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">context</text>
</g>
<!-- Retriever&#45;&gt;Context -->
<g id="edge2" class="edge">
<title>Retriever-&gt;Context</title>
<path fill="none" stroke="#000000" d="M241.7545,-34C262.1511,-34 286.2282,-34 306.0156,-34"/>
<polygon fill="#000000" stroke="#000000" points="306.2225,-37.5001 316.2224,-34 306.2224,-30.5001 306.2225,-37.5001"/>
</g>
<!-- Context&#45;&gt;Predictor -->
<g id="edge3" class="edge">
<title>Context-&gt;Predictor</title>
<path fill="none" stroke="#000000" d="M373.8387,-34C392.129,-34 416.4399,-34 438.3995,-34"/>
<polygon fill="#000000" stroke="#000000" points="438.6226,-37.5001 448.6226,-34 438.6225,-30.5001 438.6226,-37.5001"/>
</g>
<!-- Results -->
<g id="node6" class="node">
<title>Results</title>
<polygon fill="#add8e6" stroke="#000000" points="670.6744,-52 613.7874,-52 613.7874,-16 670.6744,-16 670.6744,-52"/>
<text text-anchor="middle" x="642.2309" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">prompt</text>
</g>
<!-- Predictor&#45;&gt;Results -->
<g id="edge4" class="edge">
<title>Predictor-&gt;Results</title>
<path fill="none" stroke="#000000" d="M539.2025,-34C559.8586,-34 584.0523,-34 603.8025,-34"/>
<polygon fill="#000000" stroke="#000000" points="603.9743,-37.5001 613.9743,-34 603.9743,-30.5001 603.9743,-37.5001"/>
</g>
<!-- Results&#45;&gt;STIXGenCoT -->
<g id="edge6" class="edge">
<title>Results-&gt;STIXGenCoT</title>
<path fill="none" stroke="#000000" d="M670.6746,-34C681.9025,-34 694.9588,-34 706.934,-34"/>
<polygon fill="#000000" stroke="#000000" points="707.1063,-37.5001 717.1062,-34 707.1062,-30.5001 707.1063,-37.5001"/>
<text text-anchor="middle" x="693.8971" y="-38.2" font-family="Times,serif" font-size="14.00" fill="#000000">to</text>
</g>
</g>
</svg>


View File

@@ -0,0 +1,493 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A tutorial to visualize one-shot learning results\n",
"\n",
"### Goal\n",
"- Compare one-shot learning with zero-shot learning\n",
"- To visualize the different\n",
"\n",
"### Approach\n",
"- Directly generate a DOT file from one-shot learning example completed in previous tutorial"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Download libraries and files for the lab\n",
"- Make use you download necessary library and files. \n",
"- All downloaded and saved files can be located in the `content` folder if using google Colab"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# uncomment the commands to download libraries and files\n",
"#!pip install python-dotenv\n",
"#!pip install dspy-ai\n",
"#!pip install graphviz\n",
"\n",
"import dspy\n",
"import os\n",
"import openai\n",
"import json\n",
"from dotenv import load_dotenv\n",
"from graphviz import Source\n",
"from IPython.display import display"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"def set_dspy():\n",
" # ==============set openAI enviroment=========\n",
" # Path to your API key file\n",
" key_file_path = \"openai_api_key.txt\"\n",
"\n",
" # Load the API key from the file\n",
" with open(key_file_path, \"r\") as file:\n",
" openai_api_key = file.read().strip()\n",
"\n",
" # Set the API key as an environment variable\n",
" os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
" openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
" turbo = dspy.OpenAI(model=\"gpt-3.5-turbo\", max_tokens=2000, temperature=0)\n",
" dspy.settings.configure(lm=turbo)\n",
" return turbo\n",
" # ==============end of set openAI enviroment=========\n",
"\n",
"\n",
"def set_dspy_hardcode_openai_key():\n",
" os.environ[\"OPENAI_API_KEY\"] = (\n",
" \"sk-proj-yourapikeyhere\"\n",
" )\n",
" openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
" turbo = dspy.OpenAI(model=\"gpt-3.5-turbo\", temperature=0, max_tokens=2000)\n",
" dspy.settings.configure(lm=turbo)\n",
" return turbo\n",
"\n",
"turbo=set_dspy()\n",
"# comment out set_dspy() and use set_dspy_hardcode_openai_key is your option\n",
"# turbo=set_dspy_hardcode_openai_key()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def load_text_file(file_path):\n",
" \"\"\"\n",
" Load a text file and return its contents as a string.\n",
"\n",
" Parameters:\n",
" file_path (str): The path to the text file.\n",
"\n",
" Returns:\n",
" str: The contents of the text file.\n",
" \"\"\"\n",
" try:\n",
" with open(file_path, \"r\") as file:\n",
" contents = file.read()\n",
" return contents\n",
" except FileNotFoundError:\n",
" return \"File not found.\"\n",
" except Exception as e:\n",
" return f\"An error occurred: {e}\"\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\n",
" {\n",
" \"type\": \"identity\",\n",
" \"id\": \"identity--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f\",\n",
" \"name\": \"OurCompany\",\n",
" \"identity_class\": \"organization\",\n",
" \"sectors\": [\n",
" \"technology\"\n",
" ],\n",
" \"contact_information\": \"info@ourcompany.com\"\n",
" },\n",
" {\n",
" \"type\": \"email-addr\",\n",
" \"id\": \"email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798\",\n",
" \"value\": \"hr-dept@ourcompany-security.com\"\n",
" },\n",
" {\n",
" \"type\": \"email-message\",\n",
" \"id\": \"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97\",\n",
" \"is_multipart\": false,\n",
" \"subject\": \"Urgent Benefits Package Update\",\n",
" \"from_ref\": \"email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798\",\n",
" \"body\": \"Please click the link to review the changes to your benefits package.\"\n",
" },\n",
" {\n",
" \"type\": \"url\",\n",
" \"id\": \"url--4c3b-4c4b-bb6c-ded6b2a4a567\",\n",
" \"value\": \"http://phishing-link.com/login\"\n",
" },\n",
" {\n",
" \"type\": \"user-account\",\n",
" \"id\": \"user-account--bd5631cf-2af6-4bba-bc92-37c60d020400\",\n",
" \"user_id\": \"Taylor\",\n",
" \"account_login\": \"taylor@ourcompany.com\"\n",
" },\n",
" {\n",
" \"type\": \"observable\",\n",
" \"id\": \"observable--001\",\n",
" \"observable_type\": \"email\",\n",
" \"observable_value\": \"hr-dept@ourcompany-security.com\"\n",
" },\n",
" {\n",
" \"type\": \"observable\",\n",
" \"id\": \"observable--002\",\n",
" \"observable_type\": \"url\",\n",
" \"observable_value\": \"http://phishing-link.com/login\"\n",
" },\n",
" {\n",
" \"type\": \"indicator\",\n",
" \"id\": \"indicator--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f\",\n",
" \"name\": \"Phishing Email Indicator\",\n",
" \"pattern\": \"[email-message:subject = 'Urgent Benefits Package Update']\",\n",
" \"valid_from\": \"2024-07-17T00:00:00Z\"\n",
" },\n",
" {\n",
" \"type\": \"incident\",\n",
" \"id\": \"incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857\",\n",
" \"name\": \"Phishing Attack on OurCompany\",\n",
" \"description\": \"A phishing attack where a suspicious email was sent to an employee of OurCompany.\",\n",
" \"first_seen\": \"2024-07-17T08:00:00Z\",\n",
" \"last_seen\": \"2024-07-17T08:10:00Z\",\n",
" \"status\": \"ongoing\",\n",
" \"affected_assets\": [\n",
" \"user-account--bd5631cf-2af6-4bba-bc92-37c60d020400\"\n",
" ]\n",
" },\n",
" {\n",
" \"type\": \"relationship\",\n",
" \"id\": \"relationship--3f1a8d8b-6a6e-4b5d-8e15-2d6d9a2b3f1d\",\n",
" \"relationship_type\": \"indicates\",\n",
" \"source_ref\": \"indicator--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f\",\n",
" \"target_ref\": \"incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857\"\n",
" },\n",
" {\n",
" \"type\": \"relationship\",\n",
" \"id\": \"relationship--4b6e65f3-743d-40c2-9194-3b5e38b3efed\",\n",
" \"relationship_type\": \"attributed-to\",\n",
" \"source_ref\": \"incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857\",\n",
" \"target_ref\": \"identity--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f\"\n",
" },\n",
" {\n",
" \"type\": \"relationship\",\n",
" \"id\": \"relationship--5c9b6eaf-27a6-4b2b-9b17-49e3b00f6051\",\n",
" \"relationship_type\": \"uses\",\n",
" \"source_ref\": \"incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857\",\n",
" \"target_ref\": \"url--4c3b-4c4b-bb6c-ded6b2a4a567\"\n",
" }\n",
"]\n"
]
}
],
"source": [
"conversation = load_text_file(\"04_output_for_viz.json\")\n",
"print(conversation)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"class DotGenerator(dspy.Signature):\n",
" \"\"\"Generate a evidence knowledge graph based on a cyber incident expressed in Structured Threat Information Expression (STIX).\"\"\"\n",
"\n",
" question: str = dspy.InputField(\n",
" desc=\"a cyber incident expressed in Structured Threat Information Expression with JSON format.\"\n",
" )\n",
"\n",
" answer: str = dspy.OutputField(\n",
" desc=\"a graph in a dot format. The nodes of the graph are evidence entities in STIX or Cyber Forensic Domain Objects and Cyber Forensic Observable Objects in DFKG and the edges of the graph are the relationships between evidence entities in STIX. A dot format is primarily associated with Graphviz, a graph visualization software. For example, a dot should looks like: digraph incident_name {...}. Don't include `````` \"\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# Important: Predict is better than ChainOfThough\n",
"def generate_answer_CoT(signature, text, output_file):\n",
" generate_answer = dspy.Predict(signature)\n",
" answer = generate_answer(question=text).answer # here we use the module\n",
"\n",
" with open(output_file, \"w\") as dot_file:\n",
" print(answer)\n",
" dot_file.write(answer)\n",
" return answer\n",
" print(f\"The evidence has been saved to the file {output_file}\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"digraph phishing_attack {\n",
" \"OurCompany\" [label=\"OurCompany\\norganization\\ninfo@ourcompany.com\"]\n",
" \"hr-dept@ourcompany-security.com\" [label=\"hr-dept@ourcompany-security.com\"]\n",
" \"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97\" [label=\"email-message\\nUrgent Benefits Package Update\\nFrom: hr-dept@ourcompany-security.com\\nPlease click the link to review the changes to your benefits package.\"]\n",
" \"http://phishing-link.com/login\" [label=\"http://phishing-link.com/login\"]\n",
" \"Taylor\" [label=\"Taylor\\ntaylor@ourcompany.com\"]\n",
" \n",
" \"hr-dept@ourcompany-security.com\" -> \"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97\"\n",
" \"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97\" -> \"http://phishing-link.com/login\"\n",
" \"Taylor\" -> \"hr-dept@ourcompany-security.com\"\n",
" \n",
" \"Phishing Email Indicator\" [label=\"Phishing Email Indicator\\nPattern: [email-message:subject = 'Urgent Benefits Package Update']\\nValid From: 2024-07-17T00:00:00Z\"]\n",
" \"Phishing Attack on OurCompany\" [label=\"Phishing Attack on OurCompany\\nDescription: A phishing attack where a suspicious email was sent to an employee of OurCompany.\\nFirst Seen: 2024-07-17T08:00:00Z\\nLast Seen: 2024-07-17T08:10:00Z\\nStatus: ongoing\"]\n",
" \n",
" \"Phishing Email Indicator\" -> \"Phishing Attack on OurCompany\"\n",
" \"OurCompany\" -> \"Phishing Attack on OurCompany\"\n",
" \"Phishing Attack on OurCompany\" -> \"Taylor\"\n",
" \"Phishing Attack on OurCompany\" -> \"http://phishing-link.com/login\"\n",
"}\n"
]
}
],
"source": [
"output_file = \"05_output.dot\"\n",
"dot_description = generate_answer_CoT(\n",
" DotGenerator,\n",
" conversation,\n",
" output_file,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Graph saved as: 05_output_stix_oneshot.png\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.47.1 (20210417.1919)\n",
" -->\n",
"<!-- Title: phishing_attack Pages: 1 -->\n",
"<svg width=\"924pt\" height=\"602pt\"\n",
" viewBox=\"0.00 0.00 923.82 602.24\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 598.24)\">\n",
"<title>phishing_attack</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-598.24 919.82,-598.24 919.82,4 -4,4\"/>\n",
"<!-- OurCompany -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>OurCompany</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"198.22\" cy=\"-556.76\" rx=\"106.13\" ry=\"37.45\"/>\n",
"<text text-anchor=\"middle\" x=\"198.22\" y=\"-568.06\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">OurCompany</text>\n",
"<text text-anchor=\"middle\" x=\"198.22\" y=\"-553.06\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">organization</text>\n",
"<text text-anchor=\"middle\" x=\"198.22\" y=\"-538.06\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">info@ourcompany.com</text>\n",
"</g>\n",
"<!-- Phishing Attack on OurCompany -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>Phishing Attack on OurCompany</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"400.22\" cy=\"-424.6\" rx=\"400.45\" ry=\"58.88\"/>\n",
"<text text-anchor=\"middle\" x=\"400.22\" y=\"-450.9\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Phishing Attack on OurCompany</text>\n",
"<text text-anchor=\"middle\" x=\"400.22\" y=\"-435.9\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Description: A phishing attack where a suspicious email was sent to an employee of OurCompany.</text>\n",
"<text text-anchor=\"middle\" x=\"400.22\" y=\"-420.9\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">First Seen: 2024&#45;07&#45;17T08:00:00Z</text>\n",
"<text text-anchor=\"middle\" x=\"400.22\" y=\"-405.9\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Last Seen: 2024&#45;07&#45;17T08:10:00Z</text>\n",
"<text text-anchor=\"middle\" x=\"400.22\" y=\"-390.9\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Status: ongoing</text>\n",
"</g>\n",
"<!-- OurCompany&#45;&gt;Phishing Attack on OurCompany -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>OurCompany&#45;&gt;Phishing Attack on OurCompany</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M248.16,-523.59C265.15,-512.64 284.76,-500 304.14,-487.51\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"306.2,-490.35 312.71,-481.99 302.41,-484.46 306.2,-490.35\"/>\n",
"</g>\n",
"<!-- hr&#45;dept@ourcompany&#45;security.com -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>hr&#45;dept@ourcompany&#45;security.com</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"568.22\" cy=\"-222.17\" rx=\"141.07\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"568.22\" y=\"-218.47\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">hr&#45;dept@ourcompany&#45;security.com</text>\n",
"</g>\n",
"<!-- email&#45;message&#45;&#45;c79b6bde&#45;4f4c&#45;4b38&#45;a8c8&#45;fb82921d6b97 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>email&#45;message&#45;&#45;c79b6bde&#45;4f4c&#45;4b38&#45;a8c8&#45;fb82921d6b97</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"637.22\" cy=\"-120.08\" rx=\"278.7\" ry=\"48.17\"/>\n",
"<text text-anchor=\"middle\" x=\"637.22\" y=\"-138.88\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">email&#45;message</text>\n",
"<text text-anchor=\"middle\" x=\"637.22\" y=\"-123.88\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Urgent Benefits Package Update</text>\n",
"<text text-anchor=\"middle\" x=\"637.22\" y=\"-108.88\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">From: hr&#45;dept@ourcompany&#45;security.com</text>\n",
"<text text-anchor=\"middle\" x=\"637.22\" y=\"-93.88\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Please click the link to review the changes to your benefits package.</text>\n",
"</g>\n",
"<!-- hr&#45;dept@ourcompany&#45;security.com&#45;&gt;email&#45;message&#45;&#45;c79b6bde&#45;4f4c&#45;4b38&#45;a8c8&#45;fb82921d6b97 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>hr&#45;dept@ourcompany&#45;security.com&#45;&gt;email&#45;message&#45;&#45;c79b6bde&#45;4f4c&#45;4b38&#45;a8c8&#45;fb82921d6b97</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M579.93,-204.19C585.33,-196.35 592.09,-186.54 599.05,-176.45\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"602.13,-178.15 604.92,-167.93 596.36,-174.18 602.13,-178.15\"/>\n",
"</g>\n",
"<!-- http://phishing&#45;link.com/login -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>http://phishing&#45;link.com/login</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"483.22\" cy=\"-18\" rx=\"116.18\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"483.22\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">http://phishing&#45;link.com/login</text>\n",
"</g>\n",
"<!-- email&#45;message&#45;&#45;c79b6bde&#45;4f4c&#45;4b38&#45;a8c8&#45;fb82921d6b97&#45;&gt;http://phishing&#45;link.com/login -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>email&#45;message&#45;&#45;c79b6bde&#45;4f4c&#45;4b38&#45;a8c8&#45;fb82921d6b97&#45;&gt;http://phishing&#45;link.com/login</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M566.98,-73.43C549.77,-62.25 532.1,-50.77 517.46,-41.25\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"519.06,-38.11 508.76,-35.6 515.24,-43.98 519.06,-38.11\"/>\n",
"</g>\n",
"<!-- Taylor -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>Taylor</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"519.22\" cy=\"-303.04\" rx=\"113.27\" ry=\"26.74\"/>\n",
"<text text-anchor=\"middle\" x=\"519.22\" y=\"-306.84\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Taylor</text>\n",
"<text text-anchor=\"middle\" x=\"519.22\" y=\"-291.84\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">taylor@ourcompany.com</text>\n",
"</g>\n",
"<!-- Taylor&#45;&gt;hr&#45;dept@ourcompany&#45;security.com -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>Taylor&#45;&gt;hr&#45;dept@ourcompany&#45;security.com</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M535.27,-276.21C540.65,-267.54 546.66,-257.87 552.06,-249.19\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"555.1,-250.92 557.4,-240.58 549.15,-247.23 555.1,-250.92\"/>\n",
"</g>\n",
"<!-- Phishing Email Indicator -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>Phishing Email Indicator</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"603.22\" cy=\"-556.76\" rx=\"280.94\" ry=\"37.45\"/>\n",
"<text text-anchor=\"middle\" x=\"603.22\" y=\"-568.06\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Phishing Email Indicator</text>\n",
"<text text-anchor=\"middle\" x=\"603.22\" y=\"-553.06\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Pattern: [email&#45;message:subject = &#39;Urgent Benefits Package Update&#39;]</text>\n",
"<text text-anchor=\"middle\" x=\"603.22\" y=\"-538.06\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Valid From: 2024&#45;07&#45;17T00:00:00Z</text>\n",
"</g>\n",
"<!-- Phishing Email Indicator&#45;&gt;Phishing Attack on OurCompany -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>Phishing Email Indicator&#45;&gt;Phishing Attack on OurCompany</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M547.48,-520.02C531.75,-509.93 514.16,-498.65 496.76,-487.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"498.55,-484.49 488.24,-482.04 494.77,-490.38 498.55,-484.49\"/>\n",
"</g>\n",
"<!-- Phishing Attack on OurCompany&#45;&gt;http://phishing&#45;link.com/login -->\n",
"<g id=\"edge7\" class=\"edge\">\n",
"<title>Phishing Attack on OurCompany&#45;&gt;http://phishing&#45;link.com/login</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M369.3,-365.8C334.13,-291.29 288.88,-160.75 349.22,-72 360.89,-54.84 379.06,-43.24 398.29,-35.39\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"399.66,-38.62 407.8,-31.83 397.2,-32.06 399.66,-38.62\"/>\n",
"</g>\n",
"<!-- Phishing Attack on OurCompany&#45;&gt;Taylor -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>Phishing Attack on OurCompany&#45;&gt;Taylor</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M457.28,-366.27C467.45,-356.05 477.71,-345.74 486.88,-336.53\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"489.4,-338.96 493.97,-329.41 484.44,-334.02 489.4,-338.96\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.sources.Source at 0x1291bc087a0>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the .DOT file\n",
"dot_file_path = \"05_output.dot\"\n",
"\n",
"with open(dot_file_path, \"r\") as file:\n",
" dot_content = file.read()\n",
"\n",
"# Create a Graphviz Source object and render it\n",
"dot = Source(dot_content)\n",
"\n",
"# Render the graph and save it as a PNG file\n",
"output_file_path = \"05_output_stix_oneshot\"\n",
"dot.format = \"png\"\n",
"dot.render(output_file_path, cleanup=True)\n",
"\n",
"# Display the saved PNG file path\n",
"print(f\"Graph saved as: {output_file_path}.png\")\n",
"\n",
"# Display the graph in the Jupyter notebook\n",
"dot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Image\n",
"\n",
"# Path to the image file\n",
"image_path = \"path/to/your/image.png\"\n",
"\n",
"# Display the image\n",
"Image(filename=image_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"\n",
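"This notebook renders the STIX-based evidence graph from a DOT file. For related work on representing forensic evidence as a knowledge graph, see, e.g., [Digital Forensic Knowledge Graph (DFKG)](https://github.com/frankwxu/digital-forensics-lab/tree/main/STIX_for_digital_forensics)."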
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,85 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><!-- Generated by graphviz version 2.40.1 (20161225.0304)
--><!-- Title: STIXGenCoT Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="779pt" height="121pt" viewBox="0.00 0.00 779.34 121.00">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 117)">
<title>STIXGenCoT</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-117 775.3424,-117 775.3424,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster_forward</title>
<polygon fill="none" stroke="#000000" stroke-dasharray="5,2" points="8,-8 8,-105 678.4518,-105 678.4518,-8 8,-8"/>
<text text-anchor="middle" x="343.2259" y="-88.4" font-family="Times,serif" font-size="14.00" fill="#000000">forward method</text>
</g>
<!-- STIXGenCoT -->
<g id="node1" class="node">
<title>STIXGenCoT</title>
<polygon fill="#add8e6" stroke="#000000" points="771.3424,-52 717.3424,-52 717.3424,-16 771.3424,-16 771.3424,-52"/>
<text text-anchor="middle" x="744.3424" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">LLM</text>
</g>
<!-- Question -->
<g id="node2" class="node">
<title>Question</title>
<polygon fill="#add8e6" stroke="#000000" points="79.6621,-52 15.7783,-52 15.7783,-16 79.6621,-16 79.6621,-52"/>
<text text-anchor="middle" x="47.7202" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">question</text>
</g>
<!-- Retriever -->
<g id="node3" class="node">
<title>Retriever</title>
<polygon fill="#add8e6" stroke="#000000" points="241.6163,-52 154.3817,-52 154.3817,-16 241.6163,-16 241.6163,-52"/>
<text text-anchor="middle" x="197.999" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">self.retriever</text>
</g>
<!-- Question&#45;&gt;Retriever -->
<g id="edge1" class="edge">
<title>Question-&gt;Retriever</title>
<path fill="none" stroke="#000000" d="M79.5905,-34C98.3339,-34 122.5264,-34 144.1535,-34"/>
<polygon fill="#000000" stroke="#000000" points="144.204,-37.5001 154.204,-34 144.204,-30.5001 144.204,-37.5001"/>
</g>
<!-- Predictor -->
<g id="node5" class="node">
<title>Predictor</title>
<polygon fill="#add8e6" stroke="#000000" points="539.1313,-52 448.6467,-52 448.6467,-16 539.1313,-16 539.1313,-52"/>
<text text-anchor="middle" x="493.889" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">self.predictor</text>
</g>
<!-- Question&#45;&gt;Predictor -->
<g id="edge5" class="edge">
<title>Question-&gt;Predictor</title>
<path fill="none" stroke="#000000" d="M79.5481,-44.2567C100.5597,-50.4774 128.8375,-57.8115 154.4404,-61 251.1721,-73.0467 276.8841,-71.7543 373.768,-61 395.213,-58.6196 418.4229,-53.9183 438.6714,-49.0966"/>
<polygon fill="#000000" stroke="#000000" points="439.7764,-52.4296 448.658,-46.653 438.1127,-45.6301 439.7764,-52.4296"/>
</g>
<!-- Context -->
<g id="node4" class="node">
<title>Context</title>
<polygon fill="#add8e6" stroke="#000000" points="373.8736,-52 316.452,-52 316.452,-16 373.8736,-16 373.8736,-52"/>
<text text-anchor="middle" x="345.1628" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">context</text>
</g>
<!-- Retriever&#45;&gt;Context -->
<g id="edge2" class="edge">
<title>Retriever-&gt;Context</title>
<path fill="none" stroke="#000000" d="M241.7545,-34C262.1511,-34 286.2282,-34 306.0156,-34"/>
<polygon fill="#000000" stroke="#000000" points="306.2225,-37.5001 316.2224,-34 306.2224,-30.5001 306.2225,-37.5001"/>
</g>
<!-- Context&#45;&gt;Predictor -->
<g id="edge3" class="edge">
<title>Context-&gt;Predictor</title>
<path fill="none" stroke="#000000" d="M373.8387,-34C392.129,-34 416.4399,-34 438.3995,-34"/>
<polygon fill="#000000" stroke="#000000" points="438.6226,-37.5001 448.6226,-34 438.6225,-30.5001 438.6226,-37.5001"/>
</g>
<!-- Results -->
<g id="node6" class="node">
<title>Results</title>
<polygon fill="#add8e6" stroke="#000000" points="670.6744,-52 613.7874,-52 613.7874,-16 670.6744,-16 670.6744,-52"/>
<text text-anchor="middle" x="642.2309" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">prompt</text>
</g>
<!-- Predictor&#45;&gt;Results -->
<g id="edge4" class="edge">
<title>Predictor-&gt;Results</title>
<path fill="none" stroke="#000000" d="M539.2025,-34C559.8586,-34 584.0523,-34 603.8025,-34"/>
<polygon fill="#000000" stroke="#000000" points="603.9743,-37.5001 613.9743,-34 603.9743,-30.5001 603.9743,-37.5001"/>
</g>
<!-- Results&#45;&gt;STIXGenCoT -->
<g id="edge6" class="edge">
<title>Results-&gt;STIXGenCoT</title>
<path fill="none" stroke="#000000" d="M670.6746,-34C681.9025,-34 694.9588,-34 706.934,-34"/>
<polygon fill="#000000" stroke="#000000" points="707.1063,-37.5001 717.1062,-34 707.1062,-30.5001 707.1063,-37.5001"/>
<text text-anchor="middle" x="693.8971" y="-38.2" font-family="Times,serif" font-size="14.00" fill="#000000">to</text>
</g>
</g>
</svg>


View File

@@ -1,8 +1,21 @@
digraph Phishing_Attack {
"OurCompany" -> "incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857" [label="attributed-to"];
"email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798" -> "email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97" [label="from"];
"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97" -> "url--4c3b-4c4b-bb6c-ded6b2a4a567" [label="contains"];
"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97" -> "observable--001" [label="observable"];
"url--4c3b-4c4b-bb6c-ded6b2a4a567" -> "observable--002" [label="observable"];
"incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857" -> "indicator--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f" [label="indicates"];
digraph phishing_attack {
"identity--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f" [label="OurCompany", shape="ellipse"];
"email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798" [label="hr-dept@ourcompany-security.com", shape="ellipse"];
"email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97" [label="Urgent Benefits Package Update", shape="box"];
"url--4c3b-4c4b-bb6c-ded6b2a4a567" [label="http://phishing-link.com/login", shape="ellipse"];
"user-account--bd5631cf-2af6-4bba-bc92-37c60d020400" [label="Taylor (taylor@ourcompany.com)", shape="ellipse"];
"observable--001" [label="hr-dept@ourcompany-security.com", shape="ellipse"];
"observable--002" [label="http://phishing-link.com/login", shape="ellipse"];
"indicator--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f" [label="Phishing Email Indicator", shape="diamond"];
"incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857" [label="Phishing Attack on OurCompany", shape="box"];
"relationship--3f1a8d8b-6a6e-4b5d-8e15-2d6d9a2b3f1d" -> "indicator--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f";
"relationship--3f1a8d8b-6a6e-4b5d-8e15-2d6d9a2b3f1d" -> "incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857";
"relationship--4b6e65f3-743d-40c2-9194-3b5e38b3efed" -> "incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857";
"relationship--4b6e65f3-743d-40c2-9194-3b5e38b3efed" -> "identity--1cba2e3c-4bdb-4d0b-a87b-2d504ad5923f";
"relationship--5c9b6eaf-27a6-4b2b-9b17-49e3b00f6051" -> "incident--7a2b252e-c3e5-4bc2-bc6f-cb917ecf7857";
"relationship--5c9b6eaf-27a6-4b2b-9b17-49e3b00f6051" -> "url--4c3b-4c4b-bb6c-ded6b2a4a567";
}

Binary file not shown.


Binary file not shown.


View File

@@ -0,0 +1,152 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><!-- Generated by graphviz version 2.40.1 (20161225.0304)
--><!-- Title: G Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2500pt" height="194pt" viewBox="0.00 0.00 2500.29 194.40">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 190.4)">
<title>G</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-190.4 2496.2943,-190.4 2496.2943,4 -4,4"/>
<!-- Email -->
<g id="node1" class="node">
<title>Email</title>
<polygon fill="none" stroke="#000000" points="613.4049,-186.6014 353.9079,-186.6014 353.9079,-127.7986 613.4049,-127.7986 613.4049,-186.6014"/>
<text text-anchor="middle" x="483.6564" y="-169.8" font-family="Times,serif" font-size="14.00" fill="#000000">Email</text>
<text text-anchor="middle" x="483.6564" y="-153" font-family="Times,serif" font-size="14.00" fill="#000000">sender: support@banksecure.com</text>
<text text-anchor="middle" x="483.6564" y="-136.2" font-family="Times,serif" font-size="14.00" fill="#000000">subject: Urgent: Verify Your Account Now</text>
</g>
<!-- Headers -->
<g id="node2" class="node">
<title>Headers</title>
<polygon fill="none" stroke="#000000" points="165.4695,-83.7003 -.1567,-83.7003 -.1567,-8.2997 165.4695,-8.2997 165.4695,-83.7003"/>
<text text-anchor="middle" x="82.6564" y="-67" font-family="Times,serif" font-size="14.00" fill="#000000">Headers</text>
<text text-anchor="middle" x="82.6564" y="-50.2" font-family="Times,serif" font-size="14.00" fill="#000000">IP_address: 192.168.10.45</text>
<text text-anchor="middle" x="82.6564" y="-33.4" font-family="Times,serif" font-size="14.00" fill="#000000">domain: banksecure.com</text>
<text text-anchor="middle" x="82.6564" y="-16.6" font-family="Times,serif" font-size="14.00" fill="#000000">registered_to: Russia</text>
</g>
<!-- Email&#45;&gt;Headers -->
<g id="edge1" class="edge">
<title>Email-&gt;Headers</title>
<path fill="none" stroke="#000000" d="M353.775,-137.6786C297.4837,-127.2351 231.3854,-112.2695 173.6564,-92 169.8657,-90.669 166.0318,-89.2125 162.1924,-87.6631"/>
<polygon fill="#000000" stroke="#000000" points="163.2322,-84.3031 152.6582,-83.6394 160.5104,-90.7523 163.2322,-84.3031"/>
</g>
<!-- Link_0 -->
<g id="node3" class="node">
<title>Link_0</title>
<polygon fill="none" stroke="#000000" points="460.4245,-75.4014 182.8883,-75.4014 182.8883,-16.5986 460.4245,-16.5986 460.4245,-75.4014"/>
<text text-anchor="middle" x="321.6564" y="-58.6" font-family="Times,serif" font-size="14.00" fill="#000000">Link</text>
<text text-anchor="middle" x="321.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">URL: http://banksecure-verification.com/login</text>
<text text-anchor="middle" x="321.6564" y="-25" font-family="Times,serif" font-size="14.00" fill="#000000">timestamp: 10:15 AM</text>
</g>
<!-- Email&#45;&gt;Link_0 -->
<g id="edge2" class="edge">
<title>Email-&gt;Link_0</title>
<path fill="none" stroke="#000000" d="M441.0928,-127.9835C420.0619,-113.5475 394.5307,-96.0223 372.4805,-80.8867"/>
<polygon fill="#000000" stroke="#000000" points="374.442,-77.9879 364.2167,-75.2142 370.4805,-83.7591 374.442,-77.9879"/>
</g>
<!-- Link_1 -->
<g id="node4" class="node">
<title>Link_1</title>
<polygon fill="none" stroke="#000000" points="811.3839,-75.4014 477.9289,-75.4014 477.9289,-16.5986 811.3839,-16.5986 811.3839,-75.4014"/>
<text text-anchor="middle" x="644.6564" y="-58.6" font-family="Times,serif" font-size="14.00" fill="#000000">Link</text>
<text text-anchor="middle" x="644.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">URL: http://banksecure-verification.com/account-details</text>
<text text-anchor="middle" x="644.6564" y="-25" font-family="Times,serif" font-size="14.00" fill="#000000">timestamp: 10:17 AM</text>
</g>
<!-- Email&#45;&gt;Link_1 -->
<g id="edge3" class="edge">
<title>Email-&gt;Link_1</title>
<path fill="none" stroke="#000000" d="M525.9573,-127.9835C546.7663,-113.6111 572.0085,-96.1767 593.8562,-81.0869"/>
<polygon fill="#000000" stroke="#000000" points="596.1197,-83.7772 602.3588,-75.2142 592.1416,-78.0175 596.1197,-83.7772"/>
</g>
<!-- Attachment_0 -->
<g id="node5" class="node">
<title>Attachment_0</title>
<polygon fill="none" stroke="#000000" points="1132.4912,-92 828.8216,-92 828.8216,0 1132.4912,0 1132.4912,-92"/>
<text text-anchor="middle" x="980.6564" y="-75.4" font-family="Times,serif" font-size="14.00" fill="#000000">Attachment</text>
<text text-anchor="middle" x="980.6564" y="-58.6" font-family="Times,serif" font-size="14.00" fill="#000000">file_name: AccountDetails.exe</text>
<text text-anchor="middle" x="980.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">created_at: 10:20 AM</text>
<text text-anchor="middle" x="980.6564" y="-25" font-family="Times,serif" font-size="14.00" fill="#000000">MD5_hash: e99a18c428cb38d5f260853678922e03</text>
<text text-anchor="middle" x="980.6564" y="-8.2" font-family="Times,serif" font-size="14.00" fill="#000000">status: known_malware</text>
</g>
<!-- Email&#45;&gt;Attachment_0 -->
<g id="edge4" class="edge">
<title>Email-&gt;Attachment_0</title>
<path fill="none" stroke="#000000" d="M613.4582,-134.872C673.6142,-123.8466 746.1735,-109.5917 818.9279,-92.37"/>
<polygon fill="#000000" stroke="#000000" points="819.8125,-95.7572 828.7275,-90.0324 818.1883,-88.9483 819.8125,-95.7572"/>
</g>
<!-- Actions -->
<g id="node6" class="node">
<title>Actions</title>
<polygon fill="none" stroke="#000000" points="1820.6852,-175.2 1722.6276,-175.2 1722.6276,-139.2 1820.6852,-139.2 1820.6852,-175.2"/>
<text text-anchor="middle" x="1771.6564" y="-153" font-family="Times,serif" font-size="14.00" fill="#000000">Actions Taken</text>
</g>
<!-- Action_0 -->
<g id="node7" class="node">
<title>Action_0</title>
<polygon fill="none" stroke="#000000" points="1347.0201,-64 1150.2927,-64 1150.2927,-28 1347.0201,-28 1347.0201,-64"/>
<text text-anchor="middle" x="1248.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">Clear browser history and cache</text>
</g>
<!-- Actions&#45;&gt;Action_0 -->
<g id="edge5" class="edge">
<title>Actions-&gt;Action_0</title>
<path fill="none" stroke="#000000" d="M1722.4388,-153.5006C1643.6934,-146.6934 1485.077,-129.3195 1355.6564,-92 1334.7144,-85.9612 1312.4383,-76.8526 1293.6643,-68.3299"/>
<polygon fill="#000000" stroke="#000000" points="1295.0174,-65.0995 1284.4719,-64.078 1292.0786,-71.4528 1295.0174,-65.0995"/>
</g>
<!-- Action_1 -->
<g id="node8" class="node">
<title>Action_1</title>
<polygon fill="none" stroke="#000000" points="1508.3003,-64 1365.0125,-64 1365.0125,-28 1508.3003,-28 1508.3003,-64"/>
<text text-anchor="middle" x="1436.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">Run full antivirus scan</text>
</g>
<!-- Actions&#45;&gt;Action_1 -->
<g id="edge6" class="edge">
<title>Actions-&gt;Action_1</title>
<path fill="none" stroke="#000000" d="M1722.4666,-148.7423C1670.1693,-138.8108 1585.6568,-120.0472 1516.6564,-92 1501.7032,-85.9218 1486.1289,-77.5031 1472.7385,-69.5275"/>
<polygon fill="#000000" stroke="#000000" points="1474.2122,-66.3269 1463.8516,-64.111 1470.569,-72.3042 1474.2122,-66.3269"/>
</g>
<!-- Action_2 -->
<g id="node9" class="node">
<title>Action_2</title>
<polygon fill="none" stroke="#000000" points="1787.473,-64 1525.8398,-64 1525.8398,-28 1787.473,-28 1787.473,-64"/>
<text text-anchor="middle" x="1656.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">Provide browser history entries and cookies</text>
</g>
<!-- Actions&#45;&gt;Action_2 -->
<g id="edge7" class="edge">
<title>Actions-&gt;Action_2</title>
<path fill="none" stroke="#000000" d="M1752.9237,-139.0863C1733.8777,-120.6697 1704.2098,-91.982 1682.8426,-71.3209"/>
<polygon fill="#000000" stroke="#000000" points="1685.0029,-68.5412 1675.3811,-64.1059 1680.137,-73.5734 1685.0029,-68.5412"/>
</g>
<!-- Action_3 -->
<g id="node10" class="node">
<title>Action_3</title>
<polygon fill="none" stroke="#000000" points="1967.8347,-64 1805.4781,-64 1805.4781,-28 1967.8347,-28 1967.8347,-64"/>
<text text-anchor="middle" x="1886.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">Quarantine suspicious file</text>
</g>
<!-- Actions&#45;&gt;Action_3 -->
<g id="edge8" class="edge">
<title>Actions-&gt;Action_3</title>
<path fill="none" stroke="#000000" d="M1790.3891,-139.0863C1809.4351,-120.6697 1839.103,-91.982 1860.4702,-71.3209"/>
<polygon fill="#000000" stroke="#000000" points="1863.1758,-73.5734 1867.9317,-64.1059 1858.3099,-68.5412 1863.1758,-73.5734"/>
</g>
<!-- Action_4 -->
<g id="node11" class="node">
<title>Action_4</title>
<polygon fill="none" stroke="#000000" points="2157.1649,-64 1986.1479,-64 1986.1479,-28 2157.1649,-28 2157.1649,-64"/>
<text text-anchor="middle" x="2071.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">Check network connections</text>
</g>
<!-- Actions&#45;&gt;Action_4 -->
<g id="edge9" class="edge">
<title>Actions-&gt;Action_4</title>
<path fill="none" stroke="#000000" d="M1820.8076,-143.7144C1862.7732,-131.6934 1924.447,-112.8189 1976.6564,-92 1993.8733,-85.1346 2012.3158,-76.5132 2028.3275,-68.5952"/>
<polygon fill="#000000" stroke="#000000" points="2030.0692,-71.6375 2037.4465,-64.0331 2026.9372,-65.3773 2030.0692,-71.6375"/>
</g>
<!-- Action_5 -->
<g id="node12" class="node">
<title>Action_5</title>
<polygon fill="none" stroke="#000000" points="2492.4323,-64 2174.8805,-64 2174.8805,-28 2492.4323,-28 2492.4323,-64"/>
<text text-anchor="middle" x="2333.6564" y="-41.8" font-family="Times,serif" font-size="14.00" fill="#000000">Reset passwords and enable two-factor authentication</text>
</g>
<!-- Actions&#45;&gt;Action_5 -->
<g id="edge10" class="edge">
<title>Actions-&gt;Action_5</title>
<path fill="none" stroke="#000000" d="M1820.9286,-150.7657C1895.9867,-140.5822 2043.2728,-119.0973 2166.6564,-92 2198.9837,-84.9003 2234.3812,-75.4031 2264.2341,-66.882"/>
<polygon fill="#000000" stroke="#000000" points="2265.4904,-70.1627 2274.1331,-64.0346 2263.5552,-63.4354 2265.4904,-70.1627"/>
</g>
</g>
</svg>


View File

@@ -0,0 +1,19 @@
digraph phishing_attack {
"OurCompany" [label="OurCompany\norganization"]
"hr-dept@ourcompany-security.com" [label="hr-dept@ourcompany-security.com\nemail-addr"]
"Urgent Benefits Package Update" [label="Urgent Benefits Package Update\nemail-message"]
"http://phishing-link.com/login" [label="http://phishing-link.com/login\nurl"]
"Taylor" [label="Taylor\nuser-account"]
"OurCompany" -> "hr-dept@ourcompany-security.com" [label="email"]
"hr-dept@ourcompany-security.com" -> "Urgent Benefits Package Update" [label="email"]
"Urgent Benefits Package Update" -> "http://phishing-link.com/login" [label="link"]
"Taylor" -> "hr-dept@ourcompany-security.com" [label="login"]
"Urgent Benefits Package Update" -> "Phishing Email Indicator" [label="indicator"]
"http://phishing-link.com/login" -> "Phishing Email Indicator" [label="indicator"]
"Phishing Email Indicator" -> "Phishing Attack on OurCompany" [label="indicates"]
"Phishing Attack on OurCompany" -> "OurCompany" [label="attributed-to"]
"Phishing Attack on OurCompany" -> "http://phishing-link.com/login" [label="uses"]
}

Binary file not shown.


View File

@@ -30,15 +30,26 @@ By fostering a collaborative learning environment, this tutorial aims to empower
## Table of Contents
- Introduction
- Forensic evidence entity recognition
- Profiling suspect based on browser history
- [Political insights analysis based on Hillary's leaked Emails](#political-insight-analysis-leveraging-llms)
- Evidence knowledge reconstruction
- [Forensic evidence entity recognition (hands-on lab)](#forensic-evidence-analysis)
- [Evidence entity recognition](PhishingAttack/PhishingAttackScenarioDemo/01_evidence_entity_recognition.ipynb)
- [Visualize evidence and their relations](PhishingAttackScenarioDemo/02_evidence_knowledge_dot_generator.ipynb)
- [Evidence knowledge graphs reconstruction (hands-on lab)](#forensic-evidence-analysis)
- [Construct a knowledge graph in STIX (zero-shot)](PhishingAttackScenarioDemo/03_evidence_stix_zeroshot.ipynb)
- [Construct a knowledge graph in STIX (one-shot)](PhishingAttackScenarioDemo/04_evidence_stix_oneshot.ipynb)
- [Compare one-shot vs. zero-shot](PhishingAttackScenarioDemo/05_evidence_stix_dot_generator.ipynb)
- Profiling suspect based on browser history (hands-on lab)
- [Political insights analysis based on Hillary's leaked Emails (hands-on lab)](#political-insight-analysis-leveraging-llms)
- Challenges and Limitations of Leveraging LLM in Digital Forensics
- Conclusion
---
### Forensic Evidence Analysis
The cyber incident report documents a conversation between an IT Security Specialist and an Employee about an email phishing attack. We use LLMs to identify evidence entities and relationships and to construct digital forensic knowledge graphs.
Here is an example of a reconstructed digital forensics knowledge graph: <img src="PhishingAttackScenarioDemo/05_output_viz.png">
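
The entity-and-relationship output described above can be turned into Graphviz DOT text with a few lines of Python. The following is a minimal sketch, not the notebooks' actual implementation: the helper names `quote` and `to_dot` and the input shapes (a dict of entity to type, a list of relationship triples) are assumptions made for illustration. The sample values are taken from the phishing scenario.

```python
# Minimal sketch: the helper names and input shapes are hypothetical,
# not taken from the tutorial notebooks.

def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping embedded quotes for DOT."""
    return '"' + value.replace('"', '\\"') + '"'


def to_dot(name, entities, relations):
    """Build DOT text from {entity: type} and (source, target, label) triples."""
    lines = [f"digraph {name} {{"]
    for entity_id, entity_type in entities.items():
        # "\\n" emits a literal backslash-n, which Graphviz renders as a line break.
        label = quote(f"{entity_id}\\n{entity_type}")
        lines.append(f"    {quote(entity_id)} [label={label}]")
    for source, target, relation in relations:
        lines.append(
            f"    {quote(source)} -> {quote(target)} [label={quote(relation)}]"
        )
    lines.append("}")
    return "\n".join(lines)


entities = {
    "Taylor": "user-account",
    "hr-dept@ourcompany-security.com": "email-addr",
    "Urgent Benefits Package Update": "email-message",
    "http://phishing-link.com/login": "url",
}
relations = [
    ("hr-dept@ourcompany-security.com", "Urgent Benefits Package Update", "email"),
    ("Urgent Benefits Package Update", "http://phishing-link.com/login", "link"),
]

dot_text = to_dot("phishing_attack", entities, relations)
print(dot_text)
```

The resulting text can be saved to a `.dot` file and rendered with Graphviz. Keeping the graph as plain text makes it easy to compare outputs from different prompts.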
### Political Insight Analysis Leveraging LLMs
The case study demonstrates how to leverage Large Language Models to gain political insight from an email dataset. The dataset used in the case study is a set of leaked [emails](https://github.com/benhamner/hillary-clinton-emails?tab=readme-ov-file) obtained from Hillary Clinton's private email server.
@@ -47,7 +58,7 @@ The background of the leaked emails is a significant chapter in recent U.S. poli
The leaked email dataset from Hillary Clinton's private email server is a comprehensive collection of communications covering her entire tenure as Secretary of State from 2009 to 2013. It includes approximately 30,000 emails covering a wide range of topics, from official diplomatic communications to personal correspondence. The release and subsequent analysis of these emails have played a crucial role in political debates, legal inquiries, and public discussions about transparency and security in government communications.
Our dataset: [a set of email summaries](/AI4Forensics/CKIM2024/HillaryEmails/results_email_summary.txt). Each email summary is a summarization of an email generated by Gemini from an original email in the original leaked [email dataset](https://github.com/benhamner/hillary-clinton-emails?tab=readme-ov-file). We are only interested in emails containing the keyword "israel".
Our dataset: [a set of email summaries](/AI4Forensics/CKIM2024/HillaryEmails/results_email_summary.txt). Each summary was generated by Gemini from an email in the original leaked [email dataset](https://github.com/benhamner/hillary-clinton-emails?tab=readme-ov-file). We are only interested in emails containing the keyword "Israel".
Our results: [Code in Jupyter Notebook](/AI4Forensics/CKIM2024/HillaryEmails/email_analysis_political_insight.ipynb).
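
The keyword filter mentioned above could look like the following minimal sketch. It assumes the summaries are already loaded as a list of strings. The function name `filter_by_keyword` and the sample strings are made up for illustration and do not come from the dataset or the notebook.

```python
# Minimal sketch: `filter_by_keyword` and the sample strings are hypothetical.

def filter_by_keyword(summaries, keyword):
    """Return the summaries that contain the keyword, ignoring case."""
    needle = keyword.casefold()
    return [text for text in summaries if needle in text.casefold()]


# Placeholder strings standing in for real email summaries.
sample_summaries = [
    "Summary A mentions Israel and regional diplomacy.",
    "Summary B is about scheduling a staff meeting.",
    "Summary C discusses israel-related press coverage.",
]

matches = filter_by_keyword(sample_summaries, "Israel")
print(len(matches))
```

Matching case-insensitively means both "Israel" and "israel" are found, so the choice of capitalization in the keyword does not change which summaries are selected.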