Hey all,
I am in an AI fundamentals class where we have been tasked with implementing an AI-powered cybersecurity project to defend against cyber attacks.
We chose the following dataset to use:
www.kaggle.com/datasets/teamincribo/cyber-security-attacks?resource=download
The dataset is described as:
“Welcome to Incribo’s synthetic cyber dataset! Crafted with precision, this dataset offers a realistic representation of travel history, making it an ideal playground for various analytical tasks.
Use the cybersecurity attacks dataset to help you assess the heatmaps, attack signatures, types, and more.”
The columns are as follows:
Timestamp, Source IP Address, Destination IP Address, Source Port, Destination Port, Protocol, Packet Length, Packet Type, Traffic Type, Payload Data, Malware Indicators, Anomaly Scores, Alerts/Warnings, Attack Type, Attack Signature, Action Taken, Severity Level, User Information, Device Information, Network Segment, Geo-location Data, Proxy Information, Firewall Logs, IDS/IPS Alerts, Log Source
Only about half of the 40,000 attacks in the dataset triggered an alert/warning or were identified as malware.
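A minimal sketch of how that share could be checked, assuming an unflagged attack shows up as an empty cell in the "Malware Indicators" and "Alerts/Warnings" columns (that encoding is an assumption, not something the dataset description confirms). The helper name flagged_share and the sample rows are hypothetical and made up for illustration only:

```python
import csv
import io

# Column names taken from the dataset's column list above.
FLAG_COLUMNS = ("Malware Indicators", "Alerts/Warnings")


def flagged_share(rows, flag_columns=FLAG_COLUMNS):
    """Return the fraction of rows with a non-empty value in any flag column.

    Assumption: an unflagged attack has an empty cell in these columns.
    """
    total = 0
    flagged = 0
    for row in rows:
        total += 1
        if any((row.get(col) or "").strip() for col in flag_columns):
            flagged += 1
    return flagged / total if total else 0.0


# Tiny made-up sample standing in for the real CSV (values are illustrative).
SAMPLE_CSV = """Attack Type,Malware Indicators,Alerts/Warnings
DDoS,IoC Detected,
Malware,,Alert Triggered
Intrusion,,
DDoS,,
"""

sample_share = flagged_share(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(f"{sample_share:.0%} of sample rows were flagged")
```

To run it on the real download, the same function could be fed csv.DictReader over the opened CSV file instead of the in-memory sample.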
Questions
Which ML models could we train to identify all of these records as attacks (regardless of severity)?
Any suggestions for ways to use AI to better defend against the approx. 50% of attacks that were not registered as malicious?