Suggestion for AI assignment

Hey all,

I am in an AI fundamentals class where we have been tasked with implementing an AI-powered cybersecurity project to defend against cyber attacks.

We chose the following dataset to use:

www.kaggle.com/datasets/teamincribo/cyber-security-attacks?resource=download

The dataset is described as:

“Welcome to Incribo’s synthetic cyber dataset! Crafted with precision, this dataset offers a realistic representation of travel history, making it an ideal playground for various analytical tasks.

Use the cybersecurity attacks dataset to help you assess the heatmaps, attack signatures, types, and more.”

The columns are as follows:

Timestamp, Source IP Address, Destination IP Address, Source Port, Destination Port, Protocol, Packet Length, Packet Type, Traffic Type, Payload Data, Malware Indicators, Anomaly Scores, Alerts/Warnings, Attack Type, Attack Signature, Action Taken, Severity Level, User Information, Device Information, Network Segment, Geo-location Data, Proxy Information, Firewall Logs, IDS/IPS Alerts, Log Source

Only about half of the 40,000 attacks triggered an alert/warning or were flagged as malware.

Questions

What ML models could we use to help train AI to better identify them all as attacks (regardless of severity)?

Suggestions for ways to use AI to better defend against the approx. 50% of attacks that were not registered as malicious?

You don’t need a model for that. You just need a function that always returns “True”.

The issue is that a training set which contains only true examples cannot be used for training a model. If the goal is to tell attacks from non-attacks, you need examples of both.

A simpler comparison: you can’t learn to tell cats from dogs if your dataset only includes images of cats.
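
To make the point concrete, here is a rough sketch in plain Python. The records are a hypothetical stand-in for the dataset (40,000 rows, all labelled as attacks), and the names `always_attack`, `accuracy`, and `can_train_binary_classifier` are made up for illustration. The extra benign rows at the end are an assumption too, since the dataset as described has none.

```python
from collections import Counter


def always_attack(record):
    """Baseline 'detector' that flags every record as an attack."""
    return True


def accuracy(predict, records, labels):
    """Fraction of records whose prediction matches the label."""
    hits = sum(predict(r) == y for r, y in zip(records, labels))
    return hits / len(labels)


def can_train_binary_classifier(labels):
    """A binary classifier needs at least one example of each class."""
    return len(set(labels)) >= 2


# Hypothetical stand-in for the dataset: every row is an attack.
records = [{"row": i} for i in range(40_000)]
labels = [True] * len(records)

print(Counter(labels))                            # only one class present
print(accuracy(always_attack, records, labels))   # 1.0
print(can_train_binary_classifier(labels))        # False

# Assumption: 10,000 benign rows collected from some other source.
mixed_records = records + [{"row": i} for i in range(40_000, 50_000)]
mixed_labels = labels + [False] * 10_000

print(can_train_binary_classifier(mixed_labels))              # True
print(accuracy(always_attack, mixed_records, mixed_labels))   # 0.8
```

On the all-attack data the constant function is already perfect, so there is nothing for a model to learn. Only once benign traffic is mixed in does "always True" start making mistakes, and only then is there a decision boundary worth training for.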