Data mining

Maryam.A9127 · October 28, 2025, 5:55pm

Hi everyone! I’m working on the Edge_IIOTset dataset for cybersecurity using an unsupervised machine learning approach, but I’m having a hard time understanding the data well enough to improve my ML algorithm’s performance. If anyone is interested and has some experience in data mining, please email me for further details.

@ahmadi.m42@yahoo.com

David_Hamayadji · October 28, 2025, 10:16pm

Hi everyone

SteveArthur · November 26, 2025, 4:54pm

Sounds like an interesting project — Edge_IIOTset can be challenging because the feature space mixes network telemetry, device behavior, and synthetic attack patterns, and without labels, it’s even harder to interpret. A good first step is to perform structured exploratory analysis: feature clustering, correlation maps, and distribution profiling to see which variables carry distinctive behavioral patterns. This often reveals which subsets of features are actually useful for unsupervised methods like isolation forests, autoencoders, or clustering.
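As a rough illustration of the correlation and distribution profiling mentioned above, here is a minimal sketch using only NumPy. The column names (`pkt_count`, `byte_count`, `latency`) and the synthetic data are hypothetical stand-ins, not actual Edge_IIOTset fields, and the 0.9 correlation threshold is an arbitrary assumption you would tune for your own data.

```python
import numpy as np

def profile_features(X, names, corr_threshold=0.9):
    """Return per-feature skewness and the pairs of highly correlated features."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against division by zero for constant columns.
    safe_std = np.where(std > 0, std, 1.0)
    z = (X - mean) / safe_std
    # Sample skewness: near 0 for symmetric data, large for long-tailed data.
    skew = (z ** 3).mean(axis=0)

    # Pearson correlation between every pair of columns.
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= corr_threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return dict(zip(names, skew.tolist())), pairs

# Hypothetical synthetic data standing in for real traffic features.
rng = np.random.default_rng(0)
pkt_count = rng.exponential(scale=10.0, size=1000)           # long right tail
byte_count = pkt_count * 60.0 + rng.normal(0.0, 5.0, 1000)   # nearly redundant
latency = rng.normal(50.0, 5.0, 1000)                        # roughly symmetric

names = ["pkt_count", "byte_count", "latency"]
X = np.column_stack([pkt_count, byte_count, latency])

skew, pairs = profile_features(X, names)
print("skewness per feature:", skew)
print("highly correlated pairs:", pairs)
```

Features with large skewness are candidates for a log or power transform, and one feature from each highly correlated pair can usually be dropped or merged before clustering or density estimation.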

If you’d like to discuss your preprocessing approach or feature engineering pipeline, feel free to share a bit more here. Many of us can help you reason through the structure before you jump deeper into modeling.

Maryam.A9127 · November 28, 2025, 6:11am

Hi. Thank you so much for your response and your time. Yes, feature engineering and preprocessing are my main challenge right now. I’ve been reading a paper that took a very similar approach. It explained the relationship between the eigenvalues and singular values of a matrix and used that relationship to decide how many components to keep in PCA and how many clusters to use in K-means (they combined the two algorithms for immediate detection). That leads to my actual issue: I need my data simplified (by PCA) and with a normal-like distribution (since my algorithm is multivariate density estimation), and I’m not sure how to handle these two problems. I’d be really glad if you could help me.

TMosh · November 28, 2025, 9:54pm

Normalize first.
Then apply PCA.
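
To make those two steps concrete, here is a minimal NumPy sketch, assuming "normalize" means z-score standardization (zero mean, unit variance per feature). It also shows the eigenvalue/singular-value link raised earlier in the thread: for centered data with n rows, each eigenvalue of the covariance matrix equals the matching squared singular value divided by n - 1. The function names, the synthetic data, and the 95% variance cutoff are illustrative assumptions, not something taken from the paper.

```python
import numpy as np

def standardize(X):
    """Z-score each column: zero mean, unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (X - mean) / std

def pca_svd(Z, variance_to_keep=0.95):
    """PCA via SVD; keeps the fewest components reaching the variance target."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    # Covariance eigenvalues follow from the singular values.
    eigvals = s ** 2 / (n - 1)
    ratio = eigvals / eigvals.sum()
    k = int(np.searchsorted(np.cumsum(ratio), variance_to_keep)) + 1
    k = min(k, len(s))
    scores = Zc @ Vt[:k].T  # data projected onto the first k components
    return scores, eigvals, ratio, k

# Hypothetical data: five features on very different scales,
# all driven by two hidden factors plus a little noise.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([
    1000.0 * a,
    0.01 * a + rng.normal(0.0, 0.0001, 500),
    5.0 * b,
    300.0 * b + rng.normal(0.0, 3.0, 500),
    a + b,
])

Z = standardize(X)
scores, eigvals, ratio, k = pca_svd(Z)
print("components kept:", k)
print("explained variance ratio:", np.round(ratio, 4))
```

Without the standardization step, the first feature would dominate the components simply because of its scale. Note that standardizing rescales each feature but does not change the shape of its distribution, so heavily skewed features may still need a log or power transform before a Gaussian density estimate fits well. In scikit-learn, `StandardScaler` followed by `PCA` covers the same two steps.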
