Data mining

Hi everyone! I’m working on the Edge_IIOTset dataset for cybersecurity using an unsupervised machine learning approach, but I’m having a hard time understanding the data well enough to improve my ML algorithm’s performance. If anyone is interested and has some experience in data mining, please email me for further details.

@ahmadi.m42@yahoo.com

Hi everyone,

Sounds like an interesting project. Edge_IIOTset can be challenging because the feature space mixes network telemetry, device behavior, and synthetic attack patterns, and without labels it’s even harder to interpret. A good first step is a structured exploratory analysis: feature clustering, correlation maps, and distribution profiling to see which variables carry distinctive behavioral patterns. This often reveals which subsets of features are actually useful for unsupervised methods such as isolation forests, autoencoders, or clustering.
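To make the distribution profiling and correlation steps concrete, here is a minimal sketch with pandas. The column names and the synthetic data are made up for illustration and are not the real Edge_IIOTset schema; `profile_features` and `correlated_pairs` are hypothetical helper names, not part of any library.

```python
import numpy as np
import pandas as pd


def profile_features(df: pd.DataFrame) -> pd.DataFrame:
    """Per-feature summary: spread, skewness, and number of distinct values."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "std": numeric.std(),
        "skew": numeric.skew(),
        "n_unique": numeric.nunique(),
    })


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """List feature pairs whose absolute Pearson correlation is >= threshold."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = corr.iloc[i, j]
            # NaN (e.g. from a constant column) compares False and is skipped.
            if value >= threshold:
                pairs.append((cols[i], cols[j], float(value)))
    return pairs


# Synthetic stand-in data (assumption: NOT the real dataset columns).
rng = np.random.default_rng(0)
n = 500
pkt_len = rng.lognormal(mean=5.0, sigma=1.0, size=n)
demo = pd.DataFrame({
    "pkt_len": pkt_len,                                # heavily right-skewed
    "pkt_len_bits": pkt_len * 8 + rng.normal(0, 1, n),  # near-duplicate feature
    "const_flag": np.zeros(n),                          # carries no information
    "jitter": rng.normal(0, 1, n),                      # roughly Gaussian
})

profile = profile_features(demo)
pairs = correlated_pairs(demo, threshold=0.9)
print(profile)
print(pairs)
```

Features with a single distinct value can usually be dropped, near-duplicate pairs can be reduced to one representative, and strongly skewed features are candidates for a log or power transform before any density-based method.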

If you’d like to discuss your preprocessing approach or feature engineering pipeline, feel free to share a bit more here. Many of us can help you reason through the structure before you jump deeper into modeling.


Hi. Thank you so much for your response and your time. Yes, feature engineering and preprocessing are my main challenge right now. I’ve been reading a paper that took the very same approach. It explained the relationship between the eigenvalues and singular values of a matrix, and used that relationship to determine the number of components in PCA and the number of clusters in K-means (they combined the two algorithms for immediate detection). That leads to my actual issue: I need my data simplified (by PCA) and to follow a normal-like distribution (since my algorithm is multivariate density estimation), and I’m not sure how to handle these two problems. I’d be really glad if you could help me.

Normalize first.
Then apply PCA.
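
A rough sketch of that order with scikit-learn, in case it helps. Two assumptions here: "normalize" is read as standardizing each feature, and a Yeo-Johnson power transform is added on top because a normal-like distribution was requested. The data is synthetic, and the 0.95 variance threshold is an arbitrary choice, not a recommendation from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in: 10 skewed, correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 10))
noise = 0.05 * rng.normal(size=(1000, 10))
X = np.exp(0.5 * (latent @ mixing) + noise)

pipeline = Pipeline([
    # Step 1: make each feature more Gaussian-like, then zero mean / unit variance.
    ("power", PowerTransformer(method="yeo-johnson", standardize=True)),
    # Step 2: keep the fewest components explaining at least 95% of the variance.
    ("pca", PCA(n_components=0.95, svd_solver="full")),
])
Z = pipeline.fit_transform(X)
pca = pipeline.named_steps["pca"]

# Eigenvalue / singular value link: the eigenvalues of the sample covariance
# matrix equal the squared singular values of the centered data / (n - 1).
eigenvalues = pca.singular_values_ ** 2 / (X.shape[0] - 1)

print("components kept:", Z.shape[1])
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

Keep in mind that the power transform only makes each feature's marginal distribution closer to normal. It does not guarantee that the joint distribution, or the PCA scores, are multivariate normal, so it is worth checking the transformed components before fitting the density estimator.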
