Hi everyone! I am a beginner in machine learning and have been working on a project recently. The goal is to detect spikes in power consumption using time and date information. I couldn’t find an appropriate dataset for this, so I’m wondering if anyone can help me. Also, I’ve decided to use an SVM here. Any suggestions?
These two might be useful to you:
And IMHO an SVM should be fine for this purpose, but you might also consider k-NN or XGBoost.
P.S. The UCI Machine Learning Repository is a great site for datasets.
Thank you… I’ll definitely check ’em out.
@rayha lots more here too:
But IMHO the UCI datasets are especially well documented/curated.
Keep in mind I couldn’t find anything directly about anomalies. But the obvious implication, from looking at the data, is that you will have to find the instances that differ from the norm yourself.
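For what it’s worth, here is a minimal sketch of one way to label the out-of-the-norm readings yourself. It is only an illustration: the readings are made up, `flag_spikes` is a hypothetical helper name, and the modified z-score (median and median absolute deviation) with a 3.5 cutoff is just one common rule of thumb, not the only option.

```python
from statistics import median

def flag_spikes(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold."""
    med = median(readings)
    # Median absolute deviation: a spread estimate the spikes barely influence
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        return []  # at least half the readings equal the median; score undefined
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, x in enumerate(readings)
            if 0.6745 * abs(x - med) / mad > threshold]

# Made-up hourly consumption in kW with two injected spikes (assumption)
readings = [1.2, 1.1, 1.3, 1.2, 1.4, 5.8, 1.3, 1.2, 1.1, 1.3, 6.1, 1.2]
spike_idx = flag_spikes(readings)
# Turn the indices into 0/1 labels that a classifier could later train on
labels = [1 if i in spike_idx else 0 for i in range(len(readings))]
print(spike_idx, labels)
```

Labels produced this way are only as good as the rule behind them, so it is worth eyeballing a plot of the flagged points before training anything on them.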
Yeah, exactly… none of the datasets I have seen so far has any significant information about anomalies; they are about 99% normal usage.
Is there any shorter or simpler way to handle the data manually yourself?
Sorry, can you clarify?
And also, to be honest, I think it is a worthwhile part of data science to practice working with the data first. I mean, with the courses here we tend to focus more on the outright analysis, but a lot of the time you just get ‘data’ and you don’t necessarily know what to expect.
Perhaps you have a hypothesis.
But this is where data exploration comes in. You can start with standard summary statistics (mean, median, mode, etc.), put together a covariance matrix, and perhaps even run a PCA to get a sense of the distribution.
And on top of that, run visualizations to see if any trends or seasonality ‘immediately pop up’ in the data.
Such a first-run analysis is all part of being a good data scientist, IMHO.
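To make that first pass a bit more concrete, here is a rough sketch using only the Python standard library. Everything in it is an assumption for illustration: the rows are invented, the column layout (hour, temperature, consumption) is hypothetical, and the two-feature PCA is done by hand on the raw covariance matrix. In practice you would probably reach for pandas and scikit-learn and standardize the features first.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev

# Made-up (hour_of_day, temperature_C, consumption_kW) rows; purely illustrative
rows = [
    (0, 14.0, 1.1), (6, 15.0, 1.4), (12, 24.0, 2.9), (18, 21.0, 3.6),
    (0, 13.0, 1.0), (6, 16.0, 1.5), (12, 26.0, 3.1), (18, 20.0, 3.4),
]
temps = [r[1] for r in rows]
loads = [r[2] for r in rows]

def cov(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def eig2x2_sym(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    return mid + half_gap, mid - half_gap

# 1. Standard summary statistics for the consumption column
summary = {"mean": mean(loads), "median": median(loads), "stdev": stdev(loads)}

# 2. Entries of the 2x2 covariance matrix for (temperature, consumption)
c_tt, c_tl, c_ll = cov(temps, temps), cov(temps, loads), cov(loads, loads)

# 3. Two-feature PCA: the eigenvalues of the covariance matrix are the
#    variances along the principal components
eig_hi, eig_lo = eig2x2_sym(c_tt, c_tl, c_ll)
explained = eig_hi / (eig_hi + eig_lo)  # share of variance on the first component

# 4. Crude seasonality check: average consumption per hour of day
by_hour = defaultdict(list)
for hour, _, load in rows:
    by_hour[hour].append(load)
hourly_profile = {h: mean(v) for h, v in sorted(by_hour.items())}

print(summary)
print(f"first component explains {explained:.0%} of the variance")
print(hourly_profile)
```

A plot of `hourly_profile` (or of the raw series over time) is usually the quickest way to see whether a daily pattern exists before deciding what counts as a spike.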
Thank you so much! I’m looking forward to more help in the future.