Hi there,
I am currently working on a project estimating energy consumption for buildings in London, so it is a regression task. I have a question about how to split the data. Usually we would use one of these two:
1. train_test_split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
2. TimeSeriesSplit:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.random.randn(12, 2)
y = np.random.randint(0, 2, 12)
tscv = TimeSeriesSplit(n_splits=3, test_size=2)
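From the docs, my understanding is that each fold keeps chronological order, consumed in a loop something like this (a minimal sketch using the toy X above):

for i, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # in every fold the training indices precede the test indices in time
    print(f"Fold {i}: train={train_idx}, test={test_idx}")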
My initial dataset has about 1 million rows, yet I am training on only 10k rows sampled randomly, using something like this:
sample_df = df.sample(n=10000, random_state=42)
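If the time order matters, I assume I would then have to re-sort the sample chronologically before any time-based split ("year" here is just a placeholder for my actual timestamp column):

# restore chronological order after random sampling (hypothetical "year" column)
sample_df = sample_df.sort_values("year").reset_index(drop=True)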
My first question: if option 2 (TimeSeriesSplit) is preferable because the data is yearly/sequential, will my randomly selected 10k rows affect model performance?
My second question is about data normalization. My current understanding (sketch after the list):
- min-max scaling for neural networks
- Z-score (i.e. StandardScaler) for SVR, XGBoost, and MLR
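For concreteness, this is roughly how I am applying the two scalers (a minimal sketch; X_train and X_test stand in for my real feature matrices from the split above, and the scaler is fit on training data only):

from sklearn.preprocessing import MinMaxScaler, StandardScaler

# min-max for the neural network: rescale each feature to [0, 1]
mm = MinMaxScaler()
X_train_nn = mm.fit_transform(X_train)  # fit on training data only
X_test_nn = mm.transform(X_test)        # reuse the training statistics

# z-score for SVR / XGBoost / MLR: zero mean, unit variance per feature
std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)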
Have I defined this correctly?