Hi Everyone,
I am working on an ensemble project and I am stuck on dataset conversion and manipulation. Can anyone help me with this problem?
Dataset
Goal: DL (defect level), which is _TRUE if the class contains one or more defects, false otherwise.
The DL column is stored in some other format, so I cannot change its values. If I try to change them using pandas, they become NaN.
Thank you
Hey @manobharathi_m,
Check this out, I just implemented a quick hack, but it serves your purpose.
import pandas as pd
from scipy.io import arff
from google.colab import files
upload = files.upload()
# Loading the Dataset into a variable
dataset = arff.loadarff('kc1-binary.arff')
# Convert it into a Pandas Dataframe
df = pd.DataFrame(dataset[0])
df.head()
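# loadarff returns nominal values as bytes (e.g. b'FALSE'), so str(x) looks like "b'FALSE'"
# and index 2 is the first character of the label: 'F' means no defect
df.DL = df.DL.apply(lambda x: str(x)[2] != 'F')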
df.head()
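If you would rather not rely on character positions, here is a minimal sketch of an alternative. It assumes the two labels in the DL column are `_TRUE` and `FALSE` (inferred from the dataset description above, not checked against the file) and uses a small hand-made DataFrame as a stand-in for the loaded ARFF data.

```python
import pandas as pd

# Stand-in for the DL column as loadarff returns it: nominal values arrive as bytes.
# The labels b"_TRUE" / b"FALSE" are an assumption based on the dataset description.
df = pd.DataFrame({"DL": [b"_TRUE", b"FALSE", b"_TRUE"]})

# Decode the bytes to text, then compare against the label used for defective classes.
df["DL"] = df["DL"].str.decode("utf-8") == "_TRUE"

print(df["DL"].tolist())
```

Decoding first also makes it easy to inspect the real labels with `df["DL"].unique()` before deciding how to map them.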
Let us know if this helps.
Cheers,
Elemento
Thanks, it works @Elemento. I have one more question:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)
random_state affects the accuracy score. What is the purpose of random_state, and why is it 42?
Hey @manobharathi_m,
random_state can be understood from its name itself. Every line of code that depends on generating a random number uses a random number generator (RNG). Normally, this RNG produces new values every time it is called. However, by fixing something known as the seed, which is also denoted by random_state in some places, we can make sure that the results can be reproduced. In other words, if the RNG is called 10 times in each of multiple runs of the same kernel, the same 10 numbers will be produced in every run. I guess an example can help you much more in this case.
import numpy as np
nums = np.random.uniform(0, 1, 10)
print(nums)
Every time you run a kernel with the above lines of code, you will get a different result.
import numpy as np
np.random.seed(0)
nums = np.random.uniform(0, 1, 10)
print(nums)
But every time you run a kernel with the seed fixed, as in the above lines of code, you will get the same results. As for 42, I guess it is simply the first number that came to the author's mind as a choice of seed. In the above lines of code, I have fixed the seed to 0. If you want, you can put in any other number as well. I hope this resolves your query.
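To connect this back to train_test_split, here is a small sketch on made-up data (the arrays X and y below are placeholders, not your dataset). The seed decides which rows end up in the test set, which is why the accuracy score can change when random_state changes, and why fixing it makes the score repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (placeholder for the real dataset).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same random_state produce exactly the same split.
_, _, _, y_test_a = train_test_split(X, y, test_size=0.3, random_state=42)
_, _, _, y_test_b = train_test_split(X, y, test_size=0.3, random_state=42)
same_split = np.array_equal(y_test_a, y_test_b)

# A different seed shuffles the rows differently, so the test set usually differs.
_, _, _, y_test_c = train_test_split(X, y, test_size=0.3, random_state=7)

print("Same split with the same seed:", same_split)
print("Test labels with seed 42:", y_test_a)
print("Test labels with seed 7:", y_test_c)
```

Since the model is evaluated on different rows for each seed, small differences in accuracy between seeds are expected, especially with a test set as small as 10% of the data.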
Cheers,
Elemento