C2W4 Feature Engineering with Weather Data

from datetime import datetime

def clean_fn(line):
    '''
    Converts datetime strings in the CSV to Unix timestamps and removes outliers
    from the wind velocity column. Used as part of the transform pipeline.

    Args:
        line (bytes) - one row of a CSV file
    '''

    # Split the CSV row into a list of bytes fields
    line_split = line.split(b',')

    # Decode the date/time field from UTF-8 bytes to a str
    # (date_time_idx, the index of that column, is defined elsewhere in the lab)
    date_time_string = line_split[date_time_idx].decode("utf-8")

    # Create a datetime object from the date/time string
    date_time = datetime.strptime(date_time_string, '%d.%m.%Y %H:%M:%S')

    # Generate a Unix timestamp (seconds since the epoch) from the object
    timestamp = datetime.timestamp(date_time)

    # Overwrite the date/time string in the row with the timestamp in seconds
    line_split[date_time_idx] = bytes(str(timestamp), 'utf-8')

    # ... (rest of the function, including the wind velocity outlier handling, not shown in this excerpt)
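To see what these steps do to one row, here is a minimal standalone sketch. It assumes a made-up sample row whose first column is the date/time (so `date_time_idx = 0` here; in the lab that index comes from elsewhere). One deliberate difference from the lab code: the sketch pins the datetime to UTC before calling `timestamp()`, because a naive datetime is interpreted in the machine's local timezone and the result would otherwise vary from machine to machine.

```python
from datetime import datetime, timezone

# Hypothetical sample row (assumption): date/time is the first column
date_time_idx = 0
line = b'01.01.2009 00:10:00,996.52,-8.02'

# Split the raw bytes row into a list of bytes fields
line_split = line.split(b',')

# Decode only the date/time field so strptime can parse it as a str
date_time_string = line_split[date_time_idx].decode('utf-8')

# Parse "day.month.year hour:minute:second" into a datetime object
date_time = datetime.strptime(date_time_string, '%d.%m.%Y %H:%M:%S')

# Unlike the lab code, treat the time as UTC so the result is machine-independent
timestamp = date_time.replace(tzinfo=timezone.utc).timestamp()

# Write the number back as bytes so every field in the row is still bytes
line_split[date_time_idx] = bytes(str(timestamp), 'utf-8')

print(line_split)  # [b'1230768600.0', b'996.52', b'-8.02']
```

The row goes in as one bytes string and comes out as a list of bytes fields in which the human-readable date has been replaced by a single number.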

Can anyone please explain the following points?

  1. Why does the code split the line on a bytes value (b',') instead of a plain ","?
    line_split = line.split(b',')

  2. What is the rest of the code for? The comments say what each line does, but I don't understand why all of this is required or what it helps achieve.

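Please see this link to get a perspective on Python bytes strings versus str objects. In short, line arrives as a bytes object (which is why the code later calls .decode("utf-8") on the date/time field), and a bytes object can only be split on a bytes separator such as b','.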
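A quick way to see the bytes/str difference for yourself, using a made-up row in the same format (the sample value is an assumption, not lab data):

```python
# A CSV row held as bytes (note the b prefix), the type clean_fn works with
row = b'01.01.2009 00:10:00,996.52'

# Splitting bytes works with a bytes separator
fields = row.split(b',')
print(fields)  # [b'01.01.2009 00:10:00', b'996.52']

# Splitting bytes with a str separator raises TypeError
split_with_str_failed = False
try:
    row.split(',')
except TypeError as err:
    split_with_str_failed = True
    print('TypeError:', err)

# decode() converts one bytes field to a str that strptime can parse
text = fields[0].decode('utf-8')
print(type(text).__name__, text)
```

Only the date/time field is decoded in clean_fn because it is the only one that needs str methods; the other fields stay as bytes.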

The 2nd question is unclear to me. What do you expect to see in a feature engineering lab?
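As one illustration of why a feature engineering step would bother converting dates: a string like "01.01.2009 00:10:00" is not usable as a numeric model input, while seconds since the epoch can be subtracted and turned into further features. The sketch below is not the lab's code; the sin/cos time-of-day encoding is a common technique chosen here as an example, and `time_of_day_features` and the sample timestamps are hypothetical.

```python
import math

# Seconds in one day, used to place a timestamp on a 24-hour cycle
SECONDS_PER_DAY = 24 * 60 * 60

def time_of_day_features(timestamp):
    """Hypothetical helper: encode the time of day as a point on the unit circle.

    Using sin and cos keeps 23:50 and 00:10 close together, which a raw
    hour-of-day number would not.
    """
    angle = 2 * math.pi * (timestamp % SECONDS_PER_DAY) / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)

# Two hypothetical timestamps ten minutes apart, like consecutive rows
t0 = 1230768600.0  # 2009-01-01 00:10:00 UTC
t1 = t0 + 600.0

# Numeric timestamps support arithmetic that date strings do not
print(t1 - t0)  # 600.0

print(time_of_day_features(t0))
```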