Hi everyone
I have tried to follow the suggestion at the end of the official notebook C2_W2_Lab_2_Feature_Engineering_Pipeline to use a different dataset.
SeoulBikeData.csv is downloaded programmatically, so there is no need to create a folder locally.
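For readers who want to reproduce the download step, here is a minimal sketch using only the standard library. It is not the notebook's actual code: the function name `download_if_missing` and both of its parameters are hypothetical, and you need to supply the dataset URL yourself.

```python
import shutil
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless `dest` already exists.

    Hypothetical helper: the URL is supplied by the caller.
    """
    target = Path(dest)
    if target.exists():
        return target  # nothing to do, reuse the existing file
    # Create the destination folder (e.g. ./data) automatically.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk as raw bytes, so no decoding happens here.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Because the file is written as raw bytes, this step keeps whatever encoding the source file has.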
I have tested it on Google Colab. At the beginning of the notebook I have also added a piece of code that detects which platform the notebook is running on (Colab or not Colab).
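One common way to do this kind of check is to see whether the `google.colab` package can be imported. The sketch below is an assumption about how such a check can look, not necessarily the code used in the notebook, and the function name `running_on_colab` is hypothetical.

```python
def running_on_colab() -> bool:
    """Return True when the google.colab package is importable."""
    try:
        import google.colab  # noqa: F401  (normally only present on Colab)
    except ImportError:
        return False
    return True


if running_on_colab():
    print("Running on Colab")
else:
    print("Not running on Colab")
```

The result can then be used to branch between Colab-specific setup and local setup.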
This is just the initial release. Everything can be improved.
Looking forward to your comments and feedback.
BR
Is it possible to ingest data from a CSV file that is not in Unicode? The source CSV file is not in Unicode. Converting it to Unicode is not "within pipeline", and to me that seems weird.
Hi @spsh
thanks for your question.
I had to use "latin1" because when I tried to import the SeoulBikeSharing.csv in the usual way (not unicode) I got an error. So I changed the import format to "latin1". If you have found an alternative way, please let me know and I will remove that flag.
BR
Hi @fabioantonini, and thank you for publishing your notebook!
Can you tell me why you chose "latin1" as the encoding?
Also, when I use the print function like this:
with open(_data_filepath) as f:
    print(f)
I get this output:
<_io.TextIOWrapper name='./data/SeoulBikeData.csv' mode='r' encoding='UTF-8'>, which seems to indicate that the encoding of the CSV file is UTF-8.
But when I run:
context.run(example_gen)
I get this error (surely the same one you encountered):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 40: invalid start byte
Can you explain why the 'utf-8' codec can't decode the file when it is reported as UTF-8 encoded?
Sorry to bother you with my Unicode questions, but I have the feeling it won't be the last time I encounter this kind of Unicode error.
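A short sketch may help with this question. The `encoding='UTF-8'` shown by `print(f)` is the encoding Python will use to decode the file (the platform default when `open()` is called without an `encoding` argument). It is not detected from the file contents. Byte 0xb0 is the degree sign in Latin-1 and is not a valid start byte in UTF-8. The header string below is an illustrative assumption, not copied from the dataset, and `decodes_as` is a hypothetical helper.

```python
# Illustrative header containing a degree sign, encoded as Latin-1.
# This string is an assumption for the demo, not taken from the CSV.
header = "Temperature(\u00b0C)".encode("latin1")
assert b"\xb0" in header  # the degree sign becomes the single byte 0xb0


def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if `raw` can be decoded with `encoding`."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as(header, "utf-8"))   # False: 0xb0 is an invalid start byte
print(decodes_as(header, "latin1"))  # True
```

Note that Latin-1 maps every possible byte to a character, so decoding with "latin1" never raises an error. It makes the error go away, but it only gives the right text if the file really is Latin-1 (or close to it).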
The Unicode problem can only be solved by reading the CSV into an object like "df" and then, after creating the "data" or "bikerdata" folder, converting that "df" to CSV with the command df.to_csv...
PS: Do not unzip the CSV into the "data" or "bikerdata" folder, otherwise you will face encoding issues.
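The same idea (read with the source encoding, write back as UTF-8 into the data folder) can be sketched without pandas. This is a minimal illustration assuming the source file is Latin-1, as suggested earlier in the thread. The function name `reencode_to_utf8` and its parameters are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, src_encoding: str = "latin1") -> Path:
    """Rewrite the text file `src` as UTF-8 at `dst`.

    `src_encoding` is an assumption about the source file.
    """
    out = Path(dst)
    # Create the target folder (e.g. ./data) if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
            open(out, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
    return out
```

The UTF-8 copy in the data folder is then the file that the pipeline ingests, and the original download can be kept elsewhere.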
Thank you for creating the Colab version. I had a really tough time setting up the environment on Colab and on my local machine (Windows 11). Could you please share links or materials on how best to troubleshoot environment setup? Thanks.