UnicodeDecodeError: 'utf-8' with tfx.components.CsvExampleGen

I am trying to use tfx.components.CsvExampleGen with a CSV that contains Spanish characters like á and Á. As a consequence I get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 72 [while running 'InputToRecord/ReadFromText']

Normally I solve this with pd.read_csv by adding encoding='latin-1'.
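
For reference, here is a minimal sketch of why that byte fails, using only the Python standard library. In Latin-1 the byte 0xc1 is the letter Á, but in UTF-8 it is never a valid start byte. The sample string is made up for illustration and is not taken from the real file.

```python
# Hypothetical sample row; "Á" is the single byte 0xc1 in Latin-1.
raw = "Ávila,á".encode("latin-1")

try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError as err:
    failed = True
    bad_byte = raw[err.start]  # the byte the UTF-8 codec rejected

# The same bytes decode cleanly once the matching codec is used.
text = raw.decode("latin-1")
```

This is the same mismatch that encoding='latin-1' works around in pd.read_csv.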

Can anybody give a hint to solve this problem?

Hey guy.

You may need to create a custom component, as explained in week 3 (MLOps Methodology) of the Deploying Machine Learning Models in Production course.

Looking at the CSV_EXAMPLE_GEN executor, I found this code:

for line in file:
    buffer.write(line.decode('utf-8'))

Maybe the line above is the one you need to change.
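
If changing the executor turns out to be too invasive, one possible alternative (an assumption on my side, not something from the course or the TFX docs) is to convert the CSV to UTF-8 before the pipeline reads it. The sketch below uses only the standard library; the helper name reencode_to_utf8, the file names, and the sample rows are all hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Copy a text file, converting it from src_encoding to UTF-8."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demo with made-up data: write a Latin-1 CSV, then convert it.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()  # separate folder holding only the UTF-8 copy
src_file = os.path.join(src_dir, "data_latin1.csv")
dst_file = os.path.join(dst_dir, "data_utf8.csv")

with open(src_file, "wb") as f:
    f.write("nombre,ciudad\nÁngel,Málaga\n".encode("latin-1"))

reencode_to_utf8(src_file, dst_file)

with open(dst_file, "rb") as f:
    converted = f.read().decode("utf-8")  # now decodes without errors
```

The converted copy goes into its own folder so that CsvExampleGen's input_base could point at a directory that contains only UTF-8 files. I have not tested this inside a TFX pipeline.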

Hey @leobons, thank you very much. I will start with week 3 (MLOps Methodology) of the Deploying Machine Learning Models in Production course today. I guess you’re right, with custom components through Python functions it should work!!!