Handling small float numbers with TFX Feature engineering pipeline

Hello,

I’m trying to replicate the C2_W2_Lab_2_Feature_Engineering_Pipeline notebook with my own data, which is time-series data.

My dataset contains log returns, which are often small float numbers. When I generate statistics from it, the display rounds everything, so I see a mean of 0, a standard deviation of 0, and so on. How can I make it show more decimals? (Or is this something to configure when setting up example_gen?)

Here is an example of the log_returns sample_records after generating the examples
'log_returns': {'floatList': {'value': [0.0048092753]}}}}}

Here is the image when showing the statistics

The numbers seem correct: when I explore the values they aren’t rounded, and on the right I can see the correct distribution. So I imagine this has something to do with a maximum-decimals setting somewhere, but I can’t find it.

I’m not a mentor for this course, so I can’t see the course materials and don’t know what the lab contains.

I believe the issue is that the code (or function) you’re using to display those values all on one line is automatically reducing the number of digits displayed, so that they will all fit on one line.

Those values of 0 aren’t actually zero, they’re just shown that way for convenience on-screen.
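To illustrate the point, here is a minimal sketch in plain Python (not TFX code). The sample values and the two-decimal rule are assumptions made for the illustration; the actual viewer may shorten numbers differently. It shows how a mean and standard deviation that are clearly non-zero can both print as 0.00 once the display keeps only a couple of decimals, while scientific notation keeps the significant digits.

```python
import statistics

# Hypothetical log returns, similar in scale to the value quoted in the question.
log_returns = [0.0048092753, -0.0031, 0.0012, -0.0007, 0.0026]

mean = statistics.mean(log_returns)
stdev = statistics.stdev(log_returns)


def short(x, decimals=2):
    """Format with a fixed number of decimals (assumed display rule)."""
    return f"{x:.{decimals}f}"


def full(x):
    """Format in scientific notation so small values keep their digits."""
    return f"{x:.6e}"


print("mean  shortened:", short(mean), "| full:", full(mean))
print("stdev shortened:", short(stdev), "| full:", full(stdev))
```

The stored numbers are unchanged in both cases; only the string shown on screen differs.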

So what I do to display those statistics is instantiate a StatisticsGen component and run it in the InteractiveContext.

The ExampleGen output displays correct and full values, so I’m sure this is just a display issue (which is what you hinted at).

I don’t have that issue with other statistics being displayed tho, and the display doesn’t take the full line.

Here is the code to display those stats

statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"]
)
context.run(statistics_gen)
context.show(statistics_gen.outputs["statistics"])

Here is the complete view of the stats being displayed.

It’s doing its best to format all the data you asked for into the space available.
