Exponentially Weighted Averages - Graphing

So, this question might be a slight aside (there seems to be nothing directly related in the lab), but I was still curious.

I understand the graphs in the lecture were used mostly to explain the concept visually. Still, to make sure I understand the topic, I was curious how they might be made programmatically.

I mean, for one, theta in the temperature-in-London example seems to be a scalar:

So we have a relationship to Y in the graph. However, although a relationship to the particular time step is implied, X never appears directly in the equation (unless somehow the successive values of ‘v’ are our X? Though that would seem backward!?).

Further, if I am understanding this correctly, by successive applications of the formula:


If I am getting this right… what we are really forming is one really, really big polynomial, though it could be a bit difficult programmatically to parse the whole resulting equation for every step of the graph.
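For reference, here is a sketch of what that "big polynomial" looks like when the recurrence is unrolled. It assumes the update rule used later in this thread, v_t = \beta v_{t-1} + (1 - \beta) \theta_t, with the starting value v_0 = 0; the summation index k is introduced here only for illustration.

```latex
\begin{aligned}
v_0 &= 0 \\
v_t &= \beta\, v_{t-1} + (1-\beta)\,\theta_t \\
    &= (1-\beta)\,\theta_t + \beta(1-\beta)\,\theta_{t-1} + \beta^{2}(1-\beta)\,\theta_{t-2} + \dots + \beta^{t-1}(1-\beta)\,\theta_1 \\
    &= (1-\beta)\sum_{k=0}^{t-1} \beta^{k}\,\theta_{t-k}
\end{aligned}
```

Read this way, the expression is a polynomial in \beta whose coefficients are the past readings, and the time step t (the X of the graph) shows up only as the index that says how many terms there are.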

I know someone might just say ‘well, there are libraries that will do that for you’. But if I wanted to do it myself, I am just trying to think through how that might be done with this set of equations?


Well, you can work it out with a calculator or you can … wait for it … write a Python program to compute that in a loop. Yes, it ends up being a high degree polynomial, but the point is you’re computing it one iteration at a time according to the algorithm they give you. You have a series of inputs and you compute the EWA at each step in the process.
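A minimal sketch of that loop, under the assumption that the recurrence is v_t = beta * v_{t-1} + (1 - beta) * theta_t starting from v_0 = 0. The function names and the temperature readings are made up for illustration. The second function writes out the same value as the explicit weighted sum, just to show the loop and the "high degree polynomial" agree.

```python
def ewa(thetas, beta):
    """Compute v_1..v_n one step at a time, starting from v_0 = 0."""
    v = 0.0
    history = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        history.append(v)
    return history


def ewa_explicit_sum(thetas, beta, t):
    """The same v_t written out as one weighted sum (t is 1-based)."""
    return (1 - beta) * sum(beta ** k * thetas[t - 1 - k] for k in range(t))


temps = [40.0, 49.0, 45.0, 44.0, 50.0]  # made-up readings, not real data
beta = 0.9
values = ewa(temps, beta)

# Print the step number (the X of the graph), the loop result, and the sum.
for t, v in enumerate(values, start=1):
    print(t, round(v, 4), round(ewa_explicit_sum(temps, beta, t), 4))
```

The loop never needs the expanded formula: each step only uses the previous value and the current reading.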


And of course the real high level point here is that \beta is a value between 0 and 1 and we know what happens when we compute \beta^n for increasing values of n. :grinning: Set \beta = 0.7 and give it a try: play it out on your calculator and watch what happens. And if memory serves, Prof Ng did an example like that in the lectures, didn’t he?
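The calculator experiment can also be sketched in a few lines of Python. This only prints powers of \beta = 0.7; the range of exponents is an arbitrary choice for illustration.

```python
beta = 0.7

# beta ** n is the relative weight given to a reading that is n steps old.
powers = [beta ** n for n in range(11)]

for n, p in enumerate(powers):
    print(n, round(p, 5))
```

Each power is smaller than the one before it, so older readings contribute less and less to the current average.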


@paulinpaloalto, thanks.

And yes, he does.

In part I guess I am just not accustomed to graphing functions where your X and Y are not explicitly stated in the function. I mean, your ‘Y’ here is, and your X is easily inferred. But I wasn’t sure if I was just ‘missing’ the X in the equation.

So with a Python program performing this you’d have to force it out ‘implicitly’ I guess-- so \beta^0 = T_1 = X_1, \beta^1 = T_2 = X_2, etc.

Since this is somewhat of a ‘feed forward’ process, I am also presuming you don’t necessarily have to parse the entire dataset to start graphing: the future state depends on the past, but not the other way around.
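One way to sketch that idea is a generator that yields each point as soon as its reading arrives. The name `ewa_stream` and the sample values are hypothetical. The X coordinate here is simply the step counter t, and the recurrence is assumed to start from v_0 = 0.

```python
def ewa_stream(readings, beta):
    """Yield (t, v_t) pairs one at a time; only the previous v is kept in memory."""
    v = 0.0
    for t, theta in enumerate(readings, start=1):
        v = beta * v + (1 - beta) * theta
        yield t, v


# iter(...) stands in for a data source that is consumed one reading at a time.
points = list(ewa_stream(iter([10.0, 20.0, 30.0]), beta=0.5))
xs = [t for t, _ in points]  # time steps: the X axis
ys = [v for _, v in points]  # averaged values: the Y axis
print(xs, ys)
```

Because each `v_t` depends only on the previous value and the current reading, a point can be plotted before the rest of the data has been read.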


@paulinpaloalto so I decided to take your suggestion and, when I had a free moment, whipped something up in Python. (I need to learn/practice Pandas anyways. I’ve done a lot of work with Dataframes in R; Pandas is new to me but similar, just different syntax.)

My approach:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Specify year of weather to examine
# Note: Important !
#       The Exponentially Weighted Average is computed sequentially, so one
#       missing data point turns every later value into NaN. Some years in the
#       dataset are missing data points, which will cause big problems !
#       How to deal with such cases is an interesting thought for perhaps
#       a later improvement on the algorithm application.

# Set global options

year = 2018 # The database contains years 1979 - 2020 (2020 is missing data)
beta = 0.98

# Weather measurements recorded by a weather station near Heathrow airport in London, UK
# Source: https://www.kaggle.com/datasets/emmanuelfwerr/london-weather-data

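# NOTE: the Google Drive share link was not included in the original post;
#       paste it here before running.
share_url = '...'
url = 'https://drive.google.com/uc?id=' + share_url.split('/')[-2]
london_weather = pd.read_csv(url)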

# Add column converting daily mean temp to F
london_weather['Daily_Temp_F'] = london_weather['mean_temp'] * (9/5) + 32

# Convert 'date' column to Pandas Datetime format
london_weather['date'] = pd.to_datetime(london_weather['date'], format = "%Y%m%d")

# Subset out desired year to examine (copy so the assignments below do not
# write into a view of the full DataFrame)
london_weather = london_weather[london_weather['date'].dt.year == year].copy()

# Create float column to store Vt values
london_weather['Vt'] = 0.0

# Look up the column positions once instead of on every iteration
temp_col = london_weather.columns.get_loc('Daily_Temp_F')
vt_col = london_weather.columns.get_loc('Vt')

# Set V to zero for first round
Vt_minus_one = 0

for i in range(len(london_weather)):
    # Get theta (aka daily temp)
    theta = london_weather.iloc[i, temp_col]

    # Calc Vt
    Vt = beta * Vt_minus_one + (1 - beta) * theta

    # Store Vt rounded to 3 decimals (casting to int would drop the fraction)
    london_weather.iloc[i, vt_col] = np.round(Vt, 3)

    # Set Vt_minus_one current value of Vt for next run
    Vt_minus_one = Vt

# Chart stuff
fig, ax = plt.subplots()
ax.set_title("Daily Temps in London: " + str(year) + " - β: " + str(beta))
fig.suptitle("DeepLearning.AI: Exponentially Weighted Average Example")
london_weather.plot.scatter(x = "date", y = "Daily_Temp_F", ax = ax)
ax.set_ylabel("Daily Temp °F")
london_weather.plot(x = "date", y = "Vt", color = "red", ax = ax)

I understand now the answer to my own question. Watching Prof. Ng’s example in the video the curve seemed so smooth I thought some sort of interpolation or something was going on-- But, no, it is just a side effect of having a very high β on a small enough graph with enough data points to give the illusion of such.

Note: I did not bother to implement bias correction-- Not trying to reinvent the wheel here, there are libraries for that. Just wanted to understand.
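For anyone curious what the skipped step would look like, here is a minimal sketch of bias correction, assuming the form from the lectures where each value is divided by (1 - \beta^t). The function name and the constant test input are made up for illustration.

```python
def ewa_bias_corrected(thetas, beta):
    """Return (raw, corrected) lists, where corrected[t-1] = v_t / (1 - beta**t)."""
    v = 0.0
    raw, corrected = [], []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        raw.append(v)
        corrected.append(v / (1 - beta ** t))
    return raw, corrected


# A constant series makes the start-up bias easy to see.
raw, corrected = ewa_bias_corrected([50.0] * 5, beta=0.98)
for t, (r, c) in enumerate(zip(raw, corrected), start=1):
    print(t, round(r, 3), round(c, 3))
```

Without the correction, the first values sit far below the data because the average starts from zero; with it, a constant input gives a constant output from the first step.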

One interesting thing that did come up: at first I thought I had an error in my formula, because all of a sudden (say on year = 2020) I was getting all these NaN’s partway through. Turns out there are some points in this dataset that have missing values.

Of course for this type of calculation that throws a wrench into the machine. I keep on thinking about creative ways to handle that, but it is worth knowing.


Very nice! The higher resolution graphs really give us a nice picture. Thanks for actually implementing this and sharing the results.

In terms of how to handle incomplete data samples, there is no real magic here. I don’t claim to have thought deeply about this, but my memory of what I think I heard Prof Ng say at some point is that you basically have two choices:

  1. Just drop (ignore) the incomplete samples.
  2. Fill in the missing values with values either from adjacent samples or the average over several adjacent samples.
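Both options can be sketched in plain Python. The helper names and the sample readings below are hypothetical, and missing values are assumed to be represented as NaN. Once a NaN enters the running average every later value is NaN too, which is why the gaps have to be handled before the loop.

```python
import math


def drop_missing(readings):
    """Option 1: ignore the incomplete samples entirely."""
    return [x for x in readings if not math.isnan(x)]


def nearest_valid(readings, start, step):
    """Walk from index `start` in direction `step`; return the first non-NaN value, or None."""
    i = start
    while 0 <= i < len(readings):
        if not math.isnan(readings[i]):
            return readings[i]
        i += step
    return None


def fill_from_neighbors(readings):
    """Option 2: replace each NaN with the mean of the nearest valid value on each side."""
    filled = list(readings)
    for i, x in enumerate(readings):
        if math.isnan(x):
            candidates = (nearest_valid(readings, i - 1, -1),
                          nearest_valid(readings, i + 1, 1))
            neighbors = [n for n in candidates if n is not None]
            if neighbors:  # leave the NaN in place if nothing valid exists
                filled[i] = sum(neighbors) / len(neighbors)
    return filled


temps = [50.0, math.nan, 54.0, math.nan, math.nan, 60.0]  # made-up readings with gaps
print(drop_missing(temps))
print(fill_from_neighbors(temps))
```

In pandas, `dropna`, `ffill`, and `interpolate` cover the same two strategies, so a hand-written version like this is mainly useful for seeing what they do.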

One minor implementation suggestion: you can compute Vt_minus_one much more simply and efficiently as:

Vt_minus_one = Vt

That way you don’t have to evaluate the big indexed expression twice. Of course it is probably sitting in a register already and we hope the JIT compiler is clever enough to figure that out, so it may not actually make a difference in terms of execution speed. But it would definitely make the code simpler and clearer to the reader’s eye. :nerd_face:

I guess for that matter, you don’t really need a separate variable for that value: It’s just Vt. As in:

    # Calc Vt
    Vt = beta * Vt + (1 - beta) * theta

Also, sorry I can’t help myself, but it’s “Exponentially” not “Expontentially”. Once you have the programmer’s eye for detail, it’s hard to disable it. :laughing:


Oops! Well, at least that has now been corrected/updated. [At least one knows this program was written by a human. If anything, ChatGPT/Gemini/etc. do not make those kinds of mistakes.]

I was an ESL teacher for a good number of years so I usually tend to be pretty good at this and understand. Unfortunately, math is not my first language.

Not to say ‘all math is easy’ – it is not; But in the applied case, if you understand what it is you are looking at and how to perform the operation, you are not trying to invent anything novel. This is where I particularly like Prof. Ng’s approach of explaining things intuitively.

I feel, also, when reading papers and the like just translating the ‘hieroglyphics’ is where I most struggle. To that end I placed an order yesterday for this text to see if it helps:


It is short, cheap, and has really positive reviews on GoodReads.

I understand/follow and agree on your points regarding program structure. I actually first considered that this whole algorithm could just be written recursively, but I wasn’t seeking style points.
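For comparison, a recursive version might look like the sketch below. The function name is hypothetical and it assumes the same recurrence with v_0 = 0. It is a direct transcription of the formula, not a recommendation.

```python
def ewa_recursive(thetas, beta, t=None):
    """Return v_t, defined directly from v_{t-1}; t defaults to the last step."""
    if t is None:
        t = len(thetas)
    if t == 0:
        return 0.0  # base case: v_0 = 0
    return beta * ewa_recursive(thetas, beta, t - 1) + (1 - beta) * thetas[t - 1]


temps = [40.0, 49.0, 45.0]  # made-up readings
print(ewa_recursive(temps, beta=0.9))
```

Each call recurses once per time step, so a year of daily readings fits within Python's default recursion limit, but much longer series would not. Each call also returns only a single v_t, which makes the loop the more natural fit for plotting every step.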

This was kind of ‘quick and dirty’ as I am trying to spend the majority of my free time with this on actually finishing the specialization rather than on extracurriculars.

Further, I decided to structure it this way because this is more or less exactly the way Prof. Ng goes through the algorithm by hand in the lecture. For anyone else who looks at it, I wanted it to be clear in that sense, rather than making them work out how the program is actually doing what it is doing.


P.s. I did make that change as well, you were correct:

    # Set Vt_minus_one current value of Vt for next run
    Vt_minus_one = Vt

I don’t remember what I was thinking at the time why I’d actually have to pull the value from the Dataframe…