As sir mentions, we need to apply normalisation here. Which normalisation is he referring to: Z-score normalisation, mean normalisation, or simply subtracting the mean from each data point? Please explain which normalisation would be used here and why.

You can use any normalization that gives a zero mean value.

Prof. Andrew mentions normalizing the data to have “zero mean”. This indicates that the normalization referred to here is subtracting each feature's mean from that feature (referred to here as **mean normalization**, and often called mean centering). The idea is to center the data around zero so that each feature has a mean of zero before applying PCA. PCA works from the covariance matrix of the data, and the usual shortcut of computing it as (1/m) XᵀX only equals the covariance when every feature already has zero mean. If the data is not centered, the first principal component tends to point toward the mean of the data rather than along the direction of greatest variance, defeating the purpose of PCA.
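To make the effect of centering concrete, here is a minimal NumPy sketch. The data is synthetic and the helper name `top_component` is made up for illustration; it is not code from the course. The points spread mostly along the first axis but sit far from the origin along the second axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data: large spread along axis 0, small spread along axis 1,
# and a large offset (non-zero mean) along axis 1.
m = 500
X = rng.normal(size=(m, 2)) * np.array([3.0, 0.5]) + np.array([0.0, 50.0])

def top_component(data):
    """Leading eigenvector of (1/m) * data.T @ data (illustrative helper)."""
    sigma = data.T @ data / data.shape[0]
    _, eigvecs = np.linalg.eigh(sigma)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]

# Without centering: the leading direction follows the offset (axis 1).
pc_raw = top_component(X)

# With centering: the leading direction follows the real spread (axis 0).
X_centered = X - X.mean(axis=0)
pc_centered = top_component(X_centered)

print("uncentered:", np.round(np.abs(pc_raw), 3))
print("centered:  ", np.round(np.abs(pc_centered), 3))
```

In this sketch the uncentered result lines up almost entirely with the axis that carries the offset, while the centered result lines up with the axis that carries the variance.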

In addition, he talks about scaling the data when the features are on very different scales (e.g., one feature represents house size in square feet, while another represents the number of bedrooms). In such cases, **feature scaling** can be applied, which could refer to either z-score normalization (standardizing each feature to have a mean of zero and a standard deviation of one) or min-max scaling (scaling the features to a specific range, such as [0, 1]). Note that min-max scaling does not give a zero mean by itself, so the mean still has to be subtracted afterwards. Without scaling, the feature with the largest numeric range has the largest variance and dominates the principal components. After z-score normalization every feature has unit variance, so no feature dominates simply because of its units.
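The house-size example can be sketched the same way. The numbers below are invented for illustration (they are not from the lecture), and `top_component` is again a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical housing data: size in square feet and a correlated bedroom count.
m = 300
size_sqft = rng.normal(2000.0, 500.0, size=m)
bedrooms = np.clip(np.round(size_sqft / 600.0 + rng.normal(0.0, 0.7, size=m)), 1, None)
X = np.column_stack([size_sqft, bedrooms])

def top_component(data):
    """Leading eigenvector of (1/m) * data.T @ data (illustrative helper)."""
    sigma = data.T @ data / data.shape[0]
    _, eigvecs = np.linalg.eigh(sigma)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]

# Mean subtraction only: the square-feet column has a far larger variance.
X_centered = X - X.mean(axis=0)
pc_centered = top_component(X_centered)

# Z-score normalization: zero mean and unit standard deviation per feature.
X_zscore = X_centered / X.std(axis=0)
pc_zscore = top_component(X_zscore)

print("centered only:", np.round(np.abs(pc_centered), 3))
print("z-scored:     ", np.round(np.abs(pc_zscore), 3))
```

With mean subtraction alone, the first component is essentially the square-feet axis. After z-scoring, both features carry equal weight in the first component.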

Ultimately, **mean normalization** (subtracting the mean) is the minimum that PCA needs, while **z-score normalization** is usually preferred when the features are on very different scales.