How to visualize a multi-feature regression with vectors?
During week 1, we plotted x and y to see the distribution of the training data and then overlaid the line defined by w and b to understand the logic behind the model’s prediction.
How do we visualize this in a multi-feature regression?
Welcome to our community! Would you please share where we can find that plot (e.g. which timestamp in which video, or which section in which lab)? I want to take a look first to make sure we are on the same page.
Pretty much, you can’t. Since there can easily be more features than the human visual system can handle, plotting the data set isn’t usually part of the solution.
Instead, we can plot the history of the cost vs. the number of iterations, to get a sense of whether the solution is converging.
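For example, here is a minimal sketch of that idea for multi-feature linear regression trained with batch gradient descent, in the spirit of the course labs. The synthetic data, the learning rate alpha=0.1 and the function names compute_cost and gradient_descent are assumptions for illustration, not the lab’s exact code.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt

def compute_cost(X, y, w, b):
    """Squared-error cost J(w, b) = (1 / 2m) * sum((X @ w + b - y) ** 2)."""
    m = X.shape[0]
    err = X @ w + b - y
    return (err @ err) / (2 * m)

def gradient_descent(X, y, alpha=0.1, num_iters=200):
    """Batch gradient descent for multi-feature linear regression.
    Returns the learned parameters and the cost recorded after every update."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    J_history = []
    for _ in range(num_iters):
        err = X @ w + b - y          # prediction error, shape (m,)
        dj_dw = X.T @ err / m        # gradient w.r.t. w, shape (n,)
        dj_db = err.mean()           # gradient w.r.t. b
        w -= alpha * dj_dw
        b -= alpha * dj_db
        J_history.append(compute_cost(X, y, w, b))
    return w, b, J_history

# Hypothetical synthetic data: 100 examples, 4 features on similar scales
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 4.0 + rng.normal(scale=0.1, size=100)

w, b, J_history = gradient_descent(X, y)

# The plot works the same no matter how many features X has
fig, ax = plt.subplots()
ax.plot(J_history)
ax.set_xlabel("iteration")
ax.set_ylabel("cost J(w, b)")
ax.set_title("Cost vs. iteration")
fig.savefig("cost_history.png")
```

If the curve flattens out, gradient descent has approximately converged; if it rises or oscillates, the learning rate is probably too large.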
Thanks for the answer, Raymond. I would like to understand how to visualize something similar to the chart below for a multi-feature regression.
The x-axis is the feature, the y-axis is the target, the red crosses are the training data, and the blue line is the result of “wx + b”. This helped me understand how the model arrives at its prediction.
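For reference, a chart like that can be sketched with Matplotlib as below. The data points and the values of w and b are hypothetical; in the course they come from the lab data and from gradient descent.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt

# Hypothetical training data: house size (1000 sqft) vs. price (1000s of dollars)
x_train = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y_train = np.array([300.0, 360.0, 500.0, 540.0, 650.0])

# Hypothetical parameters (normally learned by gradient descent)
w, b = 180.0, 110.0
f_wb = w * x_train + b   # model prediction for every training example

fig, ax = plt.subplots()
ax.scatter(x_train, y_train, marker="x", c="r", label="training data")
ax.plot(x_train, f_wb, c="b", label="f_wb = w*x + b")
ax.set_xlabel("size (1000 sqft)")
ax.set_ylabel("price (1000s of dollars)")
ax.legend()
fig.savefig("one_feature.png")
```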
The diagram you posted shows the relationship between the price of a house and its size in a 2-D representation (X, Y). When we add another feature, say the number of rooms, the diagram becomes a 3-D representation (X, Y, Z), and the model becomes a plane. Human eyes are comfortable with 3-D, but how could your brain interpret higher dimensions? There are techniques to visualize high-dimensional data sets, such as t-SNE and PCA, and here is a link about such techniques.
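A minimal sketch of the PCA idea using only NumPy: project all features onto the first two principal components and color each point by its target value. The data set, the helper name pca_project and the choice of two components are assumptions for illustration. Keep in mind that two components only show the share of variance printed on the axes.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt

def pca_project(X, k=2):
    """Project X (m x n) onto its first k principal components via SVD.
    Returns the projected scores and the fraction of variance of each component."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    return Xc @ Vt[:k].T, explained[:k]

# Hypothetical data set: 200 examples with 6 features and a continuous target
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.2, size=200)

Z, explained = pca_project(X, k=2)

fig, ax = plt.subplots()
sc = ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap="viridis")
fig.colorbar(sc, ax=ax, label="target y")
ax.set_xlabel(f"PC1 ({explained[0]:.0%} of variance)")
ax.set_ylabel(f"PC2 ({explained[1]:.0%} of variance)")
fig.savefig("pca_2d.png")
```

If scikit-learn is installed, sklearn.decomposition.PCA and sklearn.manifold.TSNE provide ready-made versions; t-SNE is mainly for exploring cluster structure, and its axes have no direct meaning.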
If there are two features and a ‘y’ value, you need a 3D plot. That’s feasible.
If there are more than two features, you need more than 3 dimensions. That’s not feasible.
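A minimal sketch of the two-feature case: the red crosses become points in 3D and the line wx + b becomes the plane w1*x1 + w2*x2 + b. The data are made up, and the parameters are fit with NumPy’s least-squares solver instead of gradient descent, just to keep the example short.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt

# Hypothetical data: two features (size, number of bedrooms) and a target (price)
rng = np.random.default_rng(2)
size = rng.uniform(1.0, 3.0, 50)                 # 1000 sqft
rooms = rng.integers(1, 6, 50).astype(float)     # 1 to 5 bedrooms
price = 150 * size + 20 * rooms + 50 + rng.normal(scale=10, size=50)

# Fit f_wb(x) = w1*size + w2*rooms + b by linear least squares
A = np.column_stack([size, rooms, np.ones_like(size)])
(w1, w2, b), *_ = np.linalg.lstsq(A, price, rcond=None)

# Evaluate the fitted plane on a grid covering the data
s_grid, r_grid = np.meshgrid(np.linspace(1, 3, 20), np.linspace(1, 5, 20))
plane = w1 * s_grid + w2 * r_grid + b

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(size, rooms, price, c="r", marker="x")        # training data
ax.plot_surface(s_grid, r_grid, plane, alpha=0.3, color="b")  # model
ax.set_xlabel("size (1000 sqft)")
ax.set_ylabel("bedrooms")
ax.set_zlabel("price (1000s of dollars)")
fig.savefig("two_features_plane.png")
```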
I think you agree that we can’t reproduce the way you plotted the housing prices for more than 2 features and 1 target. If you are ready to get creative, I recommend googling how others do it and checking whether it fits your actual use case (there is no general approach that works for every problem).
For example, I started by searching “visualize high dimensional data” on Google Images and came across an example that uses “parallel coordinates”. It can host as many features as you want, and there are ready-made packages for the job, so you don’t need to write much code. However, the challenge is to train yourself and your audience to get used to the visualization, and to prepare a story that makes the most of it.
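As a rough sketch, pandas ships a parallel_coordinates helper in pandas.plotting. It colors lines by a class column, so for a regression target this example bins y into three quantile groups with pd.qcut; that binning, the standardization step and the data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical data: 5 features and a continuous target
rng = np.random.default_rng(3)
features = [f"x{i}" for i in range(1, 6)]
df = pd.DataFrame(rng.normal(size=(60, 5)), columns=features)
df["y"] = df[features].to_numpy() @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# parallel_coordinates colors lines by a class column, so bin the
# continuous target into three quantile groups (illustrative choice)
df["y_bin"] = pd.qcut(df["y"], q=3, labels=["low", "mid", "high"])

# Standardize every axis so features with different units are comparable
cols = features + ["y"]
plot_df = (df[cols] - df[cols].mean()) / df[cols].std()
plot_df["y_bin"] = df["y_bin"]

fig, ax = plt.subplots(figsize=(8, 4))
parallel_coordinates(plot_df, class_column="y_bin", cols=cols, ax=ax,
                     colormap="viridis", alpha=0.5)
fig.savefig("parallel_coordinates.png")
```

Each line is one training example; lines that cross the x2 axis low and end in the “high” group, for instance, hint at the negative weight on x2.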
Feel free to share what you find with us.
One other point: you can of course also plot the results of a regression model “in 3D” as a function of the first two Partial Least Squares components or Principal Components, which carry most of the information, as you can see step by step here:
In this case you can add at least one more dimension compared to your plot. This can also be helpful in addition to the previously mentioned scatter_matrix approach, @pteixeira.
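Here is a rough sketch of that idea using principal component regression (PCR) as the regression model: fit y on the first two principal components, then draw the fitted plane over (PC1, PC2). Partial least squares would work similarly but is not shown. The data are synthetic and were generated from two hidden factors, an assumption that makes two components sufficient here.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt

# Hypothetical data: 150 examples, 5 correlated features driven by 2 latent factors
rng = np.random.default_rng(4)
latent = rng.normal(size=(150, 2))
X = latent @ rng.normal(size=(2, 5)) + rng.normal(scale=0.1, size=(150, 5))
y = latent @ np.array([3.0, -1.5]) + rng.normal(scale=0.2, size=150)

# Principal components of the centered features (SVD-based PCA)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                        # scores on PC1 and PC2

# Principal component regression: fit y = w1*PC1 + w2*PC2 + b
A = np.column_stack([Z, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# The fitted model is a plane over (PC1, PC2)
g1, g2 = np.meshgrid(np.linspace(Z[:, 0].min(), Z[:, 0].max(), 20),
                     np.linspace(Z[:, 1].min(), Z[:, 1].max(), 20))
plane = coef[0] * g1 + coef[1] * g2 + coef[2]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(Z[:, 0], Z[:, 1], y, c="r", marker="x")       # data
ax.plot_surface(g1, g2, plane, alpha=0.3, color="b")     # fitted model
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("y")
fig.savefig("pcr_3d.png")
```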
In multi-feature regression, we can use a scatter plot matrix to visualize the relationships between multiple input features and the target variable. A scatter plot matrix is a grid of scatter plots in which each panel displays the relationship between two variables; the target can be included as its own row and column, or represented by the color or size of the data points.
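A minimal sketch with pandas.plotting.scatter_matrix on made-up data. Here the target y gets its own row and column and also colors every point; extra keyword arguments such as c and cmap are forwarded to Matplotlib’s scatter, as I understand pandas’ implementation.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook drop this and call plt.show()
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Hypothetical data: three features and a target
rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
df["y"] = 2 * df["x1"] - df["x2"] + rng.normal(scale=0.3, size=100)

# One panel per pair of columns, histograms on the diagonal,
# points colored by the target value
axes = scatter_matrix(df, figsize=(8, 8), diagonal="hist",
                      c=df["y"].to_numpy(), cmap="viridis", alpha=0.7)
axes[0, 0].figure.savefig("scatter_matrix.png")
```

The bottom row (y against each feature) is the closest analogue to the one-feature chart, but each panel ignores the other features, so it shows marginal relationships rather than the fitted multi-feature model.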
We can also use a 3D scatter plot to visualize the relationship between two input features and the target variable. In a 3D scatter plot, the two input features are represented by the x- and y-axes, and the target variable by the z-axis. The data points are plotted as a set of points in 3D space.
Another way to visualize multi-feature regression is to use a vector plot. In a vector plot, we can represent each data point as a vector with the length of the vector representing the value of the target variable and the direction of the vector representing the values of the input features. This can be helpful in understanding the direction and magnitude of the relationship between the input features and the target variable.
Matplotlib: a widely used plotting library for Python that can create scatter plots, 3D plots, and vector plots.
Seaborn: a Python visualization library that builds on top of Matplotlib and provides additional functionality for creating scatter plot matrices and other types of plots.
Plotly: a web-based data visualization tool that can create interactive 3D plots and scatter plot matrices.
Tableau: data visualization software that can create various types of plots, including scatter plot matrices and 3D plots.
But thanks for clarifying that you apparently mean a three-dimensional scatter plot. In my opinion, this does not really match your previous explanation here:
Please consider modifying or clarifying the quoted section so that fellow learners in this forum are not confused by misleading information.
Thanks in advance, @Sidahmed_Belkadi!!