According to this video, if we know the distance d in the Z space, we can alter any feature.
If that's true, why don't we have a table of such values of d for the respective features, so we can change particular features of our image, i.e. change the tail of a dog, change the nose, etc.? A table of values of d for particular features?
Is it true?
Please avoid hyperlinks.
Thanks.
Hi, @starboy,
Of course, we have something like a table of directions for different features.
However, a direction in the Z space is not good enough for advanced generative networks such as StyleGAN 2, since the Z space is usually entangled; the W space or S space is preferred because of its disentangled characteristics. (Knowledge about the Z space and W space is covered in the lectures.)
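To make the idea of such a "table" concrete, here is a toy sketch. The feature names and direction vectors below are random placeholders, not real learned directions, and the edit rule is simply z' = z + α·d; the same lookup-and-shift applies whether the directions live in the Z or W space.

```python
import random

random.seed(1)
DIM = 512  # StyleGAN's latent dimension

# Hypothetical lookup table: feature name -> direction in latent space.
# In practice these vectors would be learned (e.g. by InterfaceGAN);
# here they are random stand-ins for illustration only.
direction_table = {
    name: [random.gauss(0, 1) for _ in range(DIM)]
    for name in ("smile", "glasses", "age")
}
# Normalize each direction to unit length.
for name, d in direction_table.items():
    norm = sum(x * x for x in d) ** 0.5
    direction_table[name] = [x / norm for x in d]

def edit(latent, feature, alpha):
    """Move a latent code along the stored direction: z' = z + alpha * d."""
    d = direction_table[feature]
    return [zi + alpha * di for zi, di in zip(latent, d)]

z = [random.gauss(0, 1) for _ in range(DIM)]
z_smile = edit(z, "smile", alpha=2.0)  # alpha controls edit strength
```

Feeding `z_smile` (instead of `z`) to the generator would then, ideally, produce the same image with the feature strengthened; how cleanly that works is exactly where entanglement bites.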
Thus, for directions in the W space of StyleGAN 2, you can check InterfaceGAN. (I do not know what you mean by requiring "avoid hyperlinks", but InterfaceGAN is not something that can be fully explained here; you need to read their paper to fully get their ideas.) A brief summary: they use an SVM that distinguishes whether a generated image has a certain feature, such as wearing glasses, or not. The decision boundary of the SVM is presumed to be a hyperplane in the W space which separates all w into two categories (has the feature or not), and its normal vector is taken as the direction of that feature.
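To make the InterfaceGAN idea concrete, here is a toy sketch, not their actual code: it fits a linear separator on synthetic latent codes labeled by a hidden attribute, takes the hyperplane's normal vector as the feature direction, and edits a latent code by moving it along that normal. InterfaceGAN uses a linear SVM; a perceptron is substituted here to keep the example dependency-free, and the data, dimension, and edit strength are all made up.

```python
import random

random.seed(0)
DIM = 8  # toy latent dimension (the real W space of StyleGAN 2 is 512-D)

# Synthetic "latent codes": the attribute (say, "glasses") is encoded
# along a hidden ground-truth axis; everything here is invented for
# illustration.
true_dir = [1.0 if i == 0 else 0.0 for i in range(DIM)]

def sample(label):
    w = [random.gauss(0, 1) for _ in range(DIM)]
    shift = 3.0 if label == 1 else -3.0  # separate the two classes
    return [wi + shift * d for wi, d in zip(w, true_dir)], label

data = [sample(1) for _ in range(200)] + [sample(0) for _ in range(200)]

# Train a linear classifier (perceptron stand-in for the SVM): any
# linear separator gives a hyperplane whose normal can serve as the
# edit direction.
weights = [0.0] * DIM
bias = 0.0
for _ in range(20):
    random.shuffle(data)
    for w, label in data:
        score = sum(wi * pi for wi, pi in zip(w, weights)) + bias
        pred = 1 if score > 0 else 0
        if pred != label:
            sign = 1.0 if label == 1 else -1.0
            weights = [pi + sign * wi for pi, wi in zip(weights, w)]
            bias += sign

# The unit normal of the learned hyperplane is the "feature direction".
norm = sum(p * p for p in weights) ** 0.5
direction = [p / norm for p in weights]

# Editing: w' = w + alpha * n pushes a latent code toward "has feature".
alpha = 3.0  # edit strength
w0, _ = sample(0)  # a latent code without the attribute
w_edited = [wi + alpha * d for wi, d in zip(w0, direction)]

score_before = sum(wi * pi for wi, pi in zip(w0, weights)) + bias
score_after = sum(wi * pi for wi, pi in zip(w_edited, weights)) + bias
print(score_before < score_after)  # → True: the edit moves w toward the "has feature" side
```

In the real setting, `w0` would come from mapping a noise vector (or inverting a real image), and `w_edited` would be fed back through the generator to render the edited image.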
Additionally, academia has investigated this issue, namely altering specific features while preserving identity, thoroughly, and has gone well beyond linear algebra in the W space to better disentangle features and avoid the laborious work involved in InterfaceGAN, producing several excellent works. (F.Y.I., since you do not want hyperlinks, I will only mention their names here: StyleFlow, StyleCLIP.)
Thank you, @kevinjiang. Actually, I usually say "avoid hyperlinks" because they only give brief information, unlike the names you mentioned just now. I can search for the papers related to InterfaceGAN in the near future when I have more time.
I'd like to add that the GAN model is trained in a completely unsupervised manner. Even if we can disentangle the vector space Z, we still have to manually map all the latent features to directions. What's more, every training run will likely map the features differently.