Can I use a Vision Transformer (ViT) instead of a CNN as the encoder in a VAE, since ViT extracts more information than a CNN?
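Yes, you can. You only need to make sure the images are converted into tensors. A ViT then splits each image tensor into fixed-size patches and treats each flattened patch as a token.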
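As a minimal sketch of that conversion, assuming PyTorch and NumPy are installed and the image arrives as an H x W x C `uint8` NumPy array: the function name `image_to_patches` and the default 16-pixel patch size are illustrative choices, not something from the linked material.

```python
import numpy as np
import torch


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> torch.Tensor:
    """Convert an H x W x C uint8 image into a (num_patches, C*p*p) float tensor."""
    # HWC uint8 -> CHW float in [0, 1], the layout PyTorch vision models expect
    x = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
    c, h, w = x.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    # Cut into non-overlapping p x p patches: shape (C, H/p, W/p, p, p)
    patches = x.unfold(1, p, p).unfold(2, p, p)
    # Reorder to (H/p, W/p, C, p, p) and flatten each patch into one token vector
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * p * p)
```

For example, a 224 x 224 RGB image with 16 x 16 patches gives 196 tokens of length 768. Real ViT implementations usually fold this step into a strided convolution, but the explicit version shows what the transformer actually receives.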
Here is a link related to that; you have probably already seen it.
As far as I understand, a ViT-based VAE is simply a VAE model that uses a transformer encoder in place of the CNN encoder.
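A rough sketch of such a model in PyTorch follows, assuming 32 x 32 RGB inputs scaled to [0, 1]. The class names `ViTEncoder` and `ViTVAE`, the hyperparameters, and the small transposed-convolution decoder are hypothetical choices for illustration; this is not the implementation from the linked post. Only the encoder is swapped for a transformer; the reparameterization trick and the KL term are the standard VAE ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViTEncoder(nn.Module):
    """Minimal ViT-style encoder that outputs VAE posterior parameters (mu, logvar)."""

    def __init__(self, image_size=32, patch_size=4, in_channels=3,
                 dim=128, depth=4, heads=4, latent_dim=64):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding: a strided conv equals "flatten each patch + linear layer"
        self.patch_embed = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)
        self.to_mu = nn.Linear(dim, latent_dim)
        self.to_logvar = nn.Linear(dim, latent_dim)

    def forward(self, x):
        # x: (B, C, H, W) -> patch tokens (B, N, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        # Use the CLS token output as a summary of the whole image
        h = self.norm(self.transformer(tokens)[:, 0])
        return self.to_mu(h), self.to_logvar(h)


class ViTVAE(nn.Module):
    """VAE with a ViT encoder and a small CNN decoder, fixed to 32 x 32 images."""

    def __init__(self, in_channels=3, latent_dim=64):
        super().__init__()
        self.encoder = ViTEncoder(image_size=32, in_channels=in_channels, latent_dim=latent_dim)
        self.fc = nn.Linear(latent_dim, 128 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(32, in_channels, 4, stride=2, padding=1),     # 16 -> 32
            nn.Sigmoid(),  # outputs in (0, 1) to match inputs scaled to [0, 1]
        )

    def forward(self, x):
        mu, logvar = self.encoder(x)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = self.decoder(self.fc(z).view(-1, 128, 4, 4))
        return recon, mu, logvar


def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL(q(z|x) || N(0, I)), averaged over the batch
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (rec + kl) / x.shape[0]
```

Usage would look like `recon, mu, logvar = model(x)` followed by `vae_loss(recon, x, mu, logvar).backward()` inside an ordinary training loop. Whether the transformer encoder actually beats a CNN encoder here depends on the dataset and training budget; ViTs typically need more data or stronger augmentation than CNNs.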