Hello,
I am wondering about the current and future abilities of unpixelisation software.
I’d seen this video, which made it look very easy to unpixelate text in a video. I believe having multiple frames and movement makes it easier: https://www.youtube.com/watch?v=acKYYwcxpGk
What I’m wondering is whether this also applies to human faces in videos. For example, let’s say I intentionally pixelate my face like this and create 4K videos of me talking, moving my head around slightly, etc., looking similar to this.

To what extent could the image be unpixelated and the face of the person revealed? I think lots of movement and a 4K image will help, but how close to the actual person underneath can you get? It seems with text you can get it very exact, but is this harder with a person’s face? I know you can get AI to have a guess at a pixelated face, but it will generally only be that: an estimate based on the available information. For example, I got it to unpixelate the above, and the image it came up with was a fair bit away from the original.
Gemini tells me there is software which can depixelate videos/images, but I tried some of its suggestions and none of them worked very well, possibly because I wasn’t using them correctly.
Let me know what you guys think is possible here, or whether it’s likely we’ll be able to fully depixelate images/videos like the above in the near future with AI advances.
Any help would be much appreciated!
Thanks
Hi,
This is a really good question—and you’re thinking about it the right way.
The key point is this: true “unpixelation” (perfect recovery) is only possible if you know how the image was originally encoded or degraded. Pixelation removes information by averaging or blocking pixels, so once that information is lost, it can’t be exactly recovered unless you can reverse the exact process used.
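To make the information-loss point concrete, here is a minimal Python sketch. It assumes "pixelation" means replacing each tile with its mean value; the function name `pixelate` and the tiny example images are made up for illustration and are not taken from any particular tool.

```python
def pixelate(image, block):
    """Replace each block x block tile with its mean value.

    `image` is a list of rows of grayscale values; its height and width
    are assumed to be exact multiples of `block`.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            mean = sum(tile) / len(tile)
            for y in range(top, top + block):
                for x in range(left, left + block):
                    out[y][x] = mean
    return out


# Two different 2x2 "images" whose pixels happen to share the same average.
image_a = [[0, 100], [100, 200]]
image_b = [[100, 100], [100, 100]]

pixelated_a = pixelate(image_a, 2)
pixelated_b = pixelate(image_b, 2)
print(pixelated_a == pixelated_b)  # both collapse to the same flat tile
```

Since both inputs produce the identical output, a single pixelated frame on its own cannot tell you which of the two was the original, even if you know exactly how `pixelate` works.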
For text in videos, it sometimes looks “easy” because:
- Multiple frames provide slightly different information (sub-pixel shifts, motion)
- Text has strong structure (edges, known shapes, fonts)
That combination makes reconstruction much more reliable.
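A rough way to see why movement helps is a 1-D toy model, sketched below in Python. It is heavily idealised: it assumes the content shifts by exactly one sample per frame, there is no noise or compression, and the first `block` samples are a known background. The names `window_means` and `recover` are hypothetical helpers written for this example only.

```python
def window_means(signal, block):
    """Mean of every length-`block` window; entry i covers signal[i:i+block].

    Entry i is what one pixelation cell reports when the grid happens to
    start at sample i. A single frame only provides every `block`-th entry;
    frames at offsets 0..block-1 together provide all of them.
    """
    return [sum(signal[i:i + block]) / block
            for i in range(len(signal) - block + 1)]


def recover(means, block, known_prefix):
    """Rebuild the signal from window means plus the first `block` samples."""
    signal = list(known_prefix)
    for i in range(len(means) - 1):
        # Consecutive windows differ by one sample entering and one leaving:
        # means[i+1] - means[i] == (signal[i+block] - signal[i]) / block
        signal.append(signal[i] + block * (means[i + 1] - means[i]))
    return signal


block = 4
hidden = [0, 0, 0, 0, 9, 2, 7, 7, 1, 8, 3, 5]  # starts with known background
means = window_means(hidden, block)
rebuilt = recover(means, block, known_prefix=hidden[:block])
print([round(v) for v in rebuilt] == hidden)
```

Real footage breaks most of these assumptions (shifts are not whole pixels, the face deforms between frames, and video compression adds its own errors), so this only illustrates the principle that each new grid position adds measurements.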
For faces, it’s a different story:
- Faces are highly complex and variable
- Pixelation destroys fine identity-specific details
- Even with multiple frames, you usually don’t recover the exact original face
What modern AI (especially Generative Adversarial Networks (GANs) and similar models) can do is:
- Learn patterns from large datasets of faces
- “Fill in” missing details based on probability
- Produce a plausible face that matches the blurred input
But this is reconstruction, not recovery. The output may look realistic, but it’s essentially the model’s best guess—not the true original person.
If you want better results in a specific application, you could:
- Train a GAN or similar model on a very specific dataset (e.g., same camera, same types of faces, same conditions)
- Use multiple frames from video to improve consistency
- Combine super-resolution + temporal models
Even then, you’ll only get probabilistic reconstruction, not exact identity recovery.
So in short:
- Exact depixelation → only possible if you know the encoding method
- AI-based depixelation → possible, but it guesses rather than truly recovers
- Faces → much harder than text, and unlikely to be perfectly reconstructed even with future AI
Thanks very much for the very detailed answer, I really appreciate it!
That all makes sense to me; there’s just one bit I want to ask you about.
When you say:
*true “unpixelation” (perfect recovery) is only possible if you know how the image was originally encoded or degraded. Pixelation removes information by averaging or blocking pixels, so once that information is lost, it can’t be exactly recovered unless you can reverse the exact process used.*
Let’s say I’m using a fairly standard pixelation to mask it. It uses a standard, single-pass mathematical grid to snap coordinates to a specific pixel size. If they know (or guess) I’ve used this standard method, does this mean they can decode it? I’d assume not, and I’d assume you can probably tell roughly what method was used by looking at the pixelated video. Just wanted to clarify this bit.
Secondly, someone had mentioned that “layered pixelation” is harder to reverse. Let’s say that instead of using the standard single-pass method above, I want to use something harder to crack. I modify my code to sample the image multiple times on overlapping, offset grids.
I do this by modifying my pixel shader to average out two layers of pixelation (one standard, one offset by half a pixel).
Is this likely to be harder to unpixelate?
Again, if someone knows I’ve used this method, can they easily reverse it? I’m assuming that even if someone knows how I’ve pixelated it, if the original pixels are now missing from the video, they are stuck trying to fill in the blanks and get close to the original?
Again, thank you very much for your help with this!
In general, whether pixelation can be reversed depends on how it was done. If the method is known, it may be possible to approximate the original image. If it’s a standard method but unknown, one would have to guess—and only a correct guess would work.
However, if the image was altered using more complex techniques, like adding noise with GANs (Generative Adversarial Networks), reversal becomes much harder. The noise is random and not easily inferred, so recovering the original would typically require a model trained on similar data or knowledge of the process.
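Coming back to the two-layer scheme described earlier in the thread (one standard grid averaged with one offset by half a cell), here is a small 1-D Python sketch. It is only my reading of that description, not the actual shader: `block_means` and `layered` are hypothetical names, and the way cells cut off at the edges are handled is an assumption.

```python
def block_means(signal, block, offset=0):
    """Pixelate a 1-D signal on a grid with cell edges at offset, offset+block, ...

    Assumes 0 <= offset < block. Cells cut off by the ends of the signal
    are averaged over the samples they still contain.
    """
    n = len(signal)
    out = [0.0] * n
    start = offset - block if offset > 0 else 0
    while start < n:
        lo, hi = max(start, 0), min(start + block, n)
        mean = sum(signal[lo:hi]) / (hi - lo)
        for x in range(lo, hi):
            out[x] = mean
        start += block
    return out


def layered(signal, block):
    """Average a standard grid with one shifted by half a cell."""
    a = block_means(signal, block)
    b = block_means(signal, block, offset=block // 2)
    return [(p + q) / 2 for p, q in zip(a, b)]


signal_a = [10, 20, 30, 40, 50, 60, 70, 80]
signal_b = [15, 15, 35, 35, 55, 55, 75, 75]  # a different "original"

print(layered(signal_a, 4))
print(layered(signal_a, 4) == layered(signal_b, 4))
```

In this toy case the layered output is still a fixed, deterministic function of the input, and two different inputs still collapse to the same result. So, at least here, knowing the method does not allow exact reversal from a single frame, but the extra layer is also not a secret key: anyone who guesses the method can apply the same multi-frame and best-guess approaches as for standard pixelation.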