CNN-generated images are surprisingly easy to spot...for now

Sheng-Yu Wang1
Oliver Wang2
Richard Zhang2
Andrew Owens1
Alexei A. Efros1
1UC Berkeley
2Adobe Research

Code [GitHub]

CVPR 2020 (Oral) [Paper]


Are CNN-generated images hard to distinguish from real images? We show that a classifier trained to detect images generated by only one CNN (ProGAN, far left) can detect those generated by many other models (remaining columns).

[Oct 18 2021 Update] Our method gets 92% AUC on the recently released StyleGAN3 model! For more details, please visit this link.


Abstract

In this work we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of the architecture or dataset used. To test this, we collect a dataset consisting of fake images generated by 11 different CNN-based image generator models, chosen to span the space of commonly used architectures today (ProGAN, StyleGAN, BigGAN, CycleGAN, StarGAN, GauGAN, DeepFakes, cascaded refinement networks, implicit maximum likelihood estimation, second-order attention super-resolution, seeing-in-the-dark). We demonstrate that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) is able to generalize surprisingly well to unseen architectures, datasets, and training methods (including the just-released StyleGAN2). Our findings suggest the intriguing possibility that today's CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis.
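The training recipe described above fits in a few lines of PyTorch. The sketch below is not the authors' released code: it assumes a recent torchvision, an ImageNet-pretrained ResNet-50, a ProGAN real/fake training set laid out in 0_real/ and 1_fake/ folders under a hypothetical path, and illustrative hyperparameters, with random Gaussian blur and JPEG re-compression as the data augmentation.

import io
import random

import torch
import torch.nn as nn
from PIL import Image, ImageFilter
from torchvision import datasets, models, transforms


def random_blur_jpeg(img: Image.Image) -> Image.Image:
    """With some probability, blur and/or JPEG-compress the training image."""
    if random.random() < 0.5:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 3.0)))
    if random.random() < 0.5:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 100))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img


train_tf = transforms.Compose([
    transforms.Lambda(random_blur_jpeg),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Hypothetical ImageFolder layout with "0_real/" and "1_fake/" subfolders,
# so label 0 = real and label 1 = fake.
train_set = datasets.ImageFolder("data/progan_train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

# Standard ImageNet-pretrained ResNet-50 with a single binary (fake) logit.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, labels in loader:  # one epoch, for brevity
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()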


Discussion

Despite the alarm that has been raised by the rapidly improving quality of image synthesis methods, our results suggest that today's CNN-generated images retain detectable fingerprints that distinguish them from real photos. This allows forensic classifiers to generalize from one model to another without extensive adaptation.
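As a concrete illustration of such a forensic classifier at test time, the minimal sketch below scores a single image and reports a probability of being CNN-generated. It is not the released demo: the checkpoint path, preprocessing, and output convention (sigmoid of a single "fake" logit) are assumptions carried over from the training sketch above.

import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

# Load a trained real-vs-fake classifier (hypothetical checkpoint path).
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)
model.load_state_dict(torch.load("weights/fake_detector.pth", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

img = preprocess(Image.open("test.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    prob_fake = torch.sigmoid(model(img)).item()
print(f"Probability of being CNN-generated: {prob_fake:.3f}")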

However, this does not mean that the current situation will persist. Due to the difficulties in achieving Nash equilibria, none of the current GAN-based architectures are optimized to convergence, i.e., the generator never wins against the discriminator. Were this to change, we would suddenly find ourselves in a situation where synthetic images are completely indistinguishable from real ones.

Even with the current techniques, there remain practical reasons for concern. First, even the best forensics detector will have some trade-off between true-detection and false-positive rates. Since a malicious user is typically looking to create a single fake image (rather than a distribution of fakes), they could simply hand-pick a fake image that happens to pass the detection threshold. Second, malicious use of fake imagery is likely to be deployed on a social media platform (Facebook, Twitter, YouTube, etc.), so the data will undergo a number of often aggressive transformations (compression, resizing, re-sampling, etc.). While we demonstrated robustness to some degree of JPEG compression, blurring, and resizing, much more work is needed to evaluate how well current detectors can cope with these transformations in the wild. Finally, most documented instances of effective deployment of visual fakes to date have used classic "shallow" methods, such as Photoshop. We have experimented with running our detector on the face-aware liquify dataset from [Wang et al. ICCV 2019], and found that our method performs at chance on this data. This suggests that shallow methods exhibit fundamentally different behavior than deep methods, and should not be neglected.
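One way to probe both of these concerns, the threshold trade-off and robustness to re-compression, is to score a held-out set of real and fake images after a simulated social-media transformation and summarize performance with an ROC curve rather than a single operating point. The sketch below is illustrative only; the model, image paths, and 0/1 labels are assumed to come from a detector like the one sketched after the abstract.

import io

import numpy as np
import torch
from PIL import Image
from sklearn.metrics import roc_auc_score
from torchvision import transforms


def jpeg_recompress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Simulate a social-media-style transformation: JPEG re-compression."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


to_tensor = transforms.Compose([transforms.CenterCrop(224), transforms.ToTensor()])


def evaluate(model, paths, labels, quality=75):
    """Score each (re-compressed) image and report AUC over all thresholds."""
    probs = []
    model.eval()
    with torch.no_grad():
        for p in paths:
            img = jpeg_recompress(Image.open(p).convert("RGB"), quality)
            probs.append(torch.sigmoid(model(to_tensor(img).unsqueeze(0))).item())
    return np.array(probs), roc_auc_score(labels, probs)

# Usage (assuming `model`, `test_paths`, and real/fake `test_labels` exist):
# probs, auc = evaluate(model, test_paths, test_labels, quality=65)
# print(f"AUC under JPEG-65 re-compression: {auc:.3f}")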

We note that detecting fake images is just one small piece of the puzzle of how to combat the threat of visual disinformation. Effective solutions will need to incorporate a wide range of strategies, from technical to social to legal.


Video



Code and Models


[GitHub]

Paper


S.-Y. Wang, O. Wang, R. Zhang, A. Owens, A. A. Efros.
CNN-generated images are surprisingly easy to spot...for now
In CVPR, 2020 (oral presentation). (Paper)
[Bibtex]



Acknowledgements

We'd like to thank Jaakko Lehtinen, Taesung Park, and Jacob (Minyoung) Huh for helpful discussions. We are grateful to Xu Zhang for significant help with comparisons. This work was funded, in part, by DARPA MediFor, an Adobe gift, and a grant from the UC Berkeley Center for Long-Term Cybersecurity. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.