On October 5th and 6th, 2022, data scientists from the DataLab Group participated in Confiance.AI Days, an event focused on promoting responsible AI. During the event, researchers and Data&AI experts presented posters showcasing their work towards the responsible use of AI. Timothée Fronteau and Arnaud PARAN, from the DataLab Group, presented their research on adversarial attacks, which aims to understand the threat these attacks pose to the many models used in document processing and to make those models robust to them. In that work, we showed that, like natural images, documents are sensitive to adversarial attacks, but that adversarial training can lead to robust generalization.

Adversarial attacks pose a major threat to all machine learning models. For that reason, it is important for the DataLab Group to work on this topic and contribute to scientific research.

We found that no prior work had explored adversarial attacks on documentary data, even though critical systems around the world now rely on AI for document processing. We therefore decided to work on the topic.

We adapted attack techniques designed for natural images and showed that they generalize easily to documentary use cases. However, we also showed that if the adversarial examples are close to the original example (in terms of l_2 or l_inf distance), then we can adversarially train a model and make it robust to these attacks.

This result is interesting because it gives a simple way to defend against adversarial attacks when the adversarial documents are indistinguishable from legitimate ones to a human reader.
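
To make the idea of an attack bounded in l_inf distance, and of adversarial training against it, more concrete, here is a minimal sketch in Python. It is not the code, the models or the data from our experiments: it assumes a toy logistic regression on synthetic vectors with values in [0, 1], and it uses the fast gradient sign method (FGSM) as a stand-in attack. All function names (`fgsm_linf`, `train`, `accuracy`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_linf(w, b, x, y, eps):
    """Perturb x by at most eps per feature (l_inf ball) to increase the loss.

    For logistic regression, the gradient of the cross-entropy loss
    with respect to the input is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    x_adv = x + eps * np.sign(grad_x)
    # Keep the result a valid "image": values stay in [0, 1].
    return np.clip(x_adv, 0.0, 1.0)


def train(x, y, eps=0.0, epochs=200, lr=0.5):
    """Gradient descent; if eps > 0, each step trains on FGSM examples
    generated against the current weights (adversarial training)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_step = fgsm_linf(w, b, x, y, eps) if eps > 0 else x
        p = sigmoid(x_step @ w + b)
        w -= lr * x_step.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def accuracy(w, b, x, y):
    return float(np.mean((sigmoid(x @ w + b) > 0.5) == y))


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): 16 "pixel" features in [0, 1].
    rng = np.random.default_rng(0)
    x = rng.random((400, 16))
    y = (x[:, :8].mean(axis=1) > x[:, 8:].mean(axis=1)).astype(float)
    eps = 0.05

    for name, train_eps in [("standard", 0.0), ("adversarial", eps)]:
        w, b = train(x, y, eps=train_eps)
        x_adv = fgsm_linf(w, b, x, y, eps)
        print(name, "clean:", accuracy(w, b, x, y),
              "under attack:", accuracy(w, b, x_adv, y))
```

The script trains the same model twice, once normally and once on adversarial examples, and prints the accuracy of each on clean and attacked inputs. The only guarantee this sketch gives by construction is that every perturbed input stays within `eps` of the original in l_inf distance; how much robustness adversarial training buys depends on the model, the data and the attack budget.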

You can find here the poster that was displayed at the event. It explains the process we followed during our experiments and how we obtained these results.