09/22/2023 | Press release
Developing image-based AI models capable of tasks such as recognition, detection, and segmentation requires significant time and cost to collect large amounts of data and to prepare training data through annotation. This is a major obstacle to the social implementation of AI.
Therefore, in recent years, self-supervised learning has been actively developed as a way to significantly reduce the annotation load. Self-supervised learning first learns image features from a large amount of unlabeled data using pseudo labels that the AI generates for itself, and then achieves high accuracy on the desired task using only a small amount of labeled data per task. SimSiam, SimCLR, and DINO are well-known conventional methods.
When pre-training image features from a large amount of unlabeled data, obtaining a general feature representation applicable to a variety of tasks requires the AI to learn to recognize a given object even when it appears in different forms, such as when the image is cropped, rotated, or lit differently.
In self-supervised learning methods such as the aforementioned SimSiam, augmentations such as rotation, cropping, and color conversion are automatically applied to each image, and the distances between the feature representations of the augmented images are computed automatically. In the pre-training phase, the AI is trained to minimize the distance between features of the same object, which enables it to recognize the object as the same even when it looks different. Models pre-trained in this way are known to achieve high accuracy on a variety of tasks with only a small amount of labeling.
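As a rough illustration of this training signal, the sketch below shows a minimal SimSiam-style objective in PyTorch: two augmented views of the same images are encoded, and the loss is the negative cosine similarity between one view's prediction and the other view's (stop-gradient) feature. The tiny fully connected encoder, the layer sizes, and the 32x32 input shape are placeholder assumptions for illustration, not the architecture used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiamSketch(nn.Module):
    """Minimal SimSiam-style model: encoder + predictor."""
    def __init__(self, dim=128):
        super().__init__()
        # Toy encoder; a real setup would use e.g. a ResNet backbone.
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
            nn.Linear(512, dim),
        )
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, dim),
        )

    def forward(self, x1, x2):
        # x1 and x2 are two augmented views of the same batch of images.
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Negative cosine similarity; the stop-gradient (detach) on the
        # target branch is what prevents representational collapse.
        loss = -(F.cosine_similarity(p1, z2.detach()).mean()
                 + F.cosine_similarity(p2, z1.detach()).mean()) / 2
        return loss

model = SimSiamSketch()
view1, view2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
print(model(view1, view2))  # scalar pre-training loss
```

In a real pipeline, the two views would come from the random augmentations described above (rotation, cropping, color conversion) rather than random tensors.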
However, conventional self-supervised learning does not take the properties of each image into account during training. Images with high uncertainty and images with low uncertainty are treated the same way, which can cause problems in the pre-training phase or degrade the accuracy of the model. Panasonic HD attempted to solve this problem with a probabilistic, statistical approach. Probabilistic generative models such as the Variational Autoencoder (VAE) are known to be well suited to expressing uncertainty. In this paper, we demonstrated that the formulas used in conventional self-supervised learning can be derived from the formulas of the VAE, theoretically clarifying the relationship between the two (Figure 1).
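For reference, the standard VAE training objective, the evidence lower bound (ELBO), can be written as follows. The press release does not spell out the derivation, so this is only the textbook starting point that such a derivation relates to the SSL loss above, not the paper's full formulation:

```latex
% Standard VAE objective (ELBO) for an image x with latent feature z:
% a reconstruction term plus a KL regularization term.
\log p(x) \;\ge\;
\underbrace{\mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\left(q(z \mid x)\,\middle\|\,p(z)\right)}_{\text{regularization}}
```

Because the approximate posterior q(z | x) is a distribution rather than a single feature vector, formulations of this kind naturally carry a notion of per-image uncertainty.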
Furthermore, we developed a method that can estimate the uncertainty of each image in a dataset. In an evaluation experiment on ImageNet100, a benchmark dataset, we qualitatively demonstrated that our method can estimate the uncertainty of images (Figure 2), and we obtained quantitative findings that, in classification tasks, images estimated by the method to have high uncertainty tend to be classified correctly less often, indicating that uncertainty affects the recognition rate of the AI (Figure 3).
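As a purely hypothetical sketch of how a per-image uncertainty score could be produced and used, the code below attaches a small head that outputs a positive concentration value kappa for each feature vector and scores uncertainty as its inverse (high kappa, low uncertainty). The head design, the inverse-concentration proxy, and all names here are illustrative assumptions; the press release does not describe the method's internals.

```python
import torch
import torch.nn as nn

class UncertaintyHead(nn.Module):
    """Hypothetical head: maps a feature vector to a normalized direction
    plus a positive concentration kappa (high kappa = low uncertainty)."""
    def __init__(self, dim=128):
        super().__init__()
        self.kappa = nn.Sequential(nn.Linear(dim, 1), nn.Softplus())

    def forward(self, z):
        direction = nn.functional.normalize(z, dim=1)
        kappa = self.kappa(z).squeeze(1) + 1e-6  # keep strictly positive
        return direction, kappa

head = UncertaintyHead()
features = torch.randn(16, 128)      # stand-in for encoder outputs
_, kappa = head(features)
uncertainty = 1.0 / kappa            # illustrative per-image proxy score
most_uncertain = uncertainty.topk(3).indices
print(most_uncertain)  # indices of images flagged as most uncertain
```

Scores of this kind could then be used, for example, to flag low-quality training examples or to down-weight them during training.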
Until now, it has been common knowledge that a large amount of high-quality data is required to train AI, but our research showed that the quality of training data can be treated as uncertainty. We demonstrated the possibility of realizing AI that overcomes the hurdle of data quality by incorporating the estimated uncertainty into the AI algorithm.