A Multimodal Approach for Semantic Patent Image Retrieval

Date
2021
Volume
2909
Publisher
Aachen, Germany : RWTH Aachen
Abstract

Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval focus largely on textual information. We therefore review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress on a novel approach to patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image, which are then used to identify references to the corresponding sentences in the patent document. Furthermore, we use a state-of-the-art neural CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search, we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be performed successfully using the proposed feature sets, with the best results achieved when the features of both modalities are combined.
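The score-fusion step described above can be illustrated with a minimal sketch. It assumes that per-modality embeddings have already been computed (for example, an image embedding from a CLIP model and a text embedding from a sentence transformer applied to the referenced sentences), scores each candidate image by cosine similarity in each modality, and then combines those scores by their mean or maximum. The function names `cosine` and `fuse_rank`, the modality keys `"image"` and `"text"`, and the toy vectors are hypothetical; this is not the authors' implementation, and their exact normalization and candidate re-ranking procedure may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_rank(
    query: Dict[str, Vector],
    corpus: Dict[str, Dict[str, Vector]],
    mode: str = "mean",
) -> List[Tuple[str, float]]:
    """Rank corpus items by a fused multimodal similarity score.

    query:  modality name -> embedding of the query image (hypothetical keys)
    corpus: item id -> (modality name -> embedding)
    mode:   "mean" averages the per-modality scores, "max" keeps the largest
    """
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    ranked: List[Tuple[str, float]] = []
    for item_id, features in corpus.items():
        # Compare only modalities present for both the query and the candidate.
        scores = [cosine(query[m], features[m]) for m in query if m in features]
        if not scores:
            continue
        fused = sum(scores) / len(scores) if mode == "mean" else max(scores)
        ranked.append((item_id, fused))
    # Highest fused similarity first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Averaging rewards candidates that are similar in both modalities, while taking the maximum lets a single strongly matching modality (for example, a near-identical drawing with unrelated text) dominate the ranking, so the two strategies can order the same candidates differently.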

Keywords
Patent Image Similarity Search, Deep Learning, Multimodal Feature Representations, Scene Text Spotting
Citation
Pustu-Iren, K., Bruns, G., & Ewerth, R. (2021). A Multimodal Approach for Semantic Patent Image Retrieval. Aachen, Germany : RWTH Aachen.
License
CC BY 4.0 International