AI Document Digitalization

Computer Vision Center, Autonomous University of Barcelona

Pipeline

Our work focuses on converting complex physical documents into an editable digital format. The figure below outlines our pipeline, which separates textual and graphical elements before processing them.

Pipeline Diagram

Dataset Creation

To train our models, we built a dataset of both real and synthetic documents. The synthetic portion makes training more robust by simulating a wide range of real-world document conditions.

Dataset Creation Diagram

Examples from our synthetic dataset include variations in lighting, added graphical elements, and text obfuscation to mimic real documents.

Dataset Examples
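As a concrete illustration of those corruptions, the sketch below applies random lighting jitter, a stamp-like graphic, and a redaction-style occlusion to a clean page render with Pillow. The sampling ranges and file names are illustrative placeholders, not the actual settings used to build our dataset.

```python
import random
from PIL import Image, ImageDraw, ImageEnhance

def synthesize_variant(page: Image.Image) -> Image.Image:
    """Apply randomly sampled corruptions to a clean page render."""
    out = page.convert("RGB")

    # Lighting variation: random brightness and contrast jitter.
    out = ImageEnhance.Brightness(out).enhance(random.uniform(0.7, 1.3))
    out = ImageEnhance.Contrast(out).enhance(random.uniform(0.8, 1.2))

    draw = ImageDraw.Draw(out)
    w, h = out.size  # assumes the page is at least a few hundred pixels per side

    # Added graphical element: a stamp-like rectangle at a random position.
    x0, y0 = random.randint(0, w - 80), random.randint(0, h - 40)
    draw.rectangle([x0, y0, x0 + 80, y0 + 40], outline=(200, 0, 0), width=3)

    # Text obfuscation: blank out a small region to mimic redaction or occlusion.
    x1, y1 = random.randint(0, w - 120), random.randint(0, h - 20)
    draw.rectangle([x1, y1, x1 + 120, y1 + 20], fill=(0, 0, 0))

    return out

# "clean_page.png" is a placeholder file name.
synthesize_variant(Image.open("clean_page.png")).save("synthetic_page.png")
```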

LayeredDoc

We implemented LayeredDoc to separate the textual and graphical elements of scanned documents, so that the downstream OCR and object detection models can process each layer more reliably.

LayeredDoc Inference
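The snippet below is a minimal post-processing sketch of how a layer map produced by the separation step could be split into text and graphics images. The label convention (0 = background, 1 = text, 2 = graphics), the dummy mask, and the file names are assumptions for illustration, not LayeredDoc's actual output format.

```python
import numpy as np
from PIL import Image

# Assumed per-pixel label map: 0 = background, 1 = text, 2 = graphics.
page = np.array(Image.open("scan_0.png").convert("RGB"))
mask = np.zeros(page.shape[:2], dtype=np.uint8)  # would come from the separation model

def extract_layer(image: np.ndarray, labels: np.ndarray, layer_id: int) -> np.ndarray:
    """Keep only the pixels of one layer, filling everything else with white."""
    keep = (labels == layer_id)[..., None]
    return np.where(keep, image, 255).astype(np.uint8)

Image.fromarray(extract_layer(page, mask, 1)).save("text_layer.png")
Image.fromarray(extract_layer(page, mask, 2)).save("graphics_layer.png")
```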

DocTR (OCR)

To extract text from the document, we used DocTR, a state-of-the-art OCR model that detects and recognizes text with high accuracy.

OCR Results
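For reference, this is a minimal sketch of running OCR with docTR's publicly documented predictor API; the input file name is a placeholder and the configuration is not necessarily the one used in our pipeline.

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a pretrained text detection + recognition pipeline.
model = ocr_predictor(pretrained=True)

# "page_0.png" is a placeholder; DocumentFile.from_pdf() works for PDFs as well.
doc = DocumentFile.from_images(["page_0.png"])

# Run OCR and walk the exported hierarchy: pages -> blocks -> lines -> words.
result = model(doc)
for page in result.export()["pages"]:
    for block in page["blocks"]:
        for line in block["lines"]:
            print(" ".join(word["value"] for word in line["words"]))
```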

Table Structure Recognition

For table detection and structure extraction, we used TATR (Table Transformer), a model trained to locate tables in a document and recover their row and column structure.

Table Structure Recognition Results
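Below is a minimal sketch of table detection with a public TATR checkpoint through Hugging Face Transformers; the threshold and file name are illustrative, and the companion structure-recognition checkpoint (microsoft/table-transformer-structure-recognition) would be applied to each detected table crop in the same way.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

# "page_0.png" is a placeholder file name.
image = Image.open("page_0.png").convert("RGB")

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the raw predictions back to pixel coordinates and keep confident boxes.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes
)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()],
          [round(v, 1) for v in box.tolist()],
          round(score.item(), 2))
```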

Inference of the Whole Pipeline

Here is an example of a complete document processed through our pipeline, integrating all models to generate an editable output.

Final Pipeline Example
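To make the overall flow concrete, the stub below chains the three stages described above. Every function name and return type is a placeholder standing in for the LayeredDoc, DocTR, and TATR components; it is not the project's actual API.

```python
def separate_layers(scan_path: str) -> tuple[str, str]:
    """LayeredDoc step: split a scan into a text layer and a graphics layer."""
    return f"{scan_path}.text.png", f"{scan_path}.graphics.png"  # stub

def run_ocr(text_layer_path: str) -> list[dict]:
    """DocTR step: return recognized words with their bounding boxes."""
    return []  # stub

def recognize_tables(text_layer_path: str) -> list[dict]:
    """TATR step: return detected tables with their row/column structure."""
    return []  # stub

def digitalize(scan_path: str) -> dict:
    """Run every stage and collect the pieces of the editable output."""
    text_layer, graphics_layer = separate_layers(scan_path)
    return {
        "words": run_ocr(text_layer),
        "tables": recognize_tables(text_layer),
        "graphics": graphics_layer,
    }

print(digitalize("scan_0.png"))
```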

Demo Video

A demonstration of how to use our web application will be shown here.