AI Document Digitalization

Computer Vision Center, Autonomous University of Barcelona

Pipeline

Our work focuses on converting complex physical documents into an editable digital format. The figure below outlines our pipeline, which separates textual and graphical elements before processing them.

Pipeline Diagram

Dataset Creation

To train our models, we built a dataset of both real and synthetic documents. The synthetic portion makes training more robust by simulating a wide range of real-world document conditions.

Dataset Creation Diagram

Examples from our synthetic dataset include variations in lighting, added graphical elements, and text obfuscation to mimic real documents.

Dataset Examples
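As a concrete illustration of those corruptions, the sketch below applies random lighting jitter, a stamp-like graphic, and a redaction-style occlusion to a clean page render with Pillow. The sampling ranges and file names are illustrative placeholders, not the actual settings used to build our dataset.

```python
import random
from PIL import Image, ImageDraw, ImageEnhance

def synthesize_variant(page: Image.Image) -> Image.Image:
    """Apply randomly sampled corruptions to a clean page render."""
    out = page.convert("RGB")

    # Lighting variation: random brightness and contrast jitter.
    out = ImageEnhance.Brightness(out).enhance(random.uniform(0.7, 1.3))
    out = ImageEnhance.Contrast(out).enhance(random.uniform(0.8, 1.2))

    draw = ImageDraw.Draw(out)
    w, h = out.size  # assumes the page is at least a few hundred pixels per side

    # Added graphical element: a stamp-like rectangle at a random position.
    x0, y0 = random.randint(0, w - 80), random.randint(0, h - 40)
    draw.rectangle([x0, y0, x0 + 80, y0 + 40], outline=(200, 0, 0), width=3)

    # Text obfuscation: blank out a small region to mimic redaction or occlusion.
    x1, y1 = random.randint(0, w - 120), random.randint(0, h - 20)
    draw.rectangle([x1, y1, x1 + 120, y1 + 20], fill=(0, 0, 0))

    return out

# "clean_page.png" is a placeholder file name.
synthesize_variant(Image.open("clean_page.png")).save("synthetic_page.png")
```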

LayeredDoc

We implemented LayeredDoc to separate the textual and graphical elements of scanned documents, so that the downstream OCR and object detection models can process each layer more reliably.

LayeredDoc Inference
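The snippet below is a minimal post-processing sketch of how a layer map produced by the separation step could be split into text and graphics images. The label convention (0 = background, 1 = text, 2 = graphics), the dummy mask, and the file names are assumptions for illustration, not LayeredDoc's actual output format.

```python
import numpy as np
from PIL import Image

# Assumed per-pixel label map: 0 = background, 1 = text, 2 = graphics.
page = np.array(Image.open("scan_0.png").convert("RGB"))
mask = np.zeros(page.shape[:2], dtype=np.uint8)  # would come from the separation model

def extract_layer(image: np.ndarray, labels: np.ndarray, layer_id: int) -> np.ndarray:
    """Keep only the pixels of one layer, filling everything else with white."""
    keep = (labels == layer_id)[..., None]
    return np.where(keep, image, 255).astype(np.uint8)

Image.fromarray(extract_layer(page, mask, 1)).save("text_layer.png")
Image.fromarray(extract_layer(page, mask, 2)).save("graphics_layer.png")
```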

DocTR (OCR)

To extract text from the document, we used DocTR, a state-of-the-art OCR model that detects and recognizes text with high accuracy.

OCR Results
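For reference, this is a minimal sketch of running OCR with docTR's publicly documented predictor API; the input file name is a placeholder and the configuration is not necessarily the one used in our pipeline.

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a pretrained text detection + recognition pipeline.
model = ocr_predictor(pretrained=True)

# "page_0.png" is a placeholder; DocumentFile.from_pdf() works for PDFs as well.
doc = DocumentFile.from_images(["page_0.png"])

# Run OCR and walk the exported hierarchy: pages -> blocks -> lines -> words.
result = model(doc)
for page in result.export()["pages"]:
    for block in page["blocks"]:
        for line in block["lines"]:
            print(" ".join(word["value"] for word in line["words"]))
```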

Table Structure Recognition

For table detection and structure extraction, we used TATR (Table Transformer), a model trained to locate tables in a document and recover their row and column structure.

Table Structure Recognition Results
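Below is a minimal sketch of table detection with a public TATR checkpoint through Hugging Face Transformers; the threshold and file name are illustrative, and the companion structure-recognition checkpoint (microsoft/table-transformer-structure-recognition) would be applied to each detected table crop in the same way.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

# "page_0.png" is a placeholder file name.
image = Image.open("page_0.png").convert("RGB")

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the raw predictions back to pixel coordinates and keep confident boxes.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes
)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()],
          [round(v, 1) for v in box.tolist()],
          round(score.item(), 2))
```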

Inference of the Whole Pipeline

Here is an example of a complete document processed through our pipeline, integrating all models to generate an editable output.

Final Pipeline Example
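To make the overall flow concrete, the stub below chains the three stages described above. Every function name and return type is a placeholder standing in for the LayeredDoc, DocTR, and TATR components; it is not the project's actual API.

```python
def separate_layers(scan_path: str) -> tuple[str, str]:
    """LayeredDoc step: split a scan into a text layer and a graphics layer."""
    return f"{scan_path}.text.png", f"{scan_path}.graphics.png"  # stub

def run_ocr(text_layer_path: str) -> list[dict]:
    """DocTR step: return recognized words with their bounding boxes."""
    return []  # stub

def recognize_tables(text_layer_path: str) -> list[dict]:
    """TATR step: return detected tables with their row/column structure."""
    return []  # stub

def digitalize(scan_path: str) -> dict:
    """Run every stage and collect the pieces of the editable output."""
    text_layer, graphics_layer = separate_layers(scan_path)
    return {
        "words": run_ocr(text_layer),
        "tables": recognize_tables(text_layer),
        "graphics": graphics_layer,
    }

print(digitalize("scan_0.png"))
```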

Demo Video

A demonstration of how to use our web application will be shown here.