A hierarchical representation of form documents for identification and retrieval

In this paper, we present a logical representation of form documents for use in identification and retrieval. We propose a hierarchical structure that represents the logical structure of a form using its lines. The approach is top-down and uses no domain knowledge, such as the preprinted or filled-in data. Logically identical forms are associated with the same hierarchical structure. The representation tolerates geometrical modifications and slight variations.
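
To make the idea of a line-based, top-down hierarchy concrete, the following is a minimal illustrative sketch and not the construction used in the paper, which this abstract does not detail. It assumes that the lines have already been extracted as clean, axis-aligned segments, that a region is split by any line spanning it completely, and that horizontal splits are tried before vertical ones. The names `Line`, `build_tree`, and the tolerance `tol` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass(frozen=True)
class Line:
    """Axis-aligned segment (assumed already extracted from the form image)."""
    x1: float
    y1: float
    x2: float
    y2: float


def _cuts(lines: List[Line], box: Box, horizontal: bool, tol: float) -> List[float]:
    """Positions of lines that span the whole box and lie strictly inside it."""
    left, top, right, bottom = box
    cuts = set()
    for ln in lines:
        if horizontal and abs(ln.y1 - ln.y2) <= tol:
            spans = min(ln.x1, ln.x2) <= left + tol and max(ln.x1, ln.x2) >= right - tol
            if spans and top + tol < ln.y1 < bottom - tol:
                cuts.add(ln.y1)
        elif not horizontal and abs(ln.x1 - ln.x2) <= tol:
            spans = min(ln.y1, ln.y2) <= top + tol and max(ln.y1, ln.y2) >= bottom - tol
            if spans and left + tol < ln.x1 < right - tol:
                cuts.add(ln.x1)
    return sorted(cuts)


def build_tree(lines: List[Line], box: Box, tol: float = 2.0):
    """Top-down split of a box into a nested tuple that stores no coordinates."""
    left, top, right, bottom = box
    h = _cuts(lines, box, True, tol)
    if h:  # assumption: horizontal splits take priority over vertical ones
        edges = [top] + h + [bottom]
        return ("H", tuple(build_tree(lines, (left, a, right, b), tol)
                           for a, b in zip(edges, edges[1:])))
    v = _cuts(lines, box, False, tol)
    if v:
        edges = [left] + v + [right]
        return ("V", tuple(build_tree(lines, (a, top, b, bottom), tol)
                           for a, b in zip(edges, edges[1:])))
    return "cell"


# Form A: a header row above two columns.
form_a = [Line(0, 20, 100, 20), Line(50, 20, 50, 100)]
tree_a = build_tree(form_a, (0, 0, 100, 100))

# Form B: the same layout, scaled and shifted.
form_b = [Line(10, 50, 210, 50), Line(110, 50, 110, 210)]
tree_b = build_tree(form_b, (10, 10, 210, 210))

# Form C: a different layout with two full-height columns only.
form_c = [Line(50, 0, 50, 100)]
tree_c = build_tree(form_c, (0, 0, 100, 100))

print(tree_a == tree_b, tree_a == tree_c)
```

Because the nested tuple records only the split directions and the number of sub-regions, forms A and B map to the same structure despite their different geometry, while form C does not. Matching forms under slight variations in the lines themselves would need a more tolerant comparison than the exact equality used here.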