In recent years, multispectral (MS) imaging has become an important tool for historical document analysis. The technique is widely regarded as a non-invasive method of investigation because it relies only on imaging under ultraviolet, visible, and infrared light. It enables conservators and art historians to obtain valuable information about ancient documents without causing any physical damage to the materials. In particular, it makes it possible to reveal overwritten text, to distinguish and identify the chemical materials that compose the ink, and to detect signs of degradation. It can also help to extract information from cultural heritage patterns that conventional color photography cannot capture. Extracting (segmenting) the original text (the old writing) from an MS document image is a key step for subsequent document image analysis and investigation. To facilitate comparison of the results of different algorithms, and to track their progress over time toward human-level performance, a standard benchmark with accurate ground truth that captures the most representative information about the targeted samples is needed. To this end, we have generated a dataset of 21 MS document images for training purposes (http://www.synchromedia.ca/databases/msi-histodoc), and another dataset of 10 MS document images for testing purposes.