The upcoming Digital History Workshop will take place on 22 March from 3 to 5 pm in the eLab room (Turfdraagsterpad 9 BG 1 0.16). In this workshop, we will discuss tools that enable researchers to build their own corpus from various online resources.
The internet is rich in interesting historical documents and vast historical collections such as Delpher, which contains a massive number of books and newspapers. Dr. Martin Reynaert (Tilburg University, Meertens Institute) will demonstrate how researchers can use these digital resources to build their own corpus, tailored to their specific research needs. More precisely, he will introduce the three fundamental functions of PICCL (Philosophical Integrator of Computational and Corpus Libraries):
Collection and Conversion: How do you convert, for example, PDF files to a machine-readable format, or import specific sources from large databases (such as Delpher) into your own corpus?
Correction: Digitized materials often contain errors introduced by OCR (Optical Character Recognition) software. TICCL (Text-Induced Corpus Clean-up) is designed to correct such noise in the data.
Enrichment: The last step of the pipeline focusses on enriching the corpus by analysing the grammatical structure of documents or identifying named entities such as persons and places.
The workshop is open to everyone, but please register in advance by sending an email to email@example.com.