Department of Linguistics - Linguistic Data Processing, University of Cologne
"Mining a Corpus of Job Ads"
Job advertisements are a valuable research object for analysing the present, and potentially future, requirements of the rapidly changing German labour market. Through a cooperation with the Federal Institute for Vocational Education and Training (Bundesinstitut für Berufsbildung, BIBB, Bonn), we have access to a growing corpus of several million job advertisements. The corpus contains the text of each advertisement as well as complementary information such as date of publication, job area, and region. The aim of our research is to extract further details (e.g. required qualifications) from each job advertisement in order to enrich the database.
As a first step, we built a zone analysis application that uses machine-learning techniques to split the texts into paragraphs and classify them into three zones: (i) employer description, (ii) job description, and (iii) qualifications required of potential candidates. To identify the most effective method, we tested several thousand combinations of feature generation, feature weighting, and classification algorithms. The high evaluation scores we achieved indicate that job advertisements are very well suited to zone analysis. We are therefore confident that our intended task, i.e. extracting additional information from the texts, will perform considerably better on the resulting corpus of classified paragraphs than on the original full-text corpus.
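The abstract does not say which features, weighting schemes, or classifiers were combined, so the following is only an illustrative sketch of the general idea, not the authors' pipeline. It assumes that paragraphs are separated by blank lines, and it uses two hypothetical feature extractors (unigrams; unigrams plus bigrams), three weighting schemes (binary, term frequency, tf-idf), and two simple stand-in classifiers (nearest centroid and 1-nearest-neighbour, both based on cosine similarity). It then scores every combination on held-out paragraphs. All function names and the toy paragraphs are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import product


def split_paragraphs(text):
    """Split an ad into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def unigrams(text):
    """Lower-cased word tokens (Unicode-aware, so umlauts are kept)."""
    return re.findall(r"\w+", text.lower())


def uni_bigrams(text):
    """Unigrams plus adjacent-word bigrams joined by an underscore."""
    toks = unigrams(text)
    return toks + [a + "_" + b for a, b in zip(toks, toks[1:])]


# Hypothetical option sets; the abstract does not name the real ones.
FEATURES = {"unigrams": unigrams, "uni+bigrams": uni_bigrams}
WEIGHTS = ("binary", "tf", "tfidf")
CLASSIFIERS = ("centroid", "1nn")


def weight(tokens, scheme, idf):
    """Turn a token list into a sparse {term: weight} vector."""
    tf = Counter(tokens)
    if scheme == "binary":
        return {t: 1.0 for t in tf}
    if scheme == "tf":
        return {t: float(c) for t, c in tf.items()}
    # tf-idf: terms never seen in training get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def evaluate(train, test, feat, scheme, clf):
    """Train on (paragraph, zone) pairs and return accuracy on test."""
    extract = FEATURES[feat]
    train_toks = [extract(text) for text, _ in train]
    df = Counter(t for toks in train_toks for t in set(toks))
    idf = {t: math.log(len(train_toks) / d) for t, d in df.items()}
    vectors = [weight(toks, scheme, idf) for toks in train_toks]
    labels = [zone for _, zone in train]
    if clf == "1nn":
        # Every training paragraph serves as a prototype.
        model = list(zip(vectors, labels))
    else:
        # One summed vector per zone (same direction as the mean).
        sums = defaultdict(Counter)
        for vec, zone in zip(vectors, labels):
            sums[zone].update(vec)
        model = [(dict(vec), zone) for zone, vec in sums.items()]
    correct = 0
    for text, zone in test:
        vec = weight(extract(text), scheme, idf)
        predicted = max(model, key=lambda m: cosine(vec, m[0]))[1]
        correct += predicted == zone
    return correct / len(test)


def grid_search(train, test):
    """Score every feature/weighting/classifier combination; return the best."""
    scores = {combo: evaluate(train, test, *combo)
              for combo in product(FEATURES, WEIGHTS, CLASSIFIERS)}
    return max(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Invented toy paragraphs, for illustration only.
    train = [
        ("We are a leading software company based in Cologne.", "employer"),
        ("Our company has 500 employees worldwide.", "employer"),
        ("You will develop and maintain web applications.", "job"),
        ("Your tasks include testing and documenting software.", "job"),
        ("You have a degree in computer science.", "qualification"),
        ("Good knowledge of German and English is required.", "qualification"),
    ]
    ad = ("We are a family company with 50 employees.\n\n"
          "You will maintain our applications.\n\n"
          "A degree in economics is required.")
    test = list(zip(split_paragraphs(ad), ["employer", "job", "qualification"]))
    best, accuracy = grid_search(train, test)
    print(best, accuracy)
```

Running the script prints the best-scoring combination and its accuracy on the three toy paragraphs. With real data, one would use far larger labelled sets and cross-validation rather than a single train/test split, and the option sets would be much richer than the 2 × 3 × 2 grid assumed here.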
Research Interest: Jürgen Hermes is a postdoctoral computational linguist specialising in the design of frameworks for text mining tasks. His research interests are virtual research environments, text classification, information extraction, and historical ciphers. He believes that the internet, combined with modern software technologies, has the potential to foster genuine Open Science, in which scientists are able to share their publications, code, data, results, and, of course, their experiments.