Automated Analysis and Information Extraction of Text Documents
There is considerable scope for development in the area of information extraction from text documents. One aspect which I have been particularly interested in is the automated detection of plagiarism, or "similarity", between documents. My approach has been to adapt suitable methods from the field of bioinformatics and apply them to the analysis of collections of text documents. An additional benefit is the ability to visualise and identify trends and relationships in these collections, a capability with many potential applications, such as the automatic versioning of source code.
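As an illustration of how a bioinformatics method might carry over to text, the sketch below applies Smith-Waterman local alignment (a standard technique for comparing biological sequences) to word tokens rather than nucleotides or amino acids. The page does not say which methods the projects listed here actually use, so the choice of algorithm, the scoring constants, and the function names `local_alignment_score` and `similarity` are all assumptions made for this example.

```python
# Illustrative sketch only: Smith-Waterman local alignment over word tokens.
# The scoring values below are assumed for the example, not taken from any
# of the projects described on this page.
MATCH = 2      # reward for two identical words
MISMATCH = -1  # penalty for aligning two different words
GAP = -1       # penalty for skipping a word in either document


def local_alignment_score(a, b):
    """Return the best local alignment score between two token lists."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(text_a, text_b):
    """Normalise the alignment score to the range 0.0 to 1.0."""
    a = text_a.lower().split()
    b = text_b.lower().split()
    if not a or not b:
        return 0.0
    # The highest possible score is a full match of the shorter document.
    return local_alignment_score(a, b) / (MATCH * min(len(a), len(b)))


if __name__ == "__main__":
    print(similarity("the cat sat on the mat", "yesterday the cat sat quietly"))
```

A score near 1.0 suggests that one document contains a long run of words shared with the other, which is the kind of signal a plagiarism detector would flag for closer inspection. Pairwise scores like this could also feed a distance matrix for visualising a whole collection, although that step is not shown here.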
Principal Investigators
Projects
- A Self-Organising Approach to Document Space Visualisation
- SNITCH: An open-source software package for document analysis and detection of plagiarism