Build workflows in KNIME
ONLY MAKE AN OFFER IF YOU HAVE ALREADY DONE KNIME DEVELOPMENT PROJECTS!
ALL OTHER OFFERS WILL BE IGNORED.
1. Extract content
==================
Extract the content from all PDF documents in a folder.
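A minimal Python sketch of this step, for example for use inside a KNIME Python Script node. It assumes the third-party `pypdf` package for the actual text extraction (imported lazily, so the folder-listing part runs without it). The function names are hypothetical, and scanned PDFs without a text layer would need OCR instead.

```python
from pathlib import Path
from typing import Callable, Dict, List


def list_pdfs(folder: str) -> List[Path]:
    """Return the PDF files directly inside `folder` (not recursive), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def extract_with_pypdf(path: Path) -> str:
    """Extract the text of every page with pypdf (assumed installed: pip install pypdf)."""
    from pypdf import PdfReader  # lazy import: only needed when this extractor is used

    reader = PdfReader(path)
    # extract_text() can return an empty string for pages without a text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_folder(
    folder: str, extractor: Callable[[Path], str] = extract_with_pypdf
) -> Dict[str, str]:
    """Map each PDF's file name (usable later as `documentname`) to its extracted text."""
    return {p.name: extractor(p) for p in list_pdfs(folder)}
```

Passing the extractor as a parameter keeps the folder handling independent of the PDF library, so another extractor can be swapped in without touching the rest.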
2. Isolate textblocks
=====================
- All textblocks in the document are identified (isolated).
- The start and end positions of each textblock in the document are identified.
- A textblock can contain several paragraphs, sentences or words, or a single word.
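The specification does not define what separates two textblocks. A minimal sketch, assuming that textblocks are separated by one or more blank lines in the extracted text and that positions are character offsets (start inclusive, end exclusive), could look like this; the names are hypothetical:

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: one or more blank (or whitespace-only) lines separate textblocks.
BLOCK_SEPARATOR = re.compile(r"\n\s*\n")


@dataclass
class TextBlock:
    text: str
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)


def isolate_textblocks(text: str) -> List[TextBlock]:
    """Split `text` into textblocks and record where each one starts and ends."""
    blocks: List[TextBlock] = []
    cursor = 0
    # Every separator closes the current block; the final sentinel closes the tail.
    boundaries = [(m.start(), m.end()) for m in BLOCK_SEPARATOR.finditer(text)]
    boundaries.append((len(text), len(text)))
    for sep_start, sep_end in boundaries:
        raw = text[cursor:sep_start]
        stripped = raw.strip()
        if stripped:
            # Skip leading whitespace so `start` points at the first real character.
            start = cursor + len(raw) - len(raw.lstrip())
            blocks.append(TextBlock(stripped, start, start + len(stripped)))
        cursor = sep_end
    return blocks
```

Because the offsets refer to the full document text, `text[block.start:block.end]` always reproduces the stored textblock, which makes the start and end positions easy to verify.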
3. Label textblocks
===================
- Automatically assign a label to each textblock, based on the specific keywords defined for that label.
- The keywords are retrieved from a table in a MySQL database and compared with the words in the textblock. If a textblock contains fewer than 4 words, there must be a 100% match. If a textblock contains 4 words or more, the match can be partial, based on a threshold value (x%). For example, with a threshold of 85%, if more than 85% of the textblock matches the keywords of a label in the MySQL table, the match is successful and that label is assigned to the textblock.
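One possible reading of these rules, as a Python sketch. It assumes that the match percentage is the share of the textblock's words that occur among a label's keywords, that multi-word keywords are split into single words, and that when several labels pass, the one with the highest percentage wins (the first one on a tie). The specification leaves these points open, so they are assumptions, as are the function names and the `(label, keyword)` row layout of the MySQL keyword table.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

WORD = re.compile(r"\w+")


def words(text: str) -> List[str]:
    """Lower-cased word tokens; punctuation is ignored."""
    return [w.lower() for w in WORD.findall(text)]


def group_keywords(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn (label, keyword) rows from the keyword table into label -> set of keyword words."""
    grouped: Dict[str, Set[str]] = {}
    for label, keyword in rows:
        grouped.setdefault(label, set()).update(words(keyword))
    return grouped


def match_ratio(block_words: List[str], keywords: Set[str]) -> float:
    """Share of the textblock's words that are keywords of one label (0.0 to 1.0)."""
    if not block_words:
        return 0.0
    return sum(w in keywords for w in block_words) / len(block_words)


def assign_label(
    textblock: str,
    label_keywords: Dict[str, Set[str]],
    threshold: float = 0.85,  # the x% from the specification, as a fraction
    min_words_for_partial: int = 4,
) -> Optional[str]:
    """Return the best matching label, or None if no label matches."""
    block_words = words(textblock)
    best_label, best_ratio = None, 0.0
    for label, keywords in label_keywords.items():
        ratio = match_ratio(block_words, keywords)
        if len(block_words) < min_words_for_partial:
            matched = ratio == 1.0  # short textblock: 100% match required
        else:
            matched = ratio > threshold  # longer textblock: "more than x%"
        if matched and ratio > best_ratio:
            best_label, best_ratio = label, ratio
    return best_label
```

With a MySQL driver, the rows for `group_keywords` could come from running a query such as `SELECT label, keyword FROM label_keywords` (hypothetical table and column names) and calling `fetchall()` on the cursor.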
4. Store textblocks
===================
Each textblock is stored in a table in a MySQL database with the following values:
- documentname
- label
- textblock
- startposition
- endposition
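A sketch of the storage step in Python. It uses the standard-library `sqlite3` module as a stand-in so that it runs without a database server; the DDL is written so that MySQL accepts it as well. The table name `textblocks`, the column types and the nullable `label` (for textblocks without a matching label) are assumptions. With a MySQL driver such as mysql-connector-python or PyMySQL, the same DB-API calls apply, but the placeholder is `%s` instead of `?`.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical table layout; the column names follow the list above.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS textblocks (
    documentname  VARCHAR(255) NOT NULL,
    label         VARCHAR(100),
    textblock     TEXT NOT NULL,
    startposition INTEGER NOT NULL,
    endposition   INTEGER NOT NULL
)
"""

Row = Tuple[str, Optional[str], str, int, int]


def store_textblocks(conn, rows: Iterable[Row], placeholder: str = "?") -> None:
    """Create the table if needed and insert (documentname, label, textblock, start, end) rows."""
    marks = ", ".join([placeholder] * 5)
    sql = (
        "INSERT INTO textblocks "
        "(documentname, label, textblock, startposition, endposition) "
        f"VALUES ({marks})"
    )
    cur = conn.cursor()
    cur.execute(CREATE_TABLE_SQL)
    cur.executemany(sql, list(rows))  # parameterized: no manual quoting of text
    conn.commit()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    store_textblocks(demo, [("report.pdf", "address", "City", 0, 4)])
    print(demo.execute("SELECT * FROM textblocks").fetchall())
```

For MySQL, the same function would be called with a connection from the chosen driver (for example `mysql.connector.connect(...)`) and `placeholder="%s"`.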