I am looking for someone with data mining skills who can:
• Locate and mine appropriate data from the web to build a database containing liquor products
• Database must contain:
o EAN/UPC code - number
o Name - text
o Description - text
o Large Image (VGA quality or above) - jpg format
o Small thumbnail image - jpg format
o Volume (in milliliters) - number
o Liquor Type - text
o Source (where it was mined from) - text
o Full Weight (set at 0 for initial DB, no need to mine for this) - number
o Empty Weight (set at 0 for initial DB, no need to mine for this) - number
o Last Updated - date
o Issues (text field that highlights fields with issues, see below) - initially blank
o Verified (text, Yes/No) - default set to “No” for initial database
• All data must be in English
• A worldwide database is desired; however, the primary markets to gather data from are the US and Europe (mainly the UK and Germany)
• Sources must be declared. All efforts must be made to ensure that the data does not infringe copyright (the intention is to re-use the data elsewhere). For example, [url removed, login to view] is a good source; however, it explicitly forbids re-use. I need “clean” data
• The database must be in MySQL format
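To make the record layout above concrete, here is a minimal Python sketch of one database row plus a barcode sanity check. It is only an illustration, not a required design: the names `LiquorProduct`, `has_valid_check_digit` and all field names are hypothetical, and storing the EAN/UPC code as text (so that leading zeros survive) is an assumption on top of the list above. The check uses the standard GS1 check-digit rule for 12-digit UPC-A and 13-digit EAN-13 codes.

```python
from dataclasses import dataclass, field
from datetime import date


def has_valid_check_digit(code: str) -> bool:
    """True if code is a 12-digit UPC-A or 13-digit EAN-13 with a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (12, 13):
        return False
    digits = [int(ch) for ch in code]
    # GS1 rule: starting from the digit next to the check digit and moving
    # left, the weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits[:-1])))
    return (10 - total % 10) % 10 == digits[-1]


@dataclass
class LiquorProduct:
    """One row of the mined database (hypothetical field names)."""
    ean_upc: str                 # kept as text to preserve leading zeros (assumption)
    name: str
    description: str = ""
    large_image: str = ""        # path or URL of the jpg, VGA quality or above
    thumbnail: str = ""          # path or URL of the small jpg
    volume_ml: int = 0
    liquor_type: str = ""
    source: str = ""             # where the record was mined from
    full_weight: float = 0       # 0 in the initial database
    empty_weight: float = 0      # 0 in the initial database
    last_updated: date = field(default_factory=date.today)
    issues: str = ""             # initially blank
    verified: str = "No"         # "Yes" or "No"
```

A mined row could then be rejected early if `has_valid_check_digit(row.ean_upc)` is false, which would catch most mistyped or truncated barcodes before they reach the database.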
Please give me your thoughts on how you would proceed to carry out this work as part of your response.
In addition to the above, as a second stage, I would like to build a basic Android app for a tablet that is used for data collection around this database. The primary purpose of this app is to clean up the database after it has been mined, so only a prototype-quality app is required (it doesn’t need to be too pretty), but it must still be robust and reliable.
You will have freedom over the layout and flow of the app, but keep in mind that minimizing the number of clicks is important. Speed of data collection is very important.
The app will do the following:
• The database is to be stored in the cloud (please outline your method for this). A local cached copy is also stored in the app. Any new data is pushed to the cloud when committed.
• The app will use the tablet camera to scan the 1D barcode using [url removed, login to view]
• The app will look up the database and display the data available under that UPC code (photo, description, etc.) one field at a time. Each field (including images) is checked and confirmed with a confirm button
• The app will highlight missing fields and begin to gather that information. For example, if the photo is missing, the camera is enabled to take a photo
• Full Weight and Empty Weight must be filled in as mandatory fields. Number in milliliters.
• If all fields are filled out correctly and in the correct format (the app needs to check the format of each and every field for correctness), then Verified can be set to “Yes” (otherwise it stays at “No”).
• If a UPC code is scanned and Verified is already “Yes”, an alert flagging that it is already verified must be shown. An option to override is then presented, which allows the user to go through each field again and make changes as needed.
• There is to be a “session” in which a person builds and verifies the data. Each session shall keep a log of every scan and every update to a record, and it must be possible to email this log to a recipient.
• Updated information is cached locally and tagged for upload as soon as internet connectivity is available
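As a rough illustration of the field checking and Verified rules in the list above, the logic might look like the Python sketch below. The app itself would be written for Android, so this only shows the intended behaviour. The function names, dictionary keys and the specific format rules (images end in .jpg, numeric fields are greater than zero) are assumptions for the example, not fixed requirements.

```python
from datetime import date
from typing import List

TEXT_FIELDS = ("name", "description", "liquor_type", "source")
IMAGE_FIELDS = ("large_image", "thumbnail")
NUMBER_FIELDS = ("volume_ml", "full_weight", "empty_weight")


def find_issues(record: dict) -> List[str]:
    """Return the names of fields that are missing or in the wrong format."""
    issues = []
    code = str(record.get("ean_upc", ""))
    if not (code.isascii() and code.isdigit() and len(code) in (12, 13)):
        issues.append("ean_upc")
    for key in TEXT_FIELDS:
        if not str(record.get(key, "")).strip():
            issues.append(key)
    for key in IMAGE_FIELDS:
        # Assumed rule: images are referenced by a file name ending in .jpg/.jpeg
        if not str(record.get(key, "")).lower().endswith((".jpg", ".jpeg")):
            issues.append(key)
    for key in NUMBER_FIELDS:
        value = record.get(key, 0)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        # Assumed rule: mandatory numeric fields must be greater than zero
        if not is_number or value <= 0:
            issues.append(key)
    return issues


def apply_review(record: dict, override: bool = False) -> dict:
    """Re-check a scanned record and set Issues, Verified and Last Updated."""
    if record.get("verified") == "Yes" and not override:
        # The app would show an alert here and offer the override option
        raise ValueError("record is already verified; override required")
    issues = find_issues(record)
    updated = dict(record)
    updated["issues"] = ", ".join(issues)
    updated["verified"] = "No" if issues else "Yes"
    updated["last_updated"] = date.today()
    return updated
```

With this shape, a record only reaches Verified “Yes” when `find_issues` returns an empty list, and scanning an already verified record does nothing unless the user explicitly chooses the override.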
I see this as two separate jobs: 1) initial data mining and 2) prototype app development. I would much prefer to have this work done by one single person if possible, but will separate the jobs depending on the responses. If you are an app developer who hasn’t had experience with data mining, then please don’t use this as an opportunity to learn (you may still get the app work only). The same applies in reverse if you are a data miner: don’t try to become an app developer if you are not already one. I’ll be asking many questions before awarding. Trust is important here.
Looking forward to hearing from you.