We require a freelancer to data mine the Wikipedia categories entitled "Natural Science", "Physical Science", "Formal Science" and "Applied Science", for:
1) All Pure and Applied Scientific Discipline page_titles (fields of study, for example: Physics, Biology, Mechanics, Evolutionary biology, Astronomy, Thermochemistry, Surgery, Electronic engineering)
2) The first paragraph of the article in question (omitting any Greek, Latin or other etymological information)
3) The category type (NATURAL, FORMAL, or APPLIED). Note that all disciplines under the "Physical Science" category must be classified as NATURAL.
4) The relationship of each page_title to all other page_titles. For example, Physics is a parent discipline of Mechanics. This is crucial to building a structured hierarchy.
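To illustrate requirements 2 to 4, here is a minimal Python sketch of one possible record layout. It is not a prescribed implementation. The names Discipline, strip_etymology, build_hierarchy and ROOT_TO_TYPE are hypothetical, the etymology filter is a rough heuristic that only drops parentheticals mentioning Greek or Latin, and the real output columns are the ones defined in the WIKI_DATA_OUTPUT file.

```python
import re
from dataclasses import dataclass, field

# Root category -> category type. Physical Science is reported as NATURAL (requirement 3).
ROOT_TO_TYPE = {
    "Natural Science": "NATURAL",
    "Physical Science": "NATURAL",
    "Formal Science": "FORMAL",
    "Applied Science": "APPLIED",
}

# Assumption: etymology usually sits in a parenthetical that mentions Greek or Latin.
_ETYMOLOGY_HINT = re.compile(r"\b(Greek|Latin|etymolog)", re.IGNORECASE)
_PAREN = re.compile(r"\s*\(([^()]*)\)")  # non-nested parentheticals only


def strip_etymology(paragraph: str) -> str:
    """Remove parentheticals that look like etymological notes (requirement 2)."""
    def _drop_if_etymology(match):
        return "" if _ETYMOLOGY_HINT.search(match.group(1)) else match.group(0)
    return _PAREN.sub(_drop_if_etymology, paragraph)


@dataclass
class Discipline:
    page_title: str
    first_paragraph: str
    category_type: str
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)


def build_hierarchy(rows, edges):
    """rows: (page_title, first_paragraph, root_category) tuples.
    edges: (parent_title, child_title) tuples (requirement 4)."""
    disciplines = {}
    for title, paragraph, root in rows:
        disciplines[title] = Discipline(
            title, strip_etymology(paragraph), ROOT_TO_TYPE[root]
        )
    for parent, child in edges:
        # Ignore edges that point at titles we did not keep.
        if parent in disciplines and child in disciplines:
            disciplines[parent].children.add(child)
            disciplines[child].parents.add(parent)
    return disciplines
```

Storing both parents and children on each record lets a discipline sit under more than one parent, which is common in Wikipedia's category graph.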
Note - We do not want the following:
a) Pseudosciences (i.e., fringe sciences such as Astrology, Telekinesis, and science-fiction sciences and technologies)
b) Social sciences (Cognitive science, History, Economics, Sociology, Psychology, Philosophy, etc.)
c) Any people (scientists, professionals, historical figures, etc.)
d) Any opinion pieces and/or other articles that do not conform to Wikipedia's Neutral Point of View (NPOV) - see [login to view URL]:NPOV_dispute
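The exclusions above could be screened with a simple filter on each page's Wikipedia categories. The sketch below is only an assumption about how that might look: EXCLUSION_PATTERNS and exclusion_reasons are hypothetical names, and the patterns are illustrative guesses at category wording (for example, "births" and "deaths" categories as a sign of a person), not a complete or validated rule set.

```python
import re

# Hypothetical patterns keyed by exclusion reason (a to d above).
EXCLUSION_PATTERNS = {
    "pseudoscience": re.compile(r"pseudoscien|fringe", re.IGNORECASE),
    "social_science": re.compile(r"social science", re.IGNORECASE),
    "person": re.compile(r"\b(births|deaths)$|living people", re.IGNORECASE),
    "npov_dispute": re.compile(r"NPOV dispute", re.IGNORECASE),
}


def exclusion_reasons(page_categories):
    """Return the exclusion reasons triggered by a page's category names."""
    reasons = []
    for reason, pattern in EXCLUSION_PATTERNS.items():
        if any(pattern.search(name) for name in page_categories):
            reasons.append(reason)
    return reasons
```

A page would be kept only when the returned list is empty. Logging the reasons for rejected pages makes it easier to review borderline cases by hand.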
Data must be obtained as per the fields described in the attached WIKI_DATA_OUTPUT Excel file.
A rudimentary decision tree entitled "Pure and Applied Disciplines Decision Tree" has been provided, along with the supporting keyword library Excel files ("Pure Libraries" and "Applied Libraries"). The tree describes several analysis criteria and methods for determining whether or not a page_title is a scientific discipline. We recommend that the freelancer download the XMind software to visualize this tree and get a better idea of how we want this information to be gathered.
We leave it up to the freelancer whether to follow our decision tree or to disregard it in favor of other data mining methods. We welcome any suggestions for more accurate analysis criteria and methods, should the freelancer find more suitable and performant alternatives.