Data Mining From Digg (by script)

In progress. Posted Oct 4, 2008. Paid on delivery.

Data Mining Task from Digg

You'll be supplied with a list of movie titles.

Your task is to gather the following data:

- A list of Digg submissions related to the movie, found with the following search terms:

a. search for ||movie_name movie||

b. search for ||movie_name film||

c. search for ||movie_name trailer||

d. search for ||movie_name watch||

e. search for ||movie_name see||

For example, for the movie "The Eye" you will run the following separate searches:

"The Eye movie"

"The Eye film"

"The Eye trailer"

"The Eye watch"

"The Eye see"

Run all of them *without* the double quotes!
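Assuming the movie titles arrive as plain strings, the five searches per title can be generated mechanically. A minimal Python sketch; the names `SEARCH_SUFFIXES` and `build_queries` are illustrative and not part of the posting:

```python
# The five suffixes from the task description, in order a-e.
SEARCH_SUFFIXES = ["movie", "film", "trailer", "watch", "see"]


def build_queries(movie_title: str) -> list:
    """Return the plain search strings (no surrounding quotes) for one title."""
    title = movie_title.strip()
    return [f"{title} {suffix}" for suffix in SEARCH_SUFFIXES]


for query in build_queries("The Eye"):
    print(query)
```

How each string is actually submitted to Digg's search is not specified in the posting, so that step is left out here.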

The results of all searches should be combined and duplicates removed. Delete only exact duplicates, meaning entries that lead to the same Digg submission, not entries that merely link to the same external URL!
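The duplicate rule above can be expressed by keying on the Digg submission URL rather than on the external link. A minimal sketch, assuming each result row is a dict with hypothetical keys `digg_url` and `link_url`; the URLs in the demo are made-up placeholders:

```python
from typing import Dict, Iterable, List


def merge_results(result_lists: Iterable[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Combine several search result lists, keeping the first occurrence of each Digg submission.

    Rows count as duplicates only if they share the same Digg item URL;
    rows that merely link to the same external URL are all kept.
    """
    seen = set()
    merged = []
    for results in result_lists:
        for row in results:
            key = row["digg_url"]
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


movie_hits = [{"digg_url": "http://digg.com/story-a", "link_url": "http://example.com/x"}]
trailer_hits = [
    {"digg_url": "http://digg.com/story-a", "link_url": "http://example.com/x"},  # same submission: dropped
    {"digg_url": "http://digg.com/story-b", "link_url": "http://example.com/x"},  # same external URL: kept
]
print(len(merge_results([movie_hits, trailer_hits])))  # 2
```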

Digg search settings: "Title, Description, and URL", "All Stories", "Including buried: NO".

The results should be saved in a table (preferably Excel; CSV is also acceptable) with the following columns:

ID (auto-increment serial number), Date Submitted (dd/mm/yyyy), Title, Full URL of Digg Item, Full URL the Item Links To, Number of Diggs, Number of Comments, Made Popular (YES/NO)
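A CSV file with these columns opens directly in Excel. A minimal sketch using Python's standard `csv` module; the row keys (`date`, `title`, `digg_url`, `link_url`, `diggs`, `comments`, `popular`) and the sample row are hypothetical placeholders, and the serial ID is generated while writing:

```python
import csv
import io

COLUMNS = [
    "ID", "Date Submitted", "Title", "Full URL of Digg Item",
    "Full URL the Item Links To", "Number of Diggs", "Number of Comments", "Made Popular",
]


def write_csv(rows, out):
    """Write result rows to a file-like object, numbering them from 1 in the ID column."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for serial, row in enumerate(rows, start=1):
        writer.writerow([
            serial,
            row["date"],  # expected to be formatted as dd/mm/yyyy already
            row["title"],
            row["digg_url"],
            row["link_url"],
            row["diggs"],
            row["comments"],
            "YES" if row["popular"] else "NO",
        ])


buffer = io.StringIO()
write_csv([{
    "date": "04/10/2008", "title": "The Eye trailer",
    "digg_url": "http://digg.com/example-story", "link_url": "http://example.com/trailer",
    "diggs": 12, "comments": 3, "popular": False,
}], buffer)
print(buffer.getvalue())
```

Writing a native `.xlsx` file instead would need a third-party library; CSV keeps the sketch to the standard library.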

Please note that Digg shows the date as a relative date (e.g. "2 years 34 days ago"). This should of course be converted to the exact date.
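One way to do the conversion is to count back from the moment the page was scraped. A minimal sketch, assuming the relative text uses English unit words and that a "year" means 365 days and a "month" 30 days (Digg's own rounding is unknown, so the result is only as exact as the relative text allows); `relative_to_date` and `scraped_at` are illustrative names:

```python
import re
from datetime import datetime, timedelta

# Assumed unit lengths in seconds; year = 365 days and month = 30 days are approximations.
UNIT_SECONDS = {
    "year": 365 * 86400,
    "month": 30 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}

# Matches pieces like "2 years", "34 days", "1 hour".
PART_RE = re.compile(r"(\d+)\s+(year|month|week|day|hour|minute)s?\b", re.IGNORECASE)


def relative_to_date(text: str, scraped_at: datetime) -> str:
    """Convert e.g. '2 years 34 days ago' to dd/mm/yyyy, counting back from scraped_at."""
    parts = PART_RE.findall(text)
    if not parts:
        raise ValueError(f"unrecognised relative date: {text!r}")
    seconds = sum(int(n) * UNIT_SECONDS[unit.lower()] for n, unit in parts)
    return (scraped_at - timedelta(seconds=seconds)).strftime("%d/%m/%Y")


print(relative_to_date("2 years 34 days ago", datetime(2008, 10, 4)))  # 01/09/2006
```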

Made Popular: regular items (not popular) show the following text in the search results: "username" submitted "342 days ago"

Popular items show the following text instead: "username" made popular "342 days ago"
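The YES/NO value can then be read from the byline text of each search result. A minimal sketch, assuming the byline (e.g. "johndoe made popular 342 days ago", with a hypothetical username) has already been extracted from the result as plain text; `made_popular` is an illustrative name:

```python
import re

# "made popular" marks a popular item; "submitted" marks a regular one.
POPULAR_RE = re.compile(r"\bmade popular\b", re.IGNORECASE)
SUBMITTED_RE = re.compile(r"\bsubmitted\b", re.IGNORECASE)


def made_popular(byline: str) -> str:
    """Return 'YES' or 'NO' for the Made Popular column, based on the byline wording."""
    if POPULAR_RE.search(byline):
        return "YES"
    if SUBMITTED_RE.search(byline):
        return "NO"
    raise ValueError(f"unexpected byline: {byline!r}")


print(made_popular("johndoe made popular 342 days ago"))  # YES
print(made_popular("johndoe submitted 342 days ago"))  # NO
```

Checking "made popular" first matters only if a byline could contain both phrases; raising on anything else makes unexpected page layouts visible instead of silently defaulting to NO.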

Sample data attached. Please make sure you understand the requirements before posting your bid.

I expect this to be done by script (automatically), as accurately as possible, within 2-3 days.

Data Processing, Python, Research, Ruby on Rails

Project ID: #324462

About the project

17 proposals. Remote project. Active on Oct 9, 2008.