Developing a Text Information Retrieval System "project for college"

Cancelled. Posted Mar 27, 2015. Paid on delivery.

Introduction

Information retrieval is the process of extracting useful information from data. In the current era, text constitutes an important form of data. This includes web pages, emails, SMS messages and several other types of text documents.

Text documents need to be represented in an appropriate format (usually as vectors of numbers) before they can be processed further. Once properly represented, text documents can be used for various tasks such as classification (for instance, deciding whether an email is spam) or search (for example, deciding whether two web pages have similar content).
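
The brief does not say how the vectors of numbers are built or compared. As one possible illustration only, the sketch below assumes raw word counts as the vector and cosine similarity as the comparison. The class name TermVectors and its methods are hypothetical, and the input is assumed to be a document already split into words.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: represents a document as a sparse vector of word counts.
final class TermVectors {

    // Counts how often each word occurs; absent words implicitly have count 0.
    static Map<String, Integer> termCounts(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity of two count vectors: 1.0 for the same direction,
    // 0.0 when the documents share no words (or one of them is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0 || nb == 0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a count vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }
}
```

With this sketch, two pages with similar content would score close to 1.0 and two pages with no words in common would score 0.0. Other weightings (for example, weighting rare words more heavily) are equally compatible with the brief.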

Before documents are represented as numbers, however, they must be preprocessed. Text preprocessing is the task of removing unnecessary information from the text. This is achieved through several steps, which are summarized below:

1. Initial preprocessing: The goal of this step is to "clean up" the document and prepare it for the remaining tasks. The different tasks conducted in this step are:

(a) Replace tab, carriage return and newline characters with a space.

(b) Remove all non-letter characters: turn punctuation, numbers, etc. into spaces.

(c) Switch all letters to lowercase.

(d) Replace multiple consecutive spaces with a single space.

(e) Remove words that are shorter than 3 characters. For example, remove "an" but keep "him".

2. Stop words removal: Some words such as "a", "the" and "and" are very common in English and should be removed from the text so that only useful words remain. This is done simply by removing any word that appears in a predefined list of stop words.

3. Stemming: The same word can take different forms depending on its role and position in the sentence.
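
Steps 1 and 2 above could be sketched in Java roughly as follows. This is a minimal sketch, not the required design: the class name TextPreprocessor, the method names, and the tiny stop-word list are all hypothetical (the brief does not give the actual list), and "letter" is assumed to mean the ASCII letters A-Z and a-z. Stemming is left out because its description is cut off in the brief.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper covering step 1 (initial preprocessing) and step 2 (stop words).
final class TextPreprocessor {

    // Placeholder list for illustration only; the real predefined list is not given.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for", "are", "was", "with"));

    // Step 1: returns the cleaned words of the text in their original order.
    static List<String> clean(String text) {
        // (a) + (b): tabs, returns, newlines, punctuation and digits become spaces.
        String lettersOnly = text.replaceAll("[^A-Za-z]", " ");
        // (c): lowercase everything.
        String lower = lettersOnly.toLowerCase(Locale.ROOT);
        // (d) + (e): split on runs of spaces, then drop words shorter than 3 characters.
        List<String> words = new ArrayList<>();
        for (String w : lower.trim().split(" +")) {
            if (w.length() >= 3) {
                words.add(w);
            }
        }
        return words;
    }

    // Step 2: drops every word that appears in the stop-word list.
    static List<String> removeStopWords(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String w : words) {
            if (!STOP_WORDS.contains(w)) {
                kept.add(w);
            }
        }
        return kept;
    }
}
```

For example, TextPreprocessor.clean("An  email,\tsent to HIM\nin 2015!") returns [email, sent, him]. Returning a list of words rather than a single string is a choice made here so that the stop-word step can filter word by word; joining the result with single spaces would give the cleaned text form.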

Skills: Java, Software Architecture, Software Testing

Project ID: #7380454

About the project

9 bids. Remote project. Active on Mar 31, 2015.

9 freelancers are bidding an average of $139 for this job

dobreiiita

Hello, I am a Java expert and interested in this project. I have reviewed your requirements and am confident I can handle this project perfectly. Please get in touch to discuss further. Regards, Anshu

$100 USD in 1 day
(416 reviews)
7.4
mibrahim070

I have a lot of experience in Java and problem solving, and I can finish quickly. Open a chat for more info.

$70 USD in 1 day
(18 reviews)
4.1
klochkovg

If this overview describes most of the task and there are no additional limitations on memory footprint, performance, etc., I don't see any difficulty in implementing it. Some additional information about environment is n…

$60 USD in 7 days
(2 reviews)
1.5
jvison14

I'm a Java specialist and would like to do this for you. Please send me more details of the specification; it seems incomplete. You can contact me through mail, Skype or Gtalk: pererabdi

$244 USD in 5 days
(0 reviews)
0.0
matthewwilliam

Hi, I'd be happy to help you with your project. I'm a Mechatronics Engineer who specialises in software development for the high-tech industry, so I'd be perfect for your job. Please contact me for more information.

$155 USD in 3 days
(0 reviews)
0.0
raeesi

I have experience in designing and implementing search engines on various text corpora, and I am willing to help you with this project. By the way, I suggest you use lemmatization instead of stemming.

$256 USD in 5 days
(0 reviews)
0.0
yulaili

7 years of work experience. Good with Java and JSON programming. I'm a staff software engineer; I hope you can contact me.

$155 USD in 3 days
(0 reviews)
0.0
gopikrishna92

I have good Java skills. Don't worry, your project will be delivered. I have already visualized your project in my mind; whether I get this bid or not, I will complete this project.

$155 USD in 3 days
(0 reviews)
0.0
rbob

Hi, I've worked on this kind of project before, for one of the projects that I did in college. I can deliver this project within the timeline for the specified fee, assuming that you expect the applicatio…

$55 USD in 2 days
(0 reviews)
0.0