
database cleaning/merging/deduplication & fuzzy matching - repost

$30-250 USD

Closed
Posted more than 10 years ago


Paid on delivery
I'm building a database in Excel that contains records from many different sources: 77k rows and 50+ columns in total. I would like to condense it by unique address while keeping all the other unique data cells in the rows. This will require some kind of fuzzy matching, because the duplicate addresses are not all 100% exact, e.g.:

300 Water Street suite #3 | Portland | Oregon
300 Water Street | Portland | Oregon
300 Water St | Portland | Oregon

All three examples above would be the same record. Each row may have different data in its columns, and that data needs to be condensed into one row.

I have normalized the data as much as I can with my limited Excel skills and PowerGREP, and I have made sure the states, cities and abbreviations are all consistent for easier duplicate recognition. I estimate there are probably around 20k actual unique addresses, which is what this should be condensed to, while keeping all the unique cells, making a very rich data set at the end.

I'm not sure whether Excel can handle this type of project; perhaps you have a better solution using SQL, VBA/Access, or some other database manipulation/deduplication tool. Let me know via PM how you would best tackle this.
Project ID: 5335763
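One way to approach the request, sketched in Python under several assumptions: each row is a dict with hypothetical `address`, `city` and `state` keys, the street-type and unit tables are illustrative only, and the fuzzy step uses the standard library's `difflib.SequenceMatcher` ratio with an arbitrary 0.9 threshold rather than any particular deduplication tool. Addresses are first normalized so the three "300 Water Street" variants above collapse to one key; remaining near-matches are compared only within the same house number, city and state, and every distinct non-empty cell from the matched rows is kept in the merged record.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; extend it to match the conventions in your data.
STREET_TYPES = {"street": "st", "avenue": "ave", "road": "rd",
                "boulevard": "blvd", "drive": "dr"}
# Unit designators plus the token after them, e.g. "suite #3", "apt 12".
UNIT_RE = re.compile(r"\b(suite|ste|unit|apt|apartment)\b\s*#?\s*\w*")


def normalize_address(addr: str) -> str:
    """Lower-case, drop unit designators and punctuation, abbreviate street types."""
    s = UNIT_RE.sub(" ", addr.lower())
    s = re.sub(r"[^\w\s]", " ", s)
    return " ".join(STREET_TYPES.get(w, w) for w in s.split())


def merge_rows(rows, threshold=0.9):
    """Condense rows whose (fuzzy) address, city and state match into one record,
    keeping every distinct non-empty value per column, joined with '; '."""
    buckets = {}  # (house number, city, state) -> representative addresses
    groups = {}   # (representative address, city, state) -> member rows
    for row in rows:
        addr = normalize_address(row["address"])
        city, state = row["city"].strip().lower(), row["state"].strip().lower()
        house = addr.split()[0] if addr else ""
        reps = buckets.setdefault((house, city, state), [])
        # Fuzzy step: reuse an existing representative if it is similar enough.
        rep = next((r for r in reps
                    if SequenceMatcher(None, addr, r).ratio() >= threshold), None)
        if rep is None:
            reps.append(addr)
            rep = addr
        groups.setdefault((rep, city, state), []).append(row)

    merged = []
    for key, members in groups.items():
        values = {}  # column -> distinct non-empty values in first-seen order
        for row in members:
            for col, val in row.items():
                if val not in (None, "") and val not in values.setdefault(col, []):
                    values[col].append(val)
        record = {"match_key": " | ".join(key)}
        record.update({col: "; ".join(map(str, v)) for col, v in values.items()})
        merged.append(record)
    return merged


if __name__ == "__main__":
    sample = [
        {"address": "300 Water Street suite #3", "city": "Portland", "state": "Oregon", "phone": "555-0100"},
        {"address": "300 Water Street", "city": "Portland", "state": "Oregon", "email": "info@example.com"},
        {"address": "300 Water St", "city": "Portland", "state": "Oregon", "phone": "555-0100"},
        {"address": "310 Water St", "city": "Portland", "state": "Oregon"},
    ]
    for rec in merge_rows(sample):
        print(rec)
```

Comparing only within a (house number, city, state) bucket keeps the number of similarity checks small for 77k rows, and it stops "300 Water St" and "310 Water St" from merging even though their normalized strings are about 92% similar. Dropping the suite number means different suites at one street address end up in one record; whether that is wanted depends on the data.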

About the project

4 proposals
Remote project
Active 10 years ago

4 freelancers are bidding an average of $183 USD for this job
Hi, your file would be best processed with R (a programming language). I understand you need to "deduplicate" addresses only, and that "states, cities and abbreviations" are already consistent across duplicate records (but I will check them too). I also understand that no data needs to be kept from rows that share the same address, city and state (i.e. the other data is not important: as soon as the first unique match is found, all other rows with the same address, city and state should be discarded). As the final result I would return an Excel file to you, plus an R script so you could repeat the process for similar tasks in the future (R is free to download and use). Best regards, Matt
$177 USD in 2 days
5.0 (33 reviews)
5.0
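For comparison, the keep-first rule described in the bid above can be sketched as follows. This is Python rather than the R the bid proposes, and the `address`/`city`/`state` column names are assumptions. It keeps only the first row per exact (trimmed, lower-cased) key, so cells that exist only in later duplicates are discarded, and variants such as "300 Water St" and "300 Water Street" stay separate unless the addresses are normalized first.

```python
def keep_first(rows, key_cols=("address", "city", "state")):
    """Keep the first row for each exact key and discard later rows with that key."""
    seen = set()
    kept = []
    for row in rows:
        # Exact match after trimming and lower-casing; no fuzzy matching here.
        key = tuple(str(row.get(col, "")).strip().lower() for col in key_cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [
        {"address": "300 Water Street", "city": "Portland", "state": "Oregon", "phone": "555-0100"},
        {"address": "300 water street ", "city": "Portland", "state": "Oregon", "email": "info@example.com"},
        {"address": "300 Water St", "city": "Portland", "state": "Oregon"},
    ]
    for row in keep_first(rows):
        print(row)  # the email from the second row does not survive
```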
A proposal has not yet been provided
$200 USD in 14 days
5.0 (2 reviews)
3.0
Hello, I am very experienced in VB6/VBA macro coding across all of MS Office (Excel/Access/Word/PowerPoint) plus VB.NET 2005/2013, and will do this to an excellent standard for you!
$155 USD in 3 days
5.0 (2 reviews)
1.4
Hello, I would try to import the data into MySQL and do the cleaning using DB queries. Please send me the data so I can check whether this is possible.
$200 USD in 5 days
0.0 (0 reviews)
1.9
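A minimal sketch of the query-based approach from the bid above. It uses Python's built-in `sqlite3` as a stand-in for MySQL (both support `GROUP_CONCAT(DISTINCT ...)`), and assumes a hypothetical `records` table whose `addr_key` column already holds a normalized address; the fuzzy matching itself would have to happen before this step. `NULLIF` turns empty cells into NULLs so `GROUP_CONCAT` skips them, and with 50+ columns there would be one `GROUP_CONCAT` per column.

```python
import sqlite3

# Hypothetical schema; addr_key is assumed to hold an already-normalized address.
SCHEMA = """
CREATE TABLE records (
    addr_key TEXT, address TEXT, city TEXT, state TEXT, phone TEXT, email TEXT
)"""

# One output row per (addr_key, city, state). DISTINCT drops repeated values;
# NULLIF maps '' to NULL, which GROUP_CONCAT ignores.
MERGE_SQL = """
SELECT addr_key, city, state,
       GROUP_CONCAT(DISTINCT NULLIF(address, '')) AS addresses,
       GROUP_CONCAT(DISTINCT NULLIF(phone, ''))   AS phones,
       GROUP_CONCAT(DISTINCT NULLIF(email, ''))   AS emails
FROM records
GROUP BY addr_key, city, state
ORDER BY addr_key
"""


def merge_in_sql(rows):
    """Load (addr_key, address, city, state, phone, email) tuples and merge them."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(SCHEMA)
        con.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(MERGE_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    rows = [
        ("300 water st", "300 Water Street suite #3", "Portland", "Oregon", "555-0100", ""),
        ("300 water st", "300 Water Street", "Portland", "Oregon", "", "info@example.com"),
        ("300 water st", "300 Water St", "Portland", "Oregon", "555-0100", ""),
        ("310 water st", "310 Water St", "Portland", "Oregon", "555-0199", ""),
    ]
    for merged in merge_in_sql(rows):
        print(merged)
```

Value order inside each concatenated cell is not guaranteed. In MySQL the `GROUP_CONCAT` result is also truncated to `group_concat_max_len` bytes (1024 by default), so that setting may need raising for columns with many distinct values.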

About the client

India
0.0
0
Member since Jan 19, 2014

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)