Database cleaning/merging/deduplication & fuzzy matching - repost

$8-15 USD / hour

Closed
Posted more than 10 years ago

I am building a database in Excel that contains records from many different sources: 77k rows and 50+ columns in total. I would like to condense it to one row per unique address while keeping all the other unique data cells from the merged rows. This will require some type of fuzzy matching, as the duplicate addresses are not all 100% exact, e.g.:

300 Water Street suite #3 | Portland | Oregon
300 Water Street | Portland | Oregon
300 Water St | Portland | Oregon

The above examples would all be the same record. Each row may have different corresponding data in its columns, and that data needs to be condensed into one row.

I have normalized the data as much as I can using my limited Excel skills and PowerGREP. I have made sure the states, cities and abbreviations are all consistent for easier duplicate recognition. I estimate that there are probably 20k actual unique addresses, which is what this should be condensed to while keeping all the unique cells, making a very rich data set at the end.

I'm not sure if Excel can handle this type of project; perhaps you have a better solution using SQL, VBA/Access or some other database manipulation/deduplication tool. Let me know via PM how you would best tackle this.
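The "condense but keep all the unique cells" step could be sketched roughly as below. This is only an illustration in Python, not something the posting prescribes: the `merge_rows` function, the `addr_key` column (assumed to already hold one canonical key per real address), the sample columns and the `" ; "` separator are all hypothetical.

```python
from collections import defaultdict

def merge_rows(rows, key_column, sep=" ; "):
    """Collapse rows that share the same key into one row per key.

    For every other column, each distinct non-empty value is kept once,
    in first-seen order, joined by `sep`.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)

    merged = []
    for key, members in groups.items():
        # Collect every column that appears in any row of this group.
        columns = []
        for row in members:
            for col in row:
                if col != key_column and col not in columns:
                    columns.append(col)

        out = {key_column: key}
        for col in columns:
            seen = []
            for row in members:
                value = (row.get(col) or "").strip()
                if value and value not in seen:
                    seen.append(value)
            out[col] = sep.join(seen)
        merged.append(out)
    return merged

# Hypothetical sample data: `addr_key` is assumed to be precomputed.
rows = [
    {"addr_key": "300 water street|portland|oregon", "phone": "555-0100", "owner": ""},
    {"addr_key": "300 water street|portland|oregon", "phone": "", "owner": "A. Smith"},
    {"addr_key": "300 water street|portland|oregon", "phone": "555-0100", "owner": "A Smith"},
    {"addr_key": "12 oak avenue|salem|oregon", "phone": "555-0199", "owner": "B. Jones"},
]
result = merge_rows(rows, "addr_key")
```

Conflicting values (here the two spellings of the owner) are kept side by side rather than silently dropped, so nothing is lost and the conflicts can be resolved later.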
Project ID: 5352074

About the project

3 bids
Remote project
Active 10 years ago

3 freelancers are bidding on average $12 USD/hour for this job
I have done this exact type of exercise with clients in the past. Usually, they'd have dozens of different spreadsheets and the columns weren't all in the same positions. I know of a few quick tricks in Excel to clean it up as much as possible through formulas in a new column and after that, I go through everything line by line to ensure that I haven't left in any duplicates that couldn't be captured by my formulas. It took me a little while to figure out a good process for doing this, but since I've had to do it so much, I tend to be able to do it a little more quickly. I can definitely get the data cleaned up for you and put it in Access, which is going to store data more efficiently than in Excel. I can keep it in Excel if that's what you're set on. In Access, I would also set up a form to allow you to easily add new records without messing any of the existing data up by accident. I already have a template for this. Just let me know if there's anything other than cleaning duplicates or similar data records and I can work on this right away.
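The manual line-by-line check described in this bid could be narrowed down with a script that lists only the pairs worth looking at. The following is a minimal sketch, assuming Python's standard `difflib`; the function name `review_candidates`, the 0.8 threshold and the sample addresses are illustrative assumptions, not part of the bid.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def review_candidates(addresses, threshold=0.8):
    """Return (score, a, b) for address pairs that look alike but differ.

    Pairs are only compared when they share the same leading house number,
    so "300 Water St" is never scored against "301 Water Street".
    """
    blocks = defaultdict(list)
    for addr in set(addresses):
        tokens = addr.split()
        number = tokens[0] if tokens else ""
        blocks[number].append(addr)

    pairs = []
    for group in blocks.values():
        for a, b in combinations(sorted(group), 2):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((round(score, 2), a, b))
    # Most similar pairs first, so the likeliest duplicates are reviewed first.
    return sorted(pairs, reverse=True)

# Hypothetical sample addresses, including one typo ("Watr").
addresses = [
    "300 Water Street",
    "300 Water St",
    "301 Water Street",
    "300 Watr Street",
    "12 Oak Avenue",
]
candidates = review_candidates(addresses)
```

Grouping by house number first matters because plain string similarity rates "300 Water Street" and "301 Water Street" as nearly identical even though they are different addresses. It also keeps the number of comparisons manageable on a 77k-row sheet.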
$14 USD in 3 days
0.0 (0 reviews)
Hello. My name is Jason and I've been working in the IT department of a large US company for the last 20 years. During that time I've created many in-house applications to automate and customize Excel data. With yours, I'd be able to parse out similar (fuzzy) data strings so that St and Street are treated as the same. I could also set it up so that it automatically knows that 300 Water Street and 300 Water Street suite #3 are the exact same place. With .NET/C# and Excel automation I can have this done very quickly. If you have any questions, please email me and I will help you out. Thank you for your time.
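The two rules mentioned in this bid (treating St and Street as the same, and ignoring a trailing suite number) could look roughly like the sketch below. It is written in Python rather than the .NET/C# the bidder proposes, and the `address_key` function, the `SUFFIXES` table and the unit pattern are illustrative assumptions, not the bidder's actual method.

```python
import re

# Illustrative, incomplete abbreviation table (an assumption, not from the posting).
SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road",
            "blvd": "boulevard", "dr": "drive"}

# Trailing unit designators such as "suite #3", "ste 4", "apt 2b", "unit 7", "#3".
# Letter-only units like "suite b" are deliberately not handled in this sketch.
UNIT_RE = re.compile(r"\s+(?:suite|ste|apt|unit|#)\s*#?\s*\d+[a-z]?$")

def address_key(street, city, state):
    """Build a canonical comparison key from one address."""
    s = street.lower().strip()
    s = re.sub(r"[.,]", "", s)      # drop periods and commas
    s = re.sub(r"\s+", " ", s)      # collapse repeated whitespace
    s = UNIT_RE.sub("", s)          # ignore the trailing unit number
    words = s.split(" ")
    # Expand only the final word, so "St Helens Rd" keeps its leading "st".
    words[-1] = SUFFIXES.get(words[-1], words[-1])
    return "|".join([" ".join(words), city.lower().strip(), state.lower().strip()])

# The three variants from the project description.
keys = {
    address_key("300 Water Street suite #3", "Portland", "Oregon"),
    address_key("300 Water Street", "Portland", "Oregon"),
    address_key("300 Water St", "Portland", "Oregon"),
}
```

All three variants from the project description reduce to the same key, while a different house number still yields a different key. If the suite number itself is data worth keeping, it would need to be captured into its own column before being stripped from the key.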
$12 USD in 3 days
0.0 (0 reviews)

About the client

India
Member since Jan 23, 2014

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)