Need PDFBox expert to help extract text from PDFs with coordinates and a flag for what part of the text is visible

$250-750 USD

Closed
Posted over 6 years ago

Paid on delivery
I am looking for help understanding the PDFBox library. Please apply only if you have already worked with PDFBox, iText, or other PDF software.

What we need: a utility/jar/class we can call from our Java web app, which runs under Tomcat with Java 8 on a Linux server (this may affect non-Java solutions).

Problem: we need to extract text from searchable (not scanned) PDFs and preserve text positions. Ideally the library should return words/tokens with their x/y start and end positions, as well as the start and end coordinates of vertical and horizontal line separators. We need to get only the text a user can see; or, if we get the full text, we need a clear indication of which parts are visible to the end user and which are not. Attached is an example of a PDF file that has hidden text.

We tried Apache PDFBox; however, the default PDFTextStripper handles only simple cases, where all extracted text is visible on screen. In the attached files, some text is partially invisible because of PDF clipping/filling paths, so to track it you need to process the PDF instructions manually and calculate whether each character is covered or overlapped by another element, such as an image or another filled field.

Other tools could be used, such as iText or Tika, but it looks like they are built on top of PDFBox. We also considered using the Acrobat SDK, but we are not familiar with it.
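To make the requested "visible or not" flag concrete, here is a minimal sketch of the geometric check described above, using only the Java standard library. It assumes the token boxes, the active clip, and the bounds of opaque fills and images have already been collected in paint order, for example from PDFBox's TextPosition objects and a graphics stream engine; that collection step is not shown. The names Token, Occluder, Visibility, classify, and paintOrder are hypothetical. Everything is simplified to axis-aligned rectangles, so arbitrary clip paths, transparency, and invisible text rendering modes are outside this sketch.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

/** Sketch: flag each extracted token as visible, partially visible, or hidden. */
class VisibilitySketch {

    enum Visibility { VISIBLE, PARTIAL, HIDDEN }

    /** Hypothetical token: text, bounding box, paint order, and the clip active when drawn. */
    static final class Token {
        final String text;
        final Rectangle2D box;
        final int paintOrder;   // index of the drawing operation in the content stream
        final Rectangle2D clip; // null means "no clipping"

        Token(String text, Rectangle2D box, int paintOrder, Rectangle2D clip) {
            this.text = text;
            this.box = box;
            this.paintOrder = paintOrder;
            this.clip = clip;
        }
    }

    /** Hypothetical opaque region (bounds of an image or filled path) with its paint order. */
    static final class Occluder {
        final Rectangle2D box;
        final int paintOrder;

        Occluder(Rectangle2D box, int paintOrder) {
            this.box = box;
            this.paintOrder = paintOrder;
        }
    }

    /** Removes clipped-away and painted-over regions from the token box, then classifies it. */
    static Visibility classify(Token t, List<Occluder> occluders) {
        Area full = new Area(t.box);
        Area visible = new Area(t.box);
        if (t.clip != null) {
            visible.intersect(new Area(t.clip)); // keep only what lies inside the clip
        }
        for (Occluder o : occluders) {
            if (o.paintOrder > t.paintOrder) {   // only later drawing can cover the token
                visible.subtract(new Area(o.box));
            }
        }
        if (visible.isEmpty()) {
            return Visibility.HIDDEN;
        }
        return visible.equals(full) ? Visibility.VISIBLE : Visibility.PARTIAL;
    }

    public static void main(String[] args) {
        // One opaque rectangle painted as operation 5.
        List<Occluder> occluders = Arrays.asList(
                new Occluder(new Rectangle2D.Double(90, 690, 200, 40), 5));
        Token covered = new Token("Invoice", new Rectangle2D.Double(100, 700, 60, 12), 1, null);
        Token free = new Token("Total", new Rectangle2D.Double(100, 600, 40, 12), 1, null);
        System.out.println(covered.text + " -> " + classify(covered, occluders)); // Invoice -> HIDDEN
        System.out.println(free.text + " -> " + classify(free, occluders));       // Total -> VISIBLE
    }
}
```

Using Area rather than single rectangle containment means several occluders that jointly cover a token are still detected; a caller that wants visible text only can simply drop tokens classified as HIDDEN.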
Project ID: 15915061

About the project

6 proposals
Remote project
Active 6 years ago

6 freelancers are bidding an average of $517 USD for this job
Greetings, I have understood your task, "Need PDFBox expert to help extract text from PDFs with coordinates and a flag for what part of the text is visible", and can complete it to your 100% satisfaction. Please ping me for more discussion. I have more than 5 years of experience in Java and PDF.
$500 USD in 6 days
5.0 (21 reviews)
5.0
Hi, I have extensive experience with the PDFBox and iText PDF libraries. I reviewed your requirement for extracting text and its position from PDFs, and it looks good to me: since the PDFs are searchable, we can get the text easily, and for the position of text on a page I can get its X and Y coordinates. I don't think you need the Adobe SDK; PDFBox or iText is enough for this task. If you want, I know another library called tableu which will handle this. If you have time, can we connect on chat so I can ask you a few questions to make my understanding clear and make sure we are both on the same page? Thanks,
$480 USD in 10 days
4.7 (14 reviews)
4.9
Hey man, I have worked with the PDFBox library. I have seen your document and I can try to do it. If interested, message me. Thanks
$690 USD in 10 days
5.0 (10 reviews)
4.2
I am an IITK graduate with 11 years of experience in software development. I have a 100% completion rate and have finished projects with the highest level of customer satisfaction. I have a team of rock-star developers who work with top product companies and contribute to these projects as a part-time gig.
$555 USD in 10 days
3.8 (20 reviews)
5.4
Hello Sir/Madam. Relevant Skills and Experience: Please send us all the details and we will do the job now if possible. We are always ready to take on any challenge, and we have an Adobe lab too. Proposed Milestones: 475 - (ProjectTitile). For any query, please consult our profile at https://www.freelancer.com/u/benni25.html
$475 USD in 1 day
4.9 (5 reviews)
3.1

About the client

United States
0.0
0
Member since Dec 20, 2017
