
Scala Spark Project to Group Values stored in MySQL and CSV.

£20-250 GBP

Closed
Posted over 3 years ago

Paid on delivery
All, I need two Spark jobs written in Scala to do some groupings on data stored in MySQL and CSV.

Notes:
* Both sources have the same data stored.
* For MySQL, please use Spark SQL.
* Please use SBT as the build tool for the project (I am happy to provide an example Spark project built with Scala and SBT, so you can use that as a starting point without worrying about all the dependencies and issues).

Job #1) Reads from a MySQL database and saves the output to a MySQL database.
Job #2) Reads from a given directory containing one or more CSV file(s) and saves the output to a CSV file in a new directory location.

Input of MySQL: [login to view URL]!9/137127/1
Input of CSV: (Attached as: [login to view URL])

I expect that once I run the Spark job, it will give me the following output:
Output of MySQL: [login to view URL]!9/802f692/1
Output of CSV: (Attached as: [login to view URL])

The groupings need to do the following:

LOGIC:
A: IF the columns "message_originator" and "account_id" are the same AND the column value of "start_ts" falls on the SAME DAY (unix_epoch_timestamp), then output one row adding together the column values of "msg_parts_vol", "whole_msg_vol" and "freq". Think of it as a summary row of one-or-many rows.
B: In the output result, please use the beginning of the day (unix_epoch_timestamp) in the "start_ts" column.
C: The Spark job should take two args, (1: start_ts) and (2: end_ts), and when it queries MySQL or the CSV it should only process group rows for which "start_ts" falls within that unix_epoch_timestamp range.
D: The Spark job should also be able to take a fixed argument (e.g. LAST_3_DAYS) and automatically convert it to the unix_epoch_timestamp of the current time minus 3 days to work out "start_ts" and "end_ts".

To explain this further with an example: if at the time the Spark job runs the current time is 2020-11-09 13:15:00 UTC and we use "LAST_3_DAYS", then "start_ts" should be 1604620800, which is 2020-11-06 00:00:00 UTC, and "end_ts" should be 1604880000, which is 2020-11-09 00:00:00 UTC.

E: Once the Spark job completes, it should store the final rows' "end_ts" value in a new table, e.g. [login to view URL]!9/e15c54/1, or, if it is CSV, like (Attached as: [login to view URL]).
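The summary-row semantics of LOGIC A and B can be pinned down in plain Scala, outside Spark: group on (message_originator, account_id, day), truncate start_ts to the beginning of the UTC day, and sum the three volume columns. This is a minimal sketch of the intended semantics, not the deliverable; in an actual Spark job the same shape becomes a `groupBy(...).agg(sum(...))` over a DataFrame. The case-class field names simply mirror the column names above:

```scala
// Hypothetical row shape, mirroring the column names in the spec.
case class MsgRow(messageOriginator: String, accountId: Long, startTs: Long,
                  msgPartsVol: Long, wholeMsgVol: Long, freq: Long)

val SecondsPerDay = 86400L

// LOGIC B: truncate an epoch-seconds timestamp to the beginning of its UTC day.
def dayStart(ts: Long): Long = ts - ts % SecondsPerDay

// LOGIC A: collapse rows sharing (message_originator, account_id, day)
// into a single summary row, summing the three volume columns.
def summarise(rows: Seq[MsgRow]): Seq[MsgRow] =
  rows.groupBy(r => (r.messageOriginator, r.accountId, dayStart(r.startTs)))
      .map { case ((orig, acc, day), grp) =>
        MsgRow(orig, acc, day,
               grp.map(_.msgPartsVol).sum,
               grp.map(_.wholeMsgVol).sum,
               grp.map(_.freq).sum)
      }.toSeq
```

In the Spark job proper, the equivalent would be `df.groupBy($"message_originator", $"account_id", dayCol).agg(sum("msg_parts_vol"), sum("whole_msg_vol"), sum("freq"))`, with the day column derived from `start_ts` the same way.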
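Requirement D's fixed-argument handling can be sketched with `java.time`. The helper name `resolveWindow` is an assumption for illustration; the regex generalizes LAST_3_DAYS to any LAST_N_DAYS, and both bounds are truncated to the beginning of the UTC day, matching the worked example above:

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper: turn a fixed argument like "LAST_3_DAYS" into
// (start_ts, end_ts) epoch seconds. end_ts is the beginning of the current
// UTC day; start_ts is N days before that.
def resolveWindow(arg: String, now: Instant): (Long, Long) = {
  val LastNDays = "LAST_(\\d+)_DAYS".r
  arg match {
    case LastNDays(n) =>
      val endOfWindow = now.truncatedTo(ChronoUnit.DAYS) // 00:00:00 UTC today
      val startOfWindow = endOfWindow.minus(n.toLong, ChronoUnit.DAYS)
      (startOfWindow.getEpochSecond, endOfWindow.getEpochSecond)
    case _ =>
      throw new IllegalArgumentException(s"Unsupported window argument: $arg")
  }
}
```

With the spec's example, `resolveWindow("LAST_3_DAYS", Instant.parse("2020-11-09T13:15:00Z"))` yields (1604620800, 1604880000), i.e. 2020-11-06 00:00:00 UTC to 2020-11-09 00:00:00 UTC.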
Project ID: 28092554

About the project

10 proposals
Remote project
Active 3 years ago

10 freelancers are bidding on average £168 GBP for this job
Hi, thank you for the invite. I am a certified developer and have designed and developed many enterprise-level applications using Spark. Please ping me in chat and we will discuss. Thank you, Naresh.
£150 GBP in 1 day
5.0 (9 reviews)
4.4
Have done similar work before, mate, and will be happy to help you here again. Do we need to use PySpark? Please see.
£222 GBP in 12 days
5.0 (2 reviews)
2.9
Hi, I am a Spark developer; I am good at Spark (Python/Scala) and Spark SQL. I can use the SBT tool to build the package. Let's discuss further in the chat window.
£135 GBP in 2 days
5.0 (3 reviews)
1.9
Hi, I have done a similar implementation with Spark SQL in Scala, handling both DB and file reads. Please reach out. Thanks, Mohanakrishnan
£222 GBP in 1 day
0.0 (0 reviews)
0.0
Hi, I have looked closely at your requirements and believe I can deliver on or before time, as I handle these types of use cases in my day-to-day job as a data engineer. I have 5+ years of experience in this stack and domain. Please feel free to connect with me and we can discuss and negotiate further. Regards, Jay
£222 GBP in 5 days
0.0 (0 reviews)
0.0
I can very well help you with this project. Here's a small intro about me: I am a professional data engineer with very good real-time experience working with most of the latest big data technologies, and I am well experienced in working with structured, semi-structured and raw data. Feel free to view my profile to learn more about the technologies and skills I carry. Thank you!
£150 GBP in 3 days
0.0 (0 reviews)
0.0
I am a data engineer building and maintaining big data pipelines. I have experience in the airline R&D and e-commerce domains. I am a Scala enthusiast and love to work with Akka, and I have also used Apache Spark for batch needs. I recently worked on a project to build a pipeline extracting clickstream events from different channels (Web/Android/iOS) into GCP BigQuery. I am also an English speaker who is close to native.
£167 GBP in 3 days
0.0 (0 reviews)
0.0
Hi, I am a data engineer experienced in Spark with Scala. I have already worked on projects with Spark. Contact me and I will give more details about these projects.
£133 GBP in 1 day
0.0 (0 reviews)
0.0
Hi, I have 5.5 years of total experience in application development and 3 years of experience in Spark, Scala and Hive. Your requirement falls under my day-to-day activity, and I can create such a job in a very optimized way. Contact me for further discussion. Regards, Adil Nazir
£25 GBP in 3 days
0.0 (0 reviews)
0.0

About the client

Flag of UNITED KINGDOM
Hammersmith, United Kingdom
5.0
6
Payment method verified
Member since Dec 15, 2019
