
Scala Spark Project to Group Values stored in MySQL and CSV.

£20-250 GBP

Closed
Posted over 3 years ago

Paid on delivery
All, I need two Spark jobs written in Scala to do some groupings on data stored in MySQL and CSV.

Notes:
* Both sources have the same data stored.
* For MySQL, please use Spark SQL.
* Please use SBT as the build tool for the project (I am happy to provide an example Spark project built with Scala and SBT, so you can use that as a starting point without worrying about all the dependencies and issues).

Job #1) Reads from a MySQL database and saves the output to a MySQL database.
Job #2) Reads from a given directory containing one or more CSV file(s) and saves the output to a CSV file in a new directory location.

Input of MySQL: [login to view URL]!9/137127/1
Input of CSV: (Attached as: [login to view URL])

I expect that once I run the Spark job, it will give me the following output:
Output of MySQL: [login to view URL]!9/802f692/1
Output of CSV: (Attached as: [login to view URL])

The groupings need to do the following:

LOGIC:
A: IF the columns "message_originator" and "account_id" are the same AND the column value of "start_ts" falls on the SAME DAY (unix_epoch_timestamp), then output one row adding together the column values of "msg_parts_vol", "whole_msg_vol" and "freq". Think of it as a summary row of one-or-many rows.
B: In the output result, please use the beginning of the day (unix_epoch_timestamp) in the "start_ts" column.
C: The Spark job should take two args, (1: start_ts) and (2: end_ts), and when it queries MySQL or the CSV it should only process group rows for which "start_ts" falls within that unix_epoch_timestamp range.
D: The Spark job should also be able to take a fixed argument (e.g. LAST_3_DAYS) and automatically convert it to the unix_epoch_timestamp of the current time minus 3 days to work out "start_ts" and "end_ts".

To explain this further with an example: if at the time the Spark job runs the current time is 2020-11-09 13:15:00 UTC and we use "LAST_3_DAYS", then "start_ts" should be 1604620800, which is 2020-11-06 00:00:00 UTC, and "end_ts" should be 1604880000, which is 2020-11-09 00:00:00 UTC.

E: Once the Spark job completes, it should store the final rows' "end_ts" value in a new table, e.g. [login to view URL]!9/e15c54/1, or, if it is CSV, like (Attached as: [login to view URL]).
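The summary-row semantics of LOGIC A and B can be pinned down in plain Scala, outside Spark: group on (message_originator, account_id, day), truncate start_ts to the beginning of the UTC day, and sum the three volume columns. This is a minimal sketch of the intended semantics, not the deliverable; in an actual Spark job the same shape becomes a `groupBy(...).agg(sum(...))` over a DataFrame. The case-class field names simply mirror the column names above:

```scala
// Hypothetical row shape, mirroring the column names in the spec.
case class MsgRow(messageOriginator: String, accountId: Long, startTs: Long,
                  msgPartsVol: Long, wholeMsgVol: Long, freq: Long)

val SecondsPerDay = 86400L

// LOGIC B: truncate an epoch-seconds timestamp to the beginning of its UTC day.
def dayStart(ts: Long): Long = ts - ts % SecondsPerDay

// LOGIC A: collapse rows sharing (message_originator, account_id, day)
// into a single summary row, summing the three volume columns.
def summarise(rows: Seq[MsgRow]): Seq[MsgRow] =
  rows.groupBy(r => (r.messageOriginator, r.accountId, dayStart(r.startTs)))
      .map { case ((orig, acc, day), grp) =>
        MsgRow(orig, acc, day,
               grp.map(_.msgPartsVol).sum,
               grp.map(_.wholeMsgVol).sum,
               grp.map(_.freq).sum)
      }.toSeq
```

In the Spark job proper, the equivalent would be `df.groupBy($"message_originator", $"account_id", dayCol).agg(sum("msg_parts_vol"), sum("whole_msg_vol"), sum("freq"))`, with the day column derived from `start_ts` the same way.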
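Requirement D's fixed-argument handling can be sketched with `java.time`. The helper name `resolveWindow` is an assumption for illustration; the regex generalizes LAST_3_DAYS to any LAST_N_DAYS, and both bounds are truncated to the beginning of the UTC day, matching the worked example above:

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper: turn a fixed argument like "LAST_3_DAYS" into
// (start_ts, end_ts) epoch seconds. end_ts is the beginning of the current
// UTC day; start_ts is N days before that.
def resolveWindow(arg: String, now: Instant): (Long, Long) = {
  val LastNDays = "LAST_(\\d+)_DAYS".r
  arg match {
    case LastNDays(n) =>
      val endOfWindow = now.truncatedTo(ChronoUnit.DAYS) // 00:00:00 UTC today
      val startOfWindow = endOfWindow.minus(n.toLong, ChronoUnit.DAYS)
      (startOfWindow.getEpochSecond, endOfWindow.getEpochSecond)
    case _ =>
      throw new IllegalArgumentException(s"Unsupported window argument: $arg")
  }
}
```

With the spec's example, `resolveWindow("LAST_3_DAYS", Instant.parse("2020-11-09T13:15:00Z"))` yields (1604620800, 1604880000), i.e. 2020-11-06 00:00:00 UTC to 2020-11-09 00:00:00 UTC.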
Project ID: 28092554

About the project

10 proposals
Remote project
Active 3 years ago

10 freelancers are bidding on average £168 GBP for this job
Hi, thank you for the invite. I am a certified developer and have designed and developed many enterprise-level applications using Spark. Please ping me in chat and we will discuss. Thank you, Naresh.
£150 GBP in 1 day
5.0 (9 reviews)
4.4
Have done similar work before, mate, and will be happy to help you here again. Do we need to use PySpark? Please see.
£222 GBP in 12 days
5.0 (2 reviews)
2.9
Hi, I am a Spark developer; I am good at Spark (Python/Scala) and Spark SQL. I can use the SBT tool to build the package. Let's discuss further in the chat window.
£135 GBP in 2 days
5.0 (3 reviews)
1.9
Hi, I have done a similar implementation with Spark SQL in Scala, handling both DB and file reads. Please reach out. Thanks, Mohanakrishnan
£222 GBP in 1 day
0.0 (0 reviews)
0.0
Hi, I have looked closely at your requirements and believe I can deliver on or before time, as I handle these types of use cases in my day-to-day job as a data engineer. I have 5+ years of experience in this stack and domain. Please feel free to connect with me and we can discuss and negotiate further. Regards, Jay
£222 GBP in 5 days
0.0 (0 reviews)
0.0
I can very well help you with this project. Here's a small intro about me: I am a professional data engineer with very good real-time experience working with most of the latest big data technologies, and I am well experienced in working with structured, semi-structured and raw data. Feel free to view my profile to learn more about the technologies and skills I carry. Thank you!
£150 GBP in 3 days
0.0 (0 reviews)
0.0
I am a data engineer building and maintaining big data pipelines. I have experience in the airline R&D and e-commerce domains. I am a Scala enthusiast and love to work with Akka, and I have also used Apache Spark for batch needs. I recently worked on a project to build a pipeline extracting clickstream events from different channels (Web/Android/iOS) into GCP BigQuery. I am also an English speaker who is close to native.
£167 GBP in 3 days
0.0 (0 reviews)
0.0
Hi, I am a data engineer experienced in Spark with Scala. I have already worked on projects with Spark. Contact me and I will give more details about these projects.
£133 GBP in 1 day
0.0 (0 reviews)
0.0
Hi, I have 5.5 years of total experience in application development and 3 years of experience in Spark, Scala and Hive. Your requirement falls under my day-to-day activity, and I can create such a job in a very optimized way. Contact me for further discussion. Regards, Adil Nazir
£25 GBP in 3 days
0.0 (0 reviews)
0.0

About the client

Flag of UNITED KINGDOM
Hammersmith, United Kingdom
5.0
6
Payment method verified
Member since Dec 15, 2019
