mini project HDFS-HIVE

Closed Posted 1 year ago Paid on delivery

The deliverable will be a report containing the list of commands, screenshots of the commands and their results, screenshots of the NiFi development, and an export of the NiFi template

Work to do:

HDFS:

Using the HDFS command line (hdfs dfs -??????), create the following directory tree: /data/common/raw/DATABASE_M1/ETUDIANT_M1

Using the HDFS command line, create a file [login to view URL] in this directory, with 3 columns (firstName, lastName, email) filled with your own data

Using the HDFS command line, display the contents of the directory

Using the HDFS command line, display the contents of the file
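
The four HDFS steps above could be sketched roughly as follows. This is only one possible approach, not the expected answer: Python is used here just to assemble (and optionally run) the `hdfs dfs` command lines. The file name `etudiant.csv` and the sample row are hypothetical placeholders, since the real file name is not visible in the posting, and the `hdfs` client is assumed to be on the PATH.

```python
import shutil
import subprocess

# Target directory taken from the assignment text.
HDFS_DIR = "/data/common/raw/DATABASE_M1/ETUDIANT_M1"
# Hypothetical file name: the real one is hidden in the posting.
FILE_NAME = "etudiant.csv"
# Hypothetical sample row with the columns firstName,lastName,email.
CSV_CONTENT = "Jane,Doe,jane.doe@example.com\n"


def build_hdfs_commands(hdfs_dir=HDFS_DIR, file_name=FILE_NAME):
    """Return the hdfs dfs command lines as argument lists."""
    target = f"{hdfs_dir}/{file_name}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],  # create the tree
        ["hdfs", "dfs", "-put", "-", target],       # write stdin to the file
        ["hdfs", "dfs", "-ls", hdfs_dir],           # list the directory
        ["hdfs", "dfs", "-cat", target],            # print the file
    ]


def run_all(commands, content=CSV_CONTENT):
    """Run each command; only the -put step receives data on stdin."""
    for cmd in commands:
        stdin_data = content if "-put" in cmd else None
        subprocess.run(cmd, input=stdin_data, text=True, check=True)


if __name__ == "__main__":
    cmds = build_hdfs_commands()
    for cmd in cmds:
        print(" ".join(cmd))
    if shutil.which("hdfs"):  # execute only where a Hadoop client exists
        run_all(cmds)
```

Printing the commands before running them also gives the list of commands that the report asks for.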

HIVE:

Create a database DATABASE_M1

With HQL, create a database DATABASE_M2

With HQL, create a Hive table ETUDIANT_M1 in the DATABASE_M1 database pointing to the /data/common/raw/DATABASE_M1/ETUDIANT_M1 directory

With HQL, display the contents of the ETUDIANT_M1 table

With HQL, create an ETUDIANT_M1_PART table in the DATABASE_M1 database, partitioned on the DateRecep field (in year, month, day, hour, minute format: YYYYMMDDHHmm) and pointing to the /common/raw/DATABASE_M1/ETUDIANT_M1_PART directory

Create an external table ETUDIANT_M2 in the DATABASE_M2 database
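
As a rough illustration of the Hive steps above, the statements might look like the ones assembled below. Python is used only to hold the HQL text so it can be pasted into beeline or a NiFi processor. Several points are assumptions rather than requirements from the brief: all columns are typed STRING, the partitioned table has the same three columns and is stored as Avro (because NiFi later writes Avro files into it), and the DATABASE_M2 table is named ETUDIANT_M2 as in the NiFi section and keeps Hive's default storage format.

```python
# Paths come from the assignment text; column types are assumptions.
RAW_DIR = "/data/common/raw/DATABASE_M1/ETUDIANT_M1"
PART_DIR = "/common/raw/DATABASE_M1/ETUDIANT_M1_PART"
COLUMNS = "firstName STRING, lastName STRING, email STRING"


def build_statements():
    """Return the HQL statements in the order of the assignment."""
    return [
        "CREATE DATABASE IF NOT EXISTS DATABASE_M1",
        "CREATE DATABASE IF NOT EXISTS DATABASE_M2",
        # External text table over the CSV directory created in HDFS.
        "CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1 "
        f"({COLUMNS}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE "
        f"LOCATION '{RAW_DIR}'",
        "SELECT * FROM DATABASE_M1.ETUDIANT_M1",
        # Partitioned table; DateRecep holds values such as 202210151430.
        "CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1_PART "
        f"({COLUMNS}) "
        "PARTITIONED BY (DateRecep STRING) "
        "STORED AS AVRO "
        f"LOCATION '{PART_DIR}'",
        # Assumed name and default storage format for the target table.
        "CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M2.ETUDIANT_M2 "
        f"({COLUMNS})",
    ]


def hql_script():
    """Join the statements into one script, one statement per line."""
    return ";\n".join(build_statements()) + ";\n"


if __name__ == "__main__":
    print(hql_script())
```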

NIFI:

Expose a NiFi API to receive external file data (use the two processors HandleHttpRequest and HandleHttpResponse)

Send the data [login to view URL] (attached to the course) to the NiFi API 10 times.
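
One way to send the same payload 10 times is a small client like the sketch below. The URL is hypothetical: host, port and path depend on how HandleHttpRequest is configured. The sample payload is also a placeholder for the file attached to the course.

```python
import urllib.request

# Hypothetical endpoint: host, port and path depend on how the
# HandleHttpRequest processor is configured.
NIFI_URL = "http://localhost:8081/etudiant"


def build_requests(payload: bytes, url: str = NIFI_URL, count: int = 10):
    """Prepare `count` identical POST requests carrying the CSV payload."""
    return [
        urllib.request.Request(
            url,
            data=payload,
            headers={"Content-Type": "text/csv"},
            method="POST",
        )
        for _ in range(count)
    ]


def send_all(requests, timeout: float = 10.0):
    """Send each request in turn and return the HTTP status codes."""
    statuses = []
    for req in requests:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    # Hypothetical payload; the real file is the one attached to the course.
    sample = b"Jane,Doe,jane.doe@example.com\n"
    reqs = build_requests(sample)
    print(f"{len(reqs)} requests prepared for {NIFI_URL}")
    # Call send_all(reqs) once the NiFi flow is running.
```

Building the requests separately from sending them lets the payload and count be checked before the NiFi flow is started.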

Convert the received data from CSV format to Avro format

Drop the data into the HDFS directory /common/raw/DATABASE_M1/ETUDIANT_M1_PART/DateRecep=202210ddHHmm using the PutHDFS processor. This value must be generated dynamically by NiFi: use a flowfile attribute holding a date value in the requested format, e.g. Variable_DateRecep with value DateRecep=${now():format('yyyyMMddHHmm')}
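
To make the expected directory name concrete, the sketch below reproduces in Python what the expression DateRecep=${now():format('yyyyMMddHHmm')} evaluates to for a given instant. It is only an illustration of the format; in the flow itself the value is produced by NiFi, and the function name `partition_dir` is hypothetical.

```python
from datetime import datetime

# Base directory taken from the assignment text.
BASE_DIR = "/common/raw/DATABASE_M1/ETUDIANT_M1_PART"


def partition_dir(received_at: datetime, base_dir: str = BASE_DIR) -> str:
    """Mimic DateRecep=${now():format('yyyyMMddHHmm')} for a given instant."""
    # %Y%m%d%H%M is the strftime counterpart of yyyyMMddHHmm.
    stamp = received_at.strftime("%Y%m%d%H%M")
    return f"{base_dir}/DateRecep={stamp}"


if __name__ == "__main__":
    print(partition_dir(datetime(2022, 10, 15, 14, 30)))
    # prints /common/raw/DATABASE_M1/ETUDIANT_M1_PART/DateRecep=202210151430
```

In NiFi, one option is to set the attribute with an UpdateAttribute processor and reference it in the Directory property of PutHDFS, for example /common/raw/DATABASE_M1/ETUDIANT_M1_PART/${Variable_DateRecep}.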

Run a select on the table. What do you notice?

Run the following SQL command: MSCK REPAIR TABLE DATABASE_M1.ETUDIANT_M1_PART;

Copy the data (via an HQL query executed by NiFi) from the ETUDIANT_M1_PART table to the ETUDIANT_M2 table so as to keep only the latest version of the file sent (use the OVERWRITE keyword and, in the WHERE clause of the select, use the value of the last partition).
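
The copy step could be sketched as below: pick the most recent DateRecep value, then build an INSERT OVERWRITE statement filtered on it. The column names and the helper names are assumptions. The resulting text is what a NiFi processor that executes HQL (for example PutHiveQL) would receive as flowfile content.

```python
def latest_partition(values):
    """Return the most recent DateRecep value.

    Fixed-width yyyyMMddHHmm strings sort chronologically as text,
    so the maximum string is the latest partition.
    """
    return max(values)


def build_copy_query(date_recep: str) -> str:
    """Compose the INSERT OVERWRITE statement for one partition value."""
    if not (date_recep.isdigit() and len(date_recep) == 12):
        raise ValueError(f"unexpected DateRecep value: {date_recep!r}")
    return (
        "INSERT OVERWRITE TABLE DATABASE_M2.ETUDIANT_M2 "
        "SELECT firstName, lastName, email "  # assumed column names
        "FROM DATABASE_M1.ETUDIANT_M1_PART "
        f"WHERE DateRecep = '{date_recep}'"
    )


if __name__ == "__main__":
    # Hypothetical partition values, as listed by SHOW PARTITIONS.
    partitions = ["202210151430", "202210151431", "202210151429"]
    print(build_copy_query(latest_partition(partitions)))
```

Because OVERWRITE replaces the previous contents of the target table, each run leaves only the rows from the selected partition.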

Hadoop, Big Data Sales, Spark, Apache Kafka, Hive

Project ID: #36234284

About the project

7 proposals Remote project Active 11 months ago

7 freelancers are bidding an average of €171 for this job

MounirHoul

Greetings, I'm a data engineer with extensive experience in Hadoop HDFS, Hive, and big data solutions. I'm confident that I can deliver high-quality work within your budget and timeframe. Let's discuss further. Mounir

€200 EUR in 4 days
(1 Review)
1.4
umairkaramat24

Hi, how are you? I went through the description and read it carefully, and I know exactly what you are looking for. I have 5+ years’ experience in these skills: Big Data Sales, Apache Kafka, Hadoop, Spark and Hive. I have so

€250 EUR in 5 days
(0 Reviews)
0.0
Ibayoussef231

Hi, I have already worked on a project very similar to yours and I believe I can complete this in 7 days maximum thanks to my knowledge of the big data ecosystem. We can talk in detail if this interests you.

€120 EUR in 7 days
(0 Reviews)
0.0
kaish1

I am a 6+ years experienced data engineer. I can do the development for you in 1 week with professionalism.

€100 EUR in 7 days
(0 Reviews)
0.0
anydataflow

Hi, I can do this effectively as I have expertise in Hadoop, Hive, NiFi... Please visit my profile for more info. Thanks

€140 EUR in 7 days
(1 Review)
0.1
Iamerum

Hi there, I have been working on big data Hadoop projects, and I excel at Hadoop and Hive. Let me know if I can help you with this. Thanks

€140 EUR in 9 days
(0 Reviews)
0.0