Instructions
Complex datatypes, UDF, PigStorage
PPT Structure
- Topic
- Why that dataset
- Why this specific feature set
- What sort of analysis
- E.g., customer retention, environmental depletion
- Each student: 2-3 slides
- Minimum 10 queries
- Show at least 3 queries
- The first three things are common for the group
- The analysis part should be unique
Mongo
Arrays, aggregates, cursors, MapReduce, Functions, Innovative Queries
Online Course
- 2 marks for starting + 8 marks for completion
Official Rubrics
- MapReduce programming – 20 marks (completed)
- Hive and Pig – 50 marks
- Hive: dataset selection (2 marks) + basic queries for the dataset (8 marks)
- Concepts of Hive: partitions, buckets, complex datatypes, SerDe, UDF, innovative queries; at least 2 concepts (10 marks, 5 marks each)
- Pig: basic queries for the dataset (10 marks)
- Concepts of Pig: complex datatypes, UDF, PigStorage, Piggy Bank, new functions, Pig scripts, and innovative queries (10 marks)
- Each student is expected to run at least 10 queries and show their execution.
- A PPT covering the following points must be submitted by the group (10 marks):
- Topic
- Dataset selection and significance of the dataset
- Features of the dataset
- What type of analysis is to be carried out on the dataset?
- Types of questions that will be answered using queries.
- Each student must have a minimum of 2-3 slides showing their queries and execution.
- Mongo (30 marks)
- Basic queries (10 marks)
- Concepts of arrays, aggregates, cursors, MapReduce, functions, innovative queries (20 marks)
- What type of analysis is to be carried out on the dataset?
- Types of questions that will be answered using queries.
- Each student must have a minimum of 2-3 slides showing their queries and execution.
- Online Course – 10 marks (2 marks for starting the course + 8 marks for completing it)