
Instructions

Complex datatypes, UDF, Pig Storage

PPT Structure

  • Topic
  • Why that dataset
  • Why this specific feature set
  • What sort of analysis
    • E.g., customer retention, environmental depletion
  • Each student: 2-3 slides
  • Minimum 10 queries
  • Show at least 3 queries
  • The first three items are common to the group
  • The analysis part should be unique

Mongo

Arrays, aggregates, cursors, MapReduce, Functions, Innovative Queries

Online Course

  • 2 for starting + 8 for completion

Official Rubrics

  • MapReduce Programming – 20 marks (completed)
  • Hive and Pig – 50 marks
    • Hive: dataset selection (2 marks) + basic queries on the dataset (8 marks)
      • Concepts of Hive – partitions, buckets, complex datatypes, SerDe, UDF, innovative queries – at least 2 concepts (10 marks, 5 marks each)
    • Pig: basic queries on the dataset (10 marks)
      • Concepts of Pig – complex datatypes, UDF, PigStorage, Piggy Bank, new functions, Pig scripts, and innovative queries (10 marks)
    • Each student is expected to run at least 10 queries and show their execution.

    • A PPT covering the following must be submitted by the group (10 marks):
      • Topic
      • Dataset selection and significance of the dataset
      • Features of the dataset
      • What type of analysis is to be carried out on the dataset?
      • Types of questions that will be answered using queries
      • Each student must have a minimum of 2-3 slides showing their queries and execution.

  • Mongo (30 marks)
    • Basic queries (10 marks)
    • Concepts of arrays, aggregates, cursors, MapReduce, functions, innovative queries (20 marks)
    • What type of analysis is to be carried out on the dataset?
    • Types of questions that will be answered using queries
    • Each student must have a minimum of 2-3 slides showing their queries and execution.
  • Online Course – 10 marks (2 marks for starting the course + 8 marks for completing it)