Curriculum Development Centre, DOTE. Page 174
4052633 – Elective Theory II
Data Science and Big Data
DIPLOMA IN COMPUTER ENGINEERING
SEMESTER PATTERN
III YEAR
N – SCHEME
VI SEMESTER
STATE BOARD OF TECHNICAL EDUCATION & TRAINING, TAMILNADU
DIPLOMA IN ENGINEERING / TECHNOLOGY SYLLABUS
N-SCHEME
(To be implemented for students admitted from the year 2021 - 2022 onwards)
Course Name : 1052: Diploma in Computer Engineering
Subject Code : 4052633
Semester : VI
Subject title : Elective Theory - II Data Science and Big Data
TEACHING AND SCHEME OF EXAMINATION
No. of weeks per Semester: 16 Weeks
Subject: Data Science and Big Data
Instructions
  Hours/Week: 5
  Hours/Semester: 80
Examination
  Marks, Internal Assessment: 25
  Marks, Board Examination: 100 *
  Marks, Total: 100
  Duration: 3 Hrs
* Examinations will be conducted for 100 marks, and the marks obtained will be reduced to 75 marks.
Topics and Allocation of Hours
Unit No.   Topic                              No. of Hours
I          Introduction to Data Science       15
II         Fundamentals of Data Modelling     15
III        Fundamentals of Big Data           15
IV         Big Data Storage                   14
V          Big Data Processing                14
           Test and Revision                  7
           Total                              80
RATIONALE:
This course provides a comprehensive understanding of data science and data
modelling. It lays a foundation in data science so that students can understand
the core concepts and techniques that underlie today's big data computing
technologies. The course helps students identify and apply appropriate
techniques and tools to solve problems in managing huge quantities of data.
OBJECTIVES:
This subject has two major divisions. The objectives of these topics are given
below.
Data Science
After studying the first two units of this syllabus, students will be able
● To understand the fundamentals of data science, various data types,
their sources, problems and issues, and various formats of data.
● To apply the Python libraries and Microsoft Excel for data analysis.
● To work with Microsoft Excel and apply its various functions for
data analysis.
● To become familiar with the basic data representation methods.
● To understand the concepts of samples, attributes and their relationships.
● To develop and implement simple linear regression models.
● To understand the concepts of model equation and goodness of fit.
● To understand and differentiate the concepts of predictive models and
classification models.
● To become familiar with the concepts of Neural Networks, Decision Trees and
Nearest Neighbors techniques.
Big Data
After studying the lessons from Units III to V, the students will be able to
● Get a conceptual understanding of Big Data, Web data, classification of data,
Big Data characteristics, types, classifications and handling techniques.
● Get a conceptual understanding of the impact of ICT developments on Big
Data adoption.
● Understand the Big Data Analytics Life Cycle.
● Get a conceptual understanding of Big Data storage systems and
technologies.
● Understand the concepts of NoSQL databases, their types and
characteristics.
● Understand the concepts of Hadoop and its Ecosystem.
● Understand the steps involved in Big Data processing, such as parallel
processing, distributed processing and batch processing.
● Gain an understanding of MapReduce, map and reduce tasks, and MapReduce
algorithms.
● Understand the various techniques for Big Data analysis.
● Get introduced to the concepts and types of machine learning techniques.
● Explore the applications of Big Data in different fields.
Detailed Syllabus
Contents : Theory
Unit I: Introduction to Data Science (15 hours)
1.1 Data Science - Subfields of Data Science - Data Types - Data
Science Road Map - Programming languages for Data Science -
Problems with Data - Formatting issues - Python features - Python
Technical libraries - Python Arrays and Data Frames. (6 hours)
1.2 Data sources - Data Quality - Consistency and accuracy
(Integrity), Noise: Outliers, Missing and Duplicate values - Data
Preprocessing using Cleaning, Enrichment, Editing, Reduction,
Wrangling - Data Formats: TXT, CSV, XML, JSON, TLV - Loading
and Saving files. (4 hours)
1.3 Working with Excel: Loading data - Statistical functions - Text
Functions - Lookup Functions - Sorting - Filtering - Data Analysis:
Correlation, covariance, Descriptive statistics, Regression. (5 hours)
Unit II: Fundamentals of Data Modelling (15 hours)

2.1 Linear Algebra: Data representation - Data as a Matrix -
Samples and Attributes - Classification of attributes - Concept of
Rank - Identify the relationship among attributes. (5 hours)
2.2 Predictive models: Regression Models - Linear regression -
Simple and Multiple Regression - Correlation - Mean squared Error -
Testing goodness of fit - Model Equation. (5 hours)
2.3 Classification models: Two class - Multi class classification -
Separability - Performance measures - Terminology - Confusion
Matrix - Types (Concepts only): Neural Network - Decision Trees -
Nearest Neighbors. (5 hours)
Unit III: Fundamentals of Big Data (15 hours)
3.1 Data - Web Data - Classification of Data - Big Data -
Characteristics - Volume, Velocity, Variety, Veracity, Value - Need for
Big Data - Big Data Types and classifications - Sources of Big Data -
Big Data handling techniques - Challenges. (6 hours)
3.2 Impact of ICT developments on Big Data Adoption: data
analytics and data science, digitization, affordable technology and
commodity hardware, social media, hyper connected communities
and devices, cloud computing and IoT. (4 hours)
3.3 Big Data Analytics Life Cycle: Business Case Evaluation, Data
Identification, Data Acquisition & Filtering, Data Extraction, Data
Validation & Cleansing, Data Aggregation & Representation, Data
Analysis, Data Visualization, Utilization of Analysis Results. (5 hours)
Unit IV: Big Data Storage (14 hours)
4.1 Storage Concepts: Clusters, File Systems, Distributed File
System, NoSQL, Sharding, Replication, Master Slave, Peer to Peer,
CAP Theorem. (4 hours)
4.2 Big Data Storage Technologies: On-Disk Storage Devices -
Distributed File system - RDBMS - NoSQL Databases - Characteristics
of NoSQL - Types of NoSQL Storage devices. In-Memory storage
devices - Data Grids - Databases. (5 hours)
4.3 Hadoop: Introduction - Hadoop and its Ecosystem: Hadoop core
components - Features of Hadoop - Hadoop Ecosystem
components - Hadoop streaming - Hadoop pipes - Hadoop distributed
File system - HDFS data storage - Hadoop Ecosystem tools. (5 hours)
Unit V: Big Data Processing (14 hours)
5.1 Parallel data processing - Distributed data processing - Hadoop
Framework - Processing workloads - cluster for processing - Batch
processing with MapReduce - Map and Reduce Tasks - MapReduce
algorithms - Processing in Realtime mode - Real time processing and
MapReduce. (5 hours)
5.2 Big Data Analysis Techniques: Quantitative analysis,
Qualitative analysis, Data mining, Statistical analysis: Correlation,
regression. Machine Learning: Classification, clustering, outlier
detection, filtering. Semantic analysis: Natural language processing,
Text Analytics, Sentiment analysis, Visual Analysis. (5 hours)
5.3 Big Data Analytics Applications and case studies: Big data in
Marketing and sales - Big data and Healthcare - Big data in Medicine -
Big Data in Advertising. (4 hours)
Reference books
1. Field Cady, "The Data Science Handbook", Wiley, 2017.
2. Jake VanderPlas, "Python Data Science Handbook: Essential Tools for Working
with Data", O'Reilly, 2017.
3. Davy Cielen, Arno D. B. Meysman, Mohamed Ali, "Introducing Data Science",
Manning Publications, 2016.
4. Thomas Erl, Wajid Khattak, "Big Data Fundamentals: Concepts, Drivers &
Techniques", Prentice Hall, 2016.
5. Raj Kamal, Preeti Saxena, "Big Data Analytics: Introduction to Hadoop, Spark and
Machine Learning", McGraw Hill Education (India) Pvt Ltd., 2019.
6. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics:
Emerging Business Intelligence and Analytic Trends for Today's Businesses",
Wiley, 2013.
7. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012.
8. NPTEL MOOC courses on "Data Science" and "Big Data".
