Exam Details
Subject | Data Mining and Data Warehousing
Exam / Course | B.Tech
Organization | Vardhaman College Of Engineering
Exam Date | May, 2018
City, State | Hyderabad, Telangana
Question Paper
(AUTONOMOUS)
B. Tech VII Semester Supplementary Examinations, May 2018
(Regulations: VCE-R14)
DATA MINING AND DATA WAREHOUSING
(Computer Science and Engineering)
Date: 21 May, 2018 AN Time: 3 hours Max Marks: 75
Answer ONE question from each Unit
All Questions Carry Equal Marks
Unit I
1. What are the major challenges of mining a huge amount of data (such as billions of
tuples) in comparison with mining a small amount of data (such as a data set of a few
hundred tuples)?
8M
Illustrate the role of data mining as a confluence of multiple disciplines.
7M
2. Describe why concept hierarchies are useful in data mining. 7M
Give the differences between classification and prediction.
8M
Unit II
3. Suppose that we need to record three measures in a data cube: min, average, and
median. Design an efficient computation and storage method for each measure, given
that the cube allows data to be deleted incrementally (that is, in small portions at a
time) from the cube.
10M
Often, the aggregate measure value of many cells in a large data cuboid is zero, resulting
in a huge, yet sparse, multidimensional matrix. Design an implementation method that
can elegantly overcome this sparse matrix problem. Note that you need to explain your
data structures in detail and discuss the space needed, as well as how to retrieve data
from your structures.
5M
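As a study aid for Question 3, the following is a minimal sketch, not a model answer. It assumes a hypothetical `SparseCube` class that stores only non-empty cells in a dictionary keyed by coordinate tuple, and keeps a running sum and count per cell so that the average survives incremental deletion. Min and median are not covered: both need extra state (for example the stored values themselves), because deleting the current minimum or a value near the middle cannot be undone from a single stored number.

```python
class SparseCube:
    """Hypothetical sparse cube: only non-empty cells are stored."""

    def __init__(self):
        # Maps a coordinate tuple, e.g. ("2012", "Chennai"), to [sum, count].
        self.cells = {}

    def insert(self, coords, value):
        cell = self.cells.setdefault(coords, [0.0, 0])
        cell[0] += value
        cell[1] += 1

    def delete(self, coords, value):
        # Raises KeyError if the cell was never populated.
        cell = self.cells[coords]
        cell[0] -= value
        cell[1] -= 1
        if cell[1] == 0:
            # Drop empty cells so storage stays proportional to non-empty cells.
            del self.cells[coords]

    def average(self, coords):
        cell = self.cells.get(coords)
        return None if cell is None else cell[0] / cell[1]
```

Space is proportional to the number of non-empty cells rather than to the product of the dimension sizes, and a cell lookup is a single dictionary access.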
4. Briefly explain data generalization. 5M
Draw the star schema for T20 cricket taking into account the spectator, location, game,
date for the centralized sales table. Starting with the base cuboid [date, spectator,
location, game] what specific OLAP operations should one perform in order to get the
total charges paid by Anna Pavilion spectators at Chidambaram Stadium in 2012?
10M
Unit III
5. Describe Frequent Pattern Mining with Vertical Data Format by means of an example. 5M
Some transactions in Big Bazaar are as follows:
TID | List of items
T100 | Colour TV, DVD, Speakers
T200 | DVD, Laptop
T300 | DVD, Desktop
T400 | Colour TV, DVD, Laptop
T500 | Colour TV, DVD, Desktop
The minimum support is 30%. Using C for Colour TV, D for DVD, DT for Desktop, L for
Laptop and S for Speakers, draw the FP-tree and the corresponding conditional pattern
bases along with the conditional FP-trees.
10M
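As a study aid for Question 5, the sketch below prepares the Big Bazaar transactions in both forms the question refers to: the vertical format (item to TID set) and the frequency-ordered transactions from which an FP-tree would be built. It is an illustration under assumptions, not a full FP-growth implementation; the function names are hypothetical, and ties in support (here L and DT) are broken alphabetically, which is one of several valid choices.

```python
from collections import defaultdict

# Transactions from the question, using the abbreviations it defines.
transactions = {
    "T100": ["C", "D", "S"],
    "T200": ["D", "L"],
    "T300": ["D", "DT"],
    "T400": ["C", "D", "L"],
    "T500": ["C", "D", "DT"],
}
MIN_SUPPORT = 0.30  # 30% of 5 transactions, so an item needs at least 2


def vertical_format(db):
    """Turn TID -> items into item -> set of TIDs."""
    tidsets = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def frequent_items(tidsets, n, min_support):
    """Keep items whose relative support meets the threshold."""
    return {
        item: len(tids)
        for item, tids in tidsets.items()
        if len(tids) / n >= min_support
    }


def fp_order(db, counts):
    """Drop infrequent items and sort each transaction by descending support."""
    rank = sorted(counts, key=lambda item: (-counts[item], item))
    position = {item: k for k, item in enumerate(rank)}
    return {
        tid: sorted((i for i in items if i in position), key=position.get)
        for tid, items in db.items()
    }


tidsets = vertical_format(transactions)
counts = frequent_items(tidsets, len(transactions), MIN_SUPPORT)
ordered = fp_order(transactions, counts)
```

In the vertical format, the support of an itemset is the size of the intersection of its items' TID sets, for example `tidsets["C"] & tidsets["L"]`. Each list in `ordered` is the path that would be inserted into the FP-tree for that transaction.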
6. Suppose that frequent itemsets are saved for a large transaction database, DB. Explain
how to efficiently mine the (global) association rules under the same minimum support
threshold if a set of new transactions is (incrementally) added.
6M
Association rule mining often generates a large number of rules. Explain effective
methods that can be used to reduce the number of rules generated while still preserving
most of the interesting rules.
9M
Unit IV
7. Why is tree pruning useful in decision tree induction? What is a drawback of using a
separate set of tuples to evaluate pruning?
10M
Given a decision tree, you have the option of:
i. Converting the decision tree to rules and then pruning the resulting rules, or
ii. Pruning the decision tree and then converting the pruned tree to rules.
What advantage does (i) have over (ii)?
5M
8. Illustrate the pre-processing steps that may be applied to the data to help improve the
accuracy, efficiency, and scalability of the classification process.
9M
Write an algorithm for k-nearest neighbor classification given k and the number of
attributes describing each tuple.
6M
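As a study aid for Question 8, here is a minimal sketch of k-nearest-neighbor classification. It assumes numeric attributes, Euclidean distance, and simple majority voting; the function name `knn_classify` and the `(attributes, label)` input layout are illustrative choices, not something the paper prescribes.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest training tuples.

    training: list of (attribute_tuple, class_label) pairs
    query:    attribute tuple with the same number of attributes
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training tuples")
    # Sort all training tuples by Euclidean distance to the query, keep the k closest.
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count class labels among the neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Sorting every training tuple costs O(n log n) per query. For large training sets a heap or a spatial index would be the usual refinement, and attributes are normally scaled first so that no single attribute dominates the distance.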
Unit V
9. Describe the Grid-based and Model-based Clustering Methods. 8M
Explain deviation-based outlier detection with an example.
7M
10. How are partitioning methods useful in clustering? Describe any one classical
partitioning method.
7M
Explain distance-based outlier detection with an example. 8M
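As a study aid for Question 10, the sketch below illustrates one common formulation of distance-based outliers: an object is flagged when at least a fraction `p` of the other objects lie farther than distance `d` from it. The function name, the parameter names, and the choice to exclude the object itself from the fraction are assumptions made for illustration.

```python
import math


def db_outliers(points, p, d):
    """Return the points for which at least a fraction p of the
    other points lie at Euclidean distance greater than d."""
    outliers = []
    others = len(points) - 1
    for i, candidate in enumerate(points):
        # Count how many of the remaining points are farther away than d.
        far = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(candidate, other) > d
        )
        if others > 0 and far >= p * others:
            outliers.append(candidate)
    return outliers
```

This nested-loop version compares every pair of points, so it runs in O(n^2) time; index-based and cell-based variants exist to reduce that cost on large data sets.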