Exam Details
Subject | Foundations of Data Sciences
Exam / Course | M.Tech
Organization | Institute of Aeronautical Engineering
Exam Date | February, 2017
City, State | Hyderabad, Telangana
Question Paper
Hall Ticket No:
Question Paper Code: BCS001
INSTITUTE OF AERONAUTICAL ENGINEERING
(Autonomous)
M.Tech I Semester End Examinations (Regular) February, 2017
Regulation: IARE-R16
FOUNDATIONS OF DATA SCIENCES
(Computer Science and Engineering)
Time: 3 Hours Max Marks: 70
Answer ONE Question from each Unit
All Questions Carry Equal Marks
All parts of the question must be answered in one place only
UNIT I
1. Discuss the roles of data scientist and data architects.
With respect to bar charts, write a code snippet in R for:
i. a side-by-side bar chart and a faceted bar chart
ii. specifying different styles of bar chart
2. Assume that you have been given a customer dataset file consisting of 50000 customer records
along with their income data. During analysis, you observe that some values are
missing systematically. How do you handle this scenario?
Explain the for and while loops in R, with syntax and an example for each.
UNIT II
3. Assume that an employee.xml file stores records for n employees, each including id, name, salary,
startdate and department. Write an R script to get the number of nodes present
in this XML file. Further, get the details of the first node and get different elements of a node.
What is heteroscedasticity? Explain.
4. Write a code snippet in R to create a new empty .xlsx file with one empty sheet named input,
and add day and month to the empty sheet input.
Assume that you have been given the R built-in data set mtcars.
We observe that the field "am" represents the type of transmission (auto or manual). It is a
categorical variable with values 0 and 1. The miles per gallon value ("mpg") of a car can also
depend on it, besides the value of horsepower ("hp"). Suggest the best way to study the effect of
the value of "am" on the regression between "mpg" and "hp".
Figure 1
UNIT III
5. When do you use Naive Bayes, Support Vector Machines and Decision Trees for classification?
Assume that you have been given a wholesale customer database.
Figure 2
There's obviously a big difference for the top customers in each category (e.g. Fresh goes from a
min of 3 to a max of 112,151). Remove the top 5 customers from each category and, using custom
functions, create a new data set called data.rm.top. Using this new data set, perform cluster
analysis with k-means.
6. What is model overfitting?
What are association rules? Explain association rule mining with a suitable example.
UNIT IV
7. Do you think that, in general, the more pieces of information (i.e., input dimensions) we add,
the greater the chance that your perceptron can do its task perfectly? Explain your answer,
including a geometrical interpretation (one- vs. two- vs. three-dimensional input spaces, and so
on). Under what conditions does additional information help with the classification, and under
what conditions does it not?
We have a function which takes a two-dimensional input $x = (x_1, x_2)$ and has two parameters
$w = (w_1, w_2)$, given by $f(x, w) = \sigma(\sigma(x_1 w_1) w_2 + x_2)$, where $\sigma(x) = \frac{1}{1 + e^{-x}}$. We use
backpropagation to estimate the right parameter values. We start by setting both parameters
to 0. Assume that we are given a training point $(x_1, x_2)$ with $y = 5$. Given this information,
answer the next two questions.
i. What is the value of
ii. If the learning rate is 0.5, what will be the value of $w_2$ after one update using the
backpropagation algorithm?
8. The chart below shows a set of two-dimensional input samples from two classes:
Figure 3
It looks like there exists a perfect classification function for this problem that is linearly separable,
and therefore a single perceptron should be able to learn this classification task perfectly. Let
us study the learning process, starting with a random perceptron with weights $w_0 = 0.2$, $w_1$,
and $w_2$, where of course $w_0$ is the weight for the constant offset $i_0 = 1$. For the inputs,
just estimate their coordinates from the chart. Now add the perceptron's initial line of division
to the chart. How many samples are misclassified? Then pick an arbitrary misclassified sample
and describe the computation of the weight update (you can choose a learning rate of 1 or any other value;
if you like you can experiment a bit to find a value that leads to efficient learning). Illustrate
the perceptron's new line of division and give the number of misclassified samples. Repeat this
process four more times so that you have a total of six lines (or fewer if your perceptron achieves
perfect classification earlier).
With respect to the preceding question, let us assume that less information was available about the
samples that are to be classified. Let us say that we only know the value of $i_1$ for each sample, which means
that our perceptron has only two weights to classify the input as best as possible, i.e., it has
weights $w_0$ and $w_1$, where $w_0$ is once again the weight for the constant offset $i_0 = 1$. Draw
a diagram that visualizes this one-dimensional classification task, and determine weights for a
perceptron that does the task as best as possible (minimum error, i.e., minimum proportion of
misclassified samples). Where does it separate the input space, and what is its error?
UNIT V
9. Discuss briefly the graphical facilities available in R.
What is knitr? What is its purpose? Write a code snippet in R to illustrate a simple LaTeX
document with knitr chunks.
10. What is the significance of data scientists presenting their work to their peers? Discuss the peer
presentation structure.
Explain briefly the two functions in R for representing multivariate data.