Question Paper: Foundations of Data Sciences, M.Tech, Institute of Aeronautical Engineering

Exam Details

Subject	Foundations of Data Sciences
Exam / Course	M.Tech
Organization	Institute of Aeronautical Engineering
Exam Date	February, 2017
City, State	Hyderabad, Telangana

Question Paper

Hall Ticket No:
Question Paper Code: BCS001
INSTITUTE OF AERONAUTICAL ENGINEERING
(Autonomous)
M.Tech I Semester End Examinations (Regular) February, 2017
Regulation: IARE-R16
FOUNDATIONS OF DATA SCIENCES
(Computer Science and Engineering)
Time: 3 Hours Max Marks: 70
Answer ONE Question from each Unit
All Questions Carry Equal Marks
All parts of the question must be answered in one place only
UNIT I
1. Discuss the roles of data scientists and data architects.
With respect to bar charts, write a code snippet in R for:
i. a side-by-side bar chart and a faceted bar chart
ii. specifying different styles of bar chart
2. Assume that you have been given a customer dataset file consisting of 50,000 customer records
along with their income data. However, during analysis, you observe that some values are
missing systematically. How do you handle this scenario?
Explain the for and while loops in R, with syntax and an example for each.
UNIT II
3. Assume that an employee.xml file stores records for n employees, each including id, name, salary,
startdate and department. Write an R script to get the number of nodes present
in this XML file. Further, get the details of the first node and get the different elements of a node.
What is heteroscedasticity? Explain.
4. Write a code snippet in R to create a new empty .xlsx file with one empty sheet named "input",
and add day and month to the empty sheet "input".
Assume that you have been given the R built-in data set mtcars.
We observe that the field "am" represents the type of transmission (auto or manual). It is a
categorical variable with values 0 and 1. The miles per gallon value ("mpg") of a car can also
depend on it, besides the value of horsepower ("hp"). Suggest the best way to study the effect of
the value of "am" on the regression between "mpg" and "hp".
Figure 1
UNIT III
5. When do you use Naive Bayes, Support Vector Machines and Decision Trees for classification?

Assume that you have been given a wholesale customer database.
Figure 2
There is obviously a big difference for the top customers in each category (e.g. Fresh goes from a
min of 3 to a max of 112,151). Remove the top 5 customers from each category and, using custom
functions, create a new data set called data.rm.top. Using this new data set, perform cluster
analysis using k-means.
6. What is model overfitting?
What are association rules? Explain association rule mining with a suitable example.
UNIT IV
7. Do you think that generally, the more pieces of information (i.e., input dimensions) we add,
the greater the chance that your perceptron can do its task perfectly? Explain your answer,
including a geometrical interpretation (one- vs. two- vs. three-dimensional input spaces, and so
on). Under what conditions does additional information help with the classification, and under
what conditions does it not?
We have a function which takes a two-dimensional input $x = (x_1, x_2)$ and has two parameters
$w = (w_1, w_2)$, given by $f(x, w) = \sigma(\sigma(x_1 w_1) w_2 + x_2)$, where $\sigma(x) = \frac{1}{1 + e^{-x}}$. We use
backpropagation to estimate the right parameter values. We start by setting both the parameters
to 0. Assume that we are given a training point $(x_1, x_2)$ with target $y = 5$. Given this information,
answer the next two questions.
i. What is the value of $\partial f / \partial w_2$?
ii. If the learning rate is 0.5, what will be the value of $w_2$ after one update using the backpropagation
algorithm?
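The gradient and update asked for in the backpropagation question can be sketched in Python as follows. This is only an illustration under several assumptions that are not stated in the paper: the function is read as f(x, w) = sigmoid(sigmoid(x1 * w1) * w2 + x2), the loss is taken to be the squared error E = 0.5 * (y - f)^2, and the inputs x1 = 1 and x2 = 0 are hypothetical values chosen for the example. All function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x1, x2, w1, w2):
    """Return the hidden activation h and the output f (assumed function form)."""
    h = sigmoid(x1 * w1)
    f = sigmoid(h * w2 + x2)
    return h, f


def df_dw2(x1, x2, w1, w2):
    """Chain rule: df/dw2 = sigmoid'(h * w2 + x2) * h = f * (1 - f) * h."""
    h, f = forward(x1, x2, w1, w2)
    return f * (1.0 - f) * h


def updated_w2(x1, x2, y, w1, w2, learning_rate):
    """One gradient-descent step on w2, assuming E = 0.5 * (y - f)^2."""
    _, f = forward(x1, x2, w1, w2)
    dE_dw2 = -(y - f) * df_dw2(x1, x2, w1, w2)
    return w2 - learning_rate * dE_dw2


# Hypothetical inputs (x1, x2); y = 5 and the zero initial weights come from the question.
x1, x2, y = 1.0, 0.0, 5.0
gradient = df_dw2(x1, x2, w1=0.0, w2=0.0)
new_w2 = updated_w2(x1, x2, y, w1=0.0, w2=0.0, learning_rate=0.5)
print(gradient, new_w2)
```

With a different loss function or different input values, the same two helper functions apply, but the numbers change.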
8. The chart below shows a set of two-dimensional input samples from two classes:
Figure 3
It looks like there exists a perfect classification function for this problem that is linearly separable,
and therefore a single perceptron should be able to learn this classification task perfectly. Let
us study the learning process, starting with a random perceptron with weights w0 = 0.2, w1,
and w2, where of course w0 is the weight for the constant offset i0 = 1. For the inputs,
just estimate their coordinates from the chart. Now add the perceptron's initial line of division
to the chart. How many samples are misclassified? Then pick an arbitrary misclassified sample
and describe the computation of the weight update (you can choose a learning rate of 1 or any other value;
if you like, you can experiment a bit to find a value that leads to efficient learning). Illustrate
the perceptron's new line of division and give the number of misclassified samples. Repeat this
process four more times so that you have a total of six lines (or fewer if your perceptron achieves
perfect classification earlier).
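The update procedure described above can be sketched in Python with the standard perceptron learning rule, w <- w + eta * (target - output) * input. The sample coordinates, the class labels 0 and 1, and the initial values of w1 and w2 below are hypothetical, since the chart and those weights are not reproduced here; only w0 = 0.2 and the offset i0 = 1 come from the question.

```python
def predict(weights, point):
    """Threshold unit: output 1 if w0*1 + w1*i1 + w2*i2 >= 0, else 0."""
    w0, w1, w2 = weights
    i1, i2 = point
    return 1 if w0 * 1.0 + w1 * i1 + w2 * i2 >= 0 else 0


def misclassified(weights, samples):
    """Return the (point, target) pairs the perceptron currently gets wrong."""
    return [(p, t) for p, t in samples if predict(weights, p) != t]


def update(weights, point, target, eta=1.0):
    """Perceptron rule applied to one sample; the constant input i0 is 1."""
    error = target - predict(weights, point)
    inputs = (1.0, point[0], point[1])
    return tuple(w + eta * error * i for w, i in zip(weights, inputs))


def train(weights, samples, eta=1.0, max_updates=5):
    """Apply up to max_updates corrections and record every weight vector."""
    history = [weights]
    for _ in range(max_updates):
        wrong = misclassified(weights, samples)
        if not wrong:
            break
        point, target = wrong[0]  # any misclassified sample would do
        weights = update(weights, point, target, eta)
        history.append(weights)
    return history


# Hypothetical, linearly separable toy samples: ((i1, i2), class).
samples = [((1, 3), 1), ((2, 4), 1), ((1, 4), 1),
           ((3, 1), 0), ((4, 2), 0), ((4, 1), 0)]
initial = (0.2, 1.0, 1.0)  # w0 from the question; w1 and w2 are assumed
history = train(initial, samples)
for step, w in enumerate(history):
    print(step, w, len(misclassified(w, samples)))
```

Each entry of `history` corresponds to one line of division, w0 + w1 * i1 + w2 * i2 = 0, so the list maps directly onto the sequence of lines the question asks to draw.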
With respect to the previous question, let us assume that less information was available about the samples that
are to be classified. Let us say that we only know the value of i1 for each sample, which means
that our perceptron has only two weights to classify the input as best as possible, i.e., it has
weights w0 and w1, where w0 is once again the weight for the constant offset i0 = 1. Draw
a diagram that visualizes this one-dimensional classification task, and determine weights for a
perceptron that does the task as best as possible (minimum error, i.e., minimum proportion of
misclassified samples). Where does it separate the input space, and what is its error?
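One way to approach the one-dimensional task is a brute-force search over candidate thresholds, sketched below in Python. The i1 values and class labels are hypothetical stand-ins for the chart data, and the function name `best_threshold` is illustrative. A perceptron with weights w0 and w1 separates the line at i1 = -w0 / w1, so the search tries the midpoints between neighbouring sample values with both orientations of w1.

```python
def best_threshold(samples):
    """Return (error_rate, w0, w1, threshold) minimising the misclassified proportion.

    samples is a list of (i1, label) pairs with labels 0 or 1.
    Only midpoints between distinct neighbouring values are tried.
    """
    values = sorted({v for v, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for w1 in (1.0, -1.0):
            w0 = -w1 * t  # places the decision boundary at i1 = t
            errors = sum(
                1 for v, label in samples
                if (1 if w0 + w1 * v >= 0 else 0) != label
            )
            rate = errors / len(samples)
            if best is None or rate < best[0]:
                best = (rate, w0, w1, t)
    return best


# Hypothetical overlapping classes along i1: (value, class).
samples_1d = [(1, 1), (2, 1), (3.5, 1), (3, 0), (4, 0), (5, 0)]
rate, w0, w1, threshold = best_threshold(samples_1d)
print(rate, w0, w1, threshold)
```

Because the two classes overlap along i1 in this toy data, no threshold reaches zero error, which is the situation the question is probing.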
UNIT V
9. Discuss briefly the graphical facilities available in R.
What is knitr? What is its purpose? Write a code snippet in R to illustrate a simple LaTeX
document with knitr chunks.
10. What is the significance of data scientists presenting their work to their peers? Discuss the peer
presentation structure.
Explain briefly the two functions in R for representing multivariate data.
