STAT4001 Data Mining and Statistical Learning Homework 1


1. Consider the simple linear regression setting:
$$TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2, \quad RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \quad r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}.$$
Derive that $R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = r^2$, where $r$ is the correlation between $X$ and $Y$ (you may refer to lecture notes Chapter 2 page 15).
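A hedged outline of the key step (assuming the OLS fit, so that $\hat{y}_i = \bar{y} + \hat{\beta}_1(x_i - \bar{x})$ with $\hat{\beta}_1 = S_{xy}/S_{xx}$, writing $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_i (x_i - \bar{x})^2$, $S_{yy} = TSS$):

```latex
RSS = \sum_{i=1}^{n}\bigl[(y_i - \bar{y}) - \hat{\beta}_1(x_i - \bar{x})\bigr]^2
    = TSS - 2\hat{\beta}_1 S_{xy} + \hat{\beta}_1^2 S_{xx}
    = TSS - \frac{S_{xy}^2}{S_{xx}},
```

so that $\frac{TSS - RSS}{TSS} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2$.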
2. Cross-validation
Consider the following dataset with three observations. Y = (1.2, 1.8, 3.2), X =
(0.4, 0.8, 1.2) and the model Y = β0 + β1X. Calculate the LOOCV (Leave-One-Out
Cross-Validation) error.
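The LOOCV computation can be checked with a minimal sketch in base R (fitting on two points and predicting the held-out one, three times):

```r
# LOOCV for Y = b0 + b1*X on the three given points (a sketch)
x <- c(0.4, 0.8, 1.2)
y <- c(1.2, 1.8, 3.2)
n <- length(x)
errs <- numeric(n)
for (i in 1:n) {
  fit  <- lm(yy ~ xx, data = data.frame(xx = x[-i], yy = y[-i]))  # fit on the other two points
  pred <- predict(fit, newdata = data.frame(xx = x[i]))           # predict the held-out point
  errs[i] <- (y[i] - pred)^2                                      # squared prediction error
}
mean(errs)  # LOOCV error = (0.64 + 0.16 + 0.64) / 3 = 0.48
```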
3. Bootstrap
Given the following data:
Observation x y
i=1 4.3 2.4
i=2 2.1 1.1
i=3 5.3 2.8
Use the R ‘sample’ function to generate 3 bootstrap samples and calculate the standard error (SE) of $\hat{\alpha}$. Please do NOT use any packages; write your own code using the formula in the lecture notes, Chapter 4 pages 13–14.
Remark: Since the sample size is 3, which is very small, you may encounter the bootstrap index sets (1,1,1), (2,2,2) or (3,3,3). In these cases you will get “NaN” when calculating $\hat{\alpha}$, because the denominator is undefined. You may simply discard (1,1,1), (2,2,2) or (3,3,3) and generate one more sample. Theoretically speaking, however, you cannot simply discard a sample. In real applications the sample size will be larger, and such numerical issues will rarely arise. This question is just a toy example to illustrate the idea of the bootstrap.
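The lecture-note formula is not reproduced above; assuming the ISLR-style estimate $\hat{\alpha} = \frac{\mathrm{Var}(Y) - \mathrm{Cov}(X,Y)}{\mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X,Y)}$, a minimal sketch of the procedure is:

```r
# Bootstrap SE of alpha-hat on the three given observations (a sketch;
# the alpha-hat formula is assumed, ISLR-style -- check it against the notes)
x <- c(4.3, 2.1, 5.3)
y <- c(2.4, 1.1, 2.8)
alpha_hat <- function(x, y) {
  (var(y) - cov(x, y)) / (var(x) + var(y) - 2 * cov(x, y))
}
set.seed(1)
B <- 3
alphas <- numeric(B)
b <- 1
while (b <= B) {
  idx <- sample(1:3, 3, replace = TRUE)  # one bootstrap sample of indices
  a <- alpha_hat(x[idx], y[idx])
  if (is.nan(a)) next                    # discard degenerate draws like (1,1,1)
  alphas[b] <- a
  b <- b + 1
}
se <- sqrt(sum((alphas - mean(alphas))^2) / (B - 1))  # bootstrap SE of alpha-hat
```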
4. Newton’s method for finding MLE
Denote the likelihood as L(β0, β1), and the log-likelihood as l(β0, β1).
Given the training data $(x_i, y_i)$, $i = 1, \ldots, n$, consider the logistic regression
$$P(y_i = 1 \mid x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}, \quad P(y_i = 0 \mid x_i) = \frac{1}{1 + e^{\beta_0 + \beta_1 x_i}}$$
We can rewrite this model as
$$P(y_i \mid x_i) = \frac{e^{(\beta_0 + \beta_1 x_i) y_i}}{1 + e^{\beta_0 + \beta_1 x_i}}, \quad y_i = 0, 1$$
Now assume that $\beta_0 = c_0$ is known; our goal is to find the MLE for $\beta_1$ by finding the root of $l'(\beta_1) = 0$.
(a) Write down the likelihood $L(\beta_1)$, the log-likelihood $l(\beta_1)$, $l'(\beta_1)$, $l''(\beta_1)$, and the formula for updating $\beta_1^{(t+1)}$ given $\beta_1^{(t)}$ using Newton’s method.
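For reference, the update step itself is the standard Newton–Raphson recursion (the derivatives $l'$ and $l''$ are left to be derived from the model above):

```latex
\beta_1^{(t+1)} = \beta_1^{(t)} - \frac{l'\!\left(\beta_1^{(t)}\right)}{l''\!\left(\beta_1^{(t)}\right)}
```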
(b) Write R code to implement it. Stop the iteration if the difference between two successive likelihoods is less than a cutoff (i.e. $|L(\beta_1^{(t+1)}) - L(\beta_1^{(t)})| \le \text{cutoff}$). (Please do NOT use any packages in R for logistic regression or Newton’s method; write your own code.)
Hint:
• In practice, we rewrite $\frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$ as $\frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}$ to avoid $\frac{\infty}{\infty}$; please make a similar adjustment when writing your own code.
• Use an if-else conditional statement and the function ‘break’ to stop the iteration.
• Use the functions ‘list’ and ‘save’ to save multiple outputs.
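A minimal sketch of such an implementation (the gradient and Hessian formulas here are the standard logistic-regression derivatives and should be checked against your own derivation in (a); the stopping rule below compares log-likelihoods):

```r
# Newton's method for the MLE of beta1 with beta0 = c0 fixed (a sketch)
newton_logistic <- function(x, y, c0, beta1 = 0, cutoff = 1e-14, max_iter = 100) {
  loglik <- function(b1) sum((c0 + b1 * x) * y - log(1 + exp(c0 + b1 * x)))
  trace_b1 <- beta1
  trace_ll <- loglik(beta1)
  for (t in 1:max_iter) {
    p <- 1 / (1 + exp(-(c0 + beta1 * x)))  # numerically stable logistic form
    g <- sum(x * (y - p))                  # l'(beta1)
    h <- -sum(x^2 * p * (1 - p))           # l''(beta1)
    beta1 <- beta1 - g / h                 # Newton update
    trace_b1 <- c(trace_b1, beta1)
    trace_ll <- c(trace_ll, loglik(beta1))
    if (abs(tail(trace_ll, 1) - tail(trace_ll, 2)[1]) <= cutoff) break
  }
  list(beta1 = beta1, trace_beta1 = trace_b1, trace_loglik = trace_ll)
}
```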
(c) Fix $\beta_0 = -0.66$, set cutoff $= 10^{-14}$, and run your code using the dataset ‘HW1Q4data1.Rdata’. Report the trace of $\hat{\beta}_1$ and the trace of the log-likelihood.
(d) Fix $\beta_0 = 0$, set cutoff $= 10^{-4}$, and run your code using the dataset ‘HW1Q4data2.Rdata’ (this dataset is well-separated) to see that $\hat{\beta}_1$ diverges. Report the trace of $\hat{\beta}_1$ and the trace of the log-likelihood.
(e) Non-uniqueness of the solution $\hat{\beta}_1$
Consider the following model, where $c$ is known:
$$P(y_i \mid x_i) = \frac{e^{\beta_1 (x_i + c) y_i}}{1 + e^{\beta_1 (x_i + c)}}, \quad y_i = 0, 1$$
i. Write down the likelihood $L(\beta_1)$, the log-likelihood $l(\beta_1)$, $l'(\beta_1)$, $l''(\beta_1)$, and the formula for updating $\beta_1^{(t+1)}$ given $\beta_1^{(t)}$ using Newton’s method.
ii. Write R code to implement it. Stop the iteration when the difference between two successive likelihoods is less than a cutoff (i.e. $|L(\beta_1^{(t+1)}) - L(\beta_1^{(t)})| \le \text{cutoff}$). (Please do NOT use any packages in R for logistic regression or Newton’s method; write your own code.)
iii. Fix $c = 0.5$, set cutoff $= 10^{-8}$, and run your code using the dataset ‘HW1Q4data2.Rdata’ to get $\hat{\beta}_1$. Report the trace of $\hat{\beta}_1$ and the trace of the log-likelihood.
(f) Write down the decision boundaries for (d) and (e)(iii).
Remark: The model in (e) is equivalent to the model in (a) with $\beta_0 = c\beta_1$. The models in (d) and (e) give the same likelihood but different decision boundaries. Hence, for a well-separated dataset, the parameter estimates for logistic regression may not be unique, and the classification performance will be affected by this numerical issue.
5. Implement K-Nearest Neighbors (KNN) through R
(a) Given a dataset $X_{n \times p}$ ($n$ observations, $p$ features) and $Y_{n \times 1}$ (class labels, $y_i = 0, 1$), write a function to classify one new point $x_{new}$ (i.e. to predict $y_{new}$). (Please do NOT use any packages in R; write your own code.)
Hint:
i. Compute the distances between $x_{new}$ and each row of $X_{n \times p}$
ii. Pick the class labels of the nearest K points using ‘rank()’
iii. Predict $y_{new}$ by majority vote
(b) Run your function using the dataset ‘HW1Q5data.Rdata’ to classify $x_{new}$, with $K = 8$.
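The three hint steps can be sketched in base R as follows (Euclidean distance is assumed; the tie-breaking choices here are illustrative):

```r
# Classify one new point by KNN (a sketch; X is n x p, Y has labels in {0, 1})
knn_one <- function(X, Y, xnew, K) {
  d <- sqrt(rowSums(sweep(X, 2, xnew)^2))               # distances from xnew to each row of X
  nbr_labels <- Y[rank(d, ties.method = "first") <= K]  # labels of the K nearest points
  as.integer(mean(nbr_labels) > 0.5)                    # majority vote (exact ties go to class 0)
}
```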
6. Use the ‘Weekly’ data set from the textbook ‘An Introduction to Statistical Learning, with Applications in R’ (page 171, Question 10) to implement the following. Show the estimated parameters and compare the results of the different methods through the prediction error.
The data set is called ‘Weekly’ in R, and it is in the ISLR package. Please first install the package ‘ISLR’, and then run the code ‘library(ISLR)’ to load the data.
(a) Logistic Regression
(b) Linear Discriminant Analysis
(c) Quadratic Discriminant Analysis
(d) K-Nearest Neighbors
Hint: You may refer to the tutorial notes ‘Tutorial 02’, or just follow the instructions in Question 10. If you have your own thoughts and ideas, or explore some new areas related to these four methods, you are very welcome and encouraged to include new analysis!
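One possible skeleton for the comparison, assuming the ISLR Question 10 setup (predictor Lag2 only, training on years through 2008, testing on the rest, and K = 1 for KNN; all of these choices are assumptions you may vary):

```r
# Sketch: fit the four classifiers on Weekly and compare test prediction error
library(ISLR)   # provides the Weekly data set
library(MASS)   # lda(), qda()
library(class)  # knn()

train  <- Weekly$Year <= 2008
test_y <- Weekly$Direction[!train]

# (a) Logistic regression
glm_fit  <- glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm_pred <- ifelse(predict(glm_fit, Weekly[!train, ], type = "response") > 0.5, "Up", "Down")

# (b) LDA and (c) QDA
lda_pred <- predict(lda(Direction ~ Lag2, data = Weekly, subset = train), Weekly[!train, ])$class
qda_pred <- predict(qda(Direction ~ Lag2, data = Weekly, subset = train), Weekly[!train, ])$class

# (d) KNN with K = 1
set.seed(1)
knn_pred <- knn(train = as.matrix(Weekly$Lag2[train]),
                test  = as.matrix(Weekly$Lag2[!train]),
                cl    = Weekly$Direction[train], k = 1)

# Prediction error = proportion of misclassified test weeks, per method
sapply(list(glm = glm_pred, lda = lda_pred, qda = qda_pred, knn = knn_pred),
       function(p) mean(p != test_y))
```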
– End –