CECS 551 Programming Assignment 1

Mammographic Mass Data Set
Review and download the mammographic mass data set at
https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass
In examining the data, notice that some datapoints have missing attribute values. In these cases, “?”
is substituted for each missing value. Add a header line to the CSV file that labels the attributes as
Birads, Age, Shape, Margin, Density, and Severity.
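The header step can be sketched in R as follows. This is only an illustration under stated assumptions: the four data rows are made up, the temporary file stands in for the downloaded data file, and the names raw_path and labeled_path are hypothetical. The na.strings argument of read.csv is one possible way to turn each “?” into NA while reading.

```r
# Stand-in for the downloaded file: a few made-up rows in the raw layout
# (six comma-separated values per row, "?" for a missing value).
raw_path <- tempfile(fileext = ".csv")
writeLines(c("5,67,3,5,3,1",
             "4,43,1,1,?,1",
             "4,?,2,1,3,0",
             "?,58,4,5,?,1"), raw_path)

# Prepend the header line and write a labeled copy of the file.
header_line <- "Birads,Age,Shape,Margin,Density,Severity"
labeled_path <- tempfile(fileext = ".csv")
writeLines(c(header_line, readLines(raw_path)), labeled_path)

# na.strings = "?" makes read.csv store each "?" as NA on import.
mammogram.frame <- read.csv(labeled_path, na.strings = "?")

print(names(mammogram.frame))
print(colSums(is.na(mammogram.frame)))  # missing values per attribute
```

For the real data, raw_path would point at the file downloaded from the UCI page instead of a temporary file.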
Exercises
1. In R, the appropriate indicator for missing data is “NA” (not available). Therefore, replace
each occurrence of “?” with “NA”.
a. For this exercise, create an R data frame for the mammographic data using only datapoints
that have no missing values. This can be done using the complete.cases function which
takes a data frame and returns a Boolean vector v, where v[i] equals TRUE iff the i-th
row of the data frame is complete (meaning it contains no NA). For example, if the
data frame is stored in mammogram.frame, then
mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),]
creates a new data frame called mammogram2.frame that has all the complete mammogram
data samples.
b. Use R’s summary function to provide a statistical summary of each attribute of the
altered data frame.
c. Use the e1071 svm function to construct a linear classifier for the data set. Report on the
percentage of datapoints that are correctly classified by the svm model.
d. Repeat part c) using a degree-2 polynomial classifier. This particular type of svm can be
constructed using the input options
kernel = "polynomial", degree = 2, type = "C-classification"
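As a rough sketch of parts c and d, the following fits both classifiers and reports the percentage of rows each one labels correctly. It assumes the e1071 package is installed. The data frame toy.frame is synthetic (made-up values, two attributes only) and merely stands in for the cleaned mammogram data, and percent_correct is a hypothetical helper, not an e1071 function.

```r
library(e1071)  # assumed installed; provides svm()

set.seed(551)  # arbitrary seed so the synthetic data is reproducible

# Synthetic stand-in for the cleaned data frame (made-up values).
n <- 50
toy.frame <- data.frame(
  Age      = c(rnorm(n, mean = 45, sd = 8), rnorm(n, mean = 65, sd = 8)),
  Margin   = c(sample(1:3, n, replace = TRUE), sample(3:5, n, replace = TRUE)),
  Severity = factor(rep(c(0, 1), each = n))  # class label stored as a factor
)

# Hypothetical helper: percentage of rows whose predicted class matches the label.
percent_correct <- function(model, frame) {
  100 * mean(predict(model, frame) == frame$Severity)
}

# Linear classifier (part c).
linear.model <- svm(Severity ~ ., data = toy.frame,
                    kernel = "linear", type = "C-classification")

# Degree-2 polynomial classifier (part d).
poly.model <- svm(Severity ~ ., data = toy.frame,
                  kernel = "polynomial", degree = 2, type = "C-classification")

linear.accuracy <- percent_correct(linear.model, toy.frame)
poly.accuracy <- percent_correct(poly.model, toy.frame)
cat("linear:", linear.accuracy, "%  polynomial:", poly.accuracy, "%\n")
```

Both percentages here are measured on the same rows used to fit the models, which matches the wording of part c but says nothing about performance on unseen data.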
2. Repeat each part of the previous exercise, but, for Part a, instead of removing datapoints with
missing attribute values, replace each NA with the nominal value -1 and use the entire data
set.
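The difference between the two treatments of missing values can be illustrated with a small sketch. The rows below are made up, and mammogram3.frame is a hypothetical name for the data frame in which each NA has been replaced by -1.

```r
# Made-up rows with some NA entries (illustrative values only).
mammogram.frame <- data.frame(
  Birads   = c(5, 4, NA, 4),
  Age      = c(67, NA, 58, 28),
  Shape    = c(3, 1, 4, 1),
  Margin   = c(5, 1, 5, 1),
  Density  = c(3, NA, 3, 3),
  Severity = c(1, 1, 1, 0)
)

# Exercise 1 approach: keep only the complete rows.
mammogram2.frame <- mammogram.frame[complete.cases(mammogram.frame), ]

# Exercise 2 approach: keep every row and overwrite each NA with -1.
mammogram3.frame <- mammogram.frame
mammogram3.frame[is.na(mammogram3.frame)] <- -1

cat("rows kept by complete.cases:", nrow(mammogram2.frame), "\n")
cat("rows kept by -1 replacement:", nrow(mammogram3.frame), "\n")
```

The replacement keeps every datapoint, at the cost of feeding the classifier a value (-1) that does not occur in the observed attribute ranges.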