
PH240C Homework 1

Riddhi Sera

2023-10-20

Question 1

A logistic model with two covariates:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

1(a)

We can rearrange the formula above to solve for the probability:

$$\pi = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}$$

Substituting the given values:

```r
b0 <- -4
b1 <- 0.05
b2 <- 1
x1 <- 5
x2 <- 3.5
answer <- exp(b0 + b1 * x1 + b2 * x2) / (1 + exp(b0 + b1 * x1 + b2 * x2))
cat("Probability of this student getting an A is", answer)
## Probability of this student getting an A is 0.4378235
```

1(b)

Exponentiating both sides of the model gives the odds directly:

$$\text{odds} = \frac{\pi}{1-\pi} = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)$$

Substituting the given values:

```r
b0 <- -4
b1 <- 0.05
b2 <- 1
x1 <- 5
x2 <- 3.5
answer <- exp(b0 + b1 * x1 + b2 * x2)
cat("Odds of this student getting an A is", answer)
## Odds of this student getting an A is 0.7788008
```

1(c)

A 50% chance of getting an A implies that $\pi = 0.5$, which means:

$$\log\left(\frac{\pi}{1-\pi}\right) = \log\left(\frac{0.5}{1-0.5}\right) = 0$$
We can rearrange the model to solve for $X_1$; with the log-odds equal to zero, the first term drops out:

$$X_1 = \frac{\log\left(\frac{\pi}{1-\pi}\right) - \beta_0 - \beta_2 X_2}{\beta_1} = \frac{-\beta_0 - \beta_2 X_2}{\beta_1}$$

Substituting the given values:

```r
b0 <- -4
b1 <- 0.05
b2 <- 1
x2 <- 3.5
answer <- -(b0 + b2 * x2) / b1
cat("Number of hours a week this student needs to study is", answer, "hours")
## Number of hours a week this student needs to study is 10 hours
```

Question 2

Necessary libraries:

```r
## libraries for data cleaning
library(data.table)
library(dplyr)
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##     between, first, last
## The following objects are masked from 'package:stats':
##     filter, lag
## The following objects are masked from 'package:base':
##     intersect, setdiff, setequal, union
library(purrr)
## Attaching package: 'purrr'
## The following object is masked from 'package:data.table':
##     transpose
library(janitor)
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##     chisq.test, fisher.test
```
```r
## libraries for SVM function
library(e1071)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##     lift
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(mlbench)
library(pROC)
## Type 'citation("pROC")' for a citation.
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##     cov, smooth, var

## libraries for decision tree
library(rpart)
library(rpart.plot)

## libraries for random forest
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##     margin
## The following object is masked from 'package:dplyr':
##     combine
```

Code to clean the data:

```r
## Data pre-processing
data <- read.csv('Downloads/heart_disease.csv')
df <- data[, -1]
df[df == "?"] <- NA
df <- na.omit(df)
```
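The same three pre-processing steps (drop the ID column, recode `"?"` as missing, drop incomplete rows) can be sketched in Python with pandas. This is only an illustrative stand-in: the miniature inline CSV below is hypothetical, not the actual `heart_disease.csv` used above.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for heart_disease.csv: an ID column,
# two predictors (with "?" marking missing values), and the outcome.
csv_text = """ID,Age,Thalassemia,Diagnosis_Heart_Disease
1,63,3,0
2,49,?,1
3,55,7,1
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.iloc[:, 1:]            # drop the first (ID) column, like data[, -1]
df = df.replace("?", pd.NA)    # recode "?" as missing, like df[df == "?"] <- NA
df = df.dropna()               # drop incomplete rows, like na.omit(df)

print(len(df))  # 2 complete rows remain
```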
```r
## Converting numerical variables to factors
factors <- c("Chest_Pain_Type", "Fasting_Blood_Sugar", "Resting_ECG",
             "Exercise_Induced_Angina", "Peak_Exercise_ST_Segment",
             "Thalassemia", "Diagnosis_Heart_Disease")
df[factors] <- lapply(df[factors], factor)
```

2(a)

I have chosen the linear and the polynomial kernels to train the SVM classifiers. We can evaluate the classifiers' performance by looking at the testing accuracy, the ROC curve, and the AUC. The classifier with the higher accuracy is usually preferred, and here that is the linear one.

```r
set.seed(1) # to replicate the same results

## Selecting our input and output space
train_index <- createDataPartition(y = df$Diagnosis_Heart_Disease, p = 0.75, list = FALSE)
X_train <- df[train_index, -14]
y_train <- df[train_index, 14]
X_test <- df[-train_index, -14]
y_test <- df[-train_index, 14]

# Linear
svm_linear <- svm(y_train ~ ., data = X_train, kernel = "linear")
predictions_linear <- predict(svm_linear, X_test)
accuracy_linear <- sum(predictions_linear == y_test) / length(y_test)
pred_num_linear <- as.numeric(as.character(predictions_linear))
y_num_test <- as.numeric(as.character(y_test))
roc_linear <- roc(y_num_test, pred_num_linear)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_linear <- auc(roc_linear)

# Polynomial
svm_poly <- svm(y_train ~ ., data = X_train, kernel = "polynomial")
predictions_poly <- predict(svm_poly, X_test)
accuracy_poly <- sum(predictions_poly == y_test) / length(y_test)
pred_num_poly <- as.numeric(as.character(predictions_poly))
roc_poly <- roc(y_num_test, pred_num_poly)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_poly <- auc(roc_poly)

# Accuracy for both classifiers
print(paste("Accuracy (Linear Kernel):", accuracy_linear))
## [1] "Accuracy (Linear Kernel): 0.810810810810811"
print(paste("Accuracy (Polynomial Kernel):", accuracy_poly))
## [1] "Accuracy (Polynomial Kernel): 0.702702702702703"
```
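The same linear-vs-polynomial kernel comparison can be sketched in Python with scikit-learn. This is a minimal sketch on scikit-learn's bundled breast-cancer dataset (a stand-in, since the heart-disease file is not available here); exact accuracies will differ from the R results above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset; the homework uses heart_disease.csv instead.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

accuracies = {}
for kernel in ("linear", "poly"):
    # Feature scaling matters for SVMs; e1071::svm() scales by default,
    # so StandardScaler keeps the comparison analogous.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    accuracies[kernel] = clf.score(X_test, y_test)

print(accuracies)
```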
```r
# AUC for both classifiers
print(paste("AUC (Linear Kernel):", auc_linear))
## [1] "AUC (Linear Kernel): 0.811764705882353"
print(paste("AUC (Polynomial Kernel):", auc_poly))
## [1] "AUC (Polynomial Kernel): 0.676470588235294"

# Plot ROC curves
plot(roc_linear, col = "blue", main = "ROC Curves")
lines(roc_poly, col = "green")
legend("bottomright", legend = c("Linear Kernel", "Polynomial Kernel"),
       fill = c("blue", "green"))
```

[Figure: "ROC Curves" — sensitivity vs. specificity for the linear kernel (blue) and polynomial kernel (green)]
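One caveat worth noting: the ROC curves above are built from hard 0/1 class predictions rather than continuous scores, so each "curve" has a single operating point and its AUC equals (sensitivity + specificity) / 2. A short Python sketch on hypothetical data (using scikit-learn as a stand-in for pROC) shows the difference:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and a continuous classifier score.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
score  = np.array([0.1, 0.4, 0.35, 0.8, 0.45, 0.6, 0.7, 0.9])

# AUC from the continuous scores: sweeps every threshold.
auc_scores = roc_auc_score(y_true, score)

# AUC from hard 0/1 predictions (threshold at 0.5): a single operating
# point, equal to (sensitivity + specificity) / 2.
y_hat = (score >= 0.5).astype(int)
auc_labels = roc_auc_score(y_true, y_hat)

sens = (y_hat[y_true == 1] == 1).mean()   # 3/4
spec = (y_hat[y_true == 0] == 0).mean()   # 3/4
print(auc_scores, auc_labels, (sens + spec) / 2)
```

Passing the decision values from `predict(..., decision.values = TRUE)` into `roc()` would trace a full curve instead.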