
PH240C Homework 1

Riddhi Sera

2023-10-20

Question 1

A logistic model with two covariates:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

1(a)

We can rearrange the formula above to solve for the probability:

$$\pi = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}$$

Substituting the given values:

```r
b0 <- -4
b1 <- 0.05
b2 <- 1
x1 <- 5
x2 <- 3.5
answer <- exp(b0 + b1 * x1 + b2 * x2) / (1 + exp(b0 + b1 * x1 + b2 * x2))
cat("Probability of this student getting an A is", answer)
## Probability of this student getting an A is 0.4378235
```

1(b)

Exponentiating both sides of the model gives the odds directly:

$$\text{odds} = \frac{\pi}{1-\pi} = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)$$

Substituting the given values:

```r
b0 <- -4
b1 <- 0.05
b2 <- 1
x1 <- 5
x2 <- 3.5
answer <- exp(b0 + b1 * x1 + b2 * x2)
cat("Odds of this student getting an A is", answer)
## Odds of this student getting an A is 0.7788008
```

1(c)

A 50% chance of getting an A implies that $\pi = 0.5$, which means:

$$\log\left(\frac{\pi}{1-\pi}\right) = \log\left(\frac{0.5}{1-0.5}\right) = 0$$
We can rearrange the model to solve for $X_1$; with the log-odds equal to zero, the first term drops out:

$$X_1 = \frac{\log\left(\frac{\pi}{1-\pi}\right) - \beta_0 - \beta_2 X_2}{\beta_1} = \frac{-\beta_0 - \beta_2 X_2}{\beta_1}$$

Substituting the given values:

```r
b0 <- -4
b1 <- 0.05
b2 <- 1
x2 <- 3.5
answer <- -(b0 + b2 * x2) / b1
cat("Number of hours a week this student needs to study is", answer, "hours")
## Number of hours a week this student needs to study is 10 hours
```

Question 2

Necessary libraries:

```r
## libraries for data cleaning
library(data.table)
library(dplyr)
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##     between, first, last
## The following objects are masked from 'package:stats':
##     filter, lag
## The following objects are masked from 'package:base':
##     intersect, setdiff, setequal, union
library(purrr)
## Attaching package: 'purrr'
## The following object is masked from 'package:data.table':
##     transpose
library(janitor)
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##     chisq.test, fisher.test
```
```r
## libraries for SVM function
library(e1071)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##     lift
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(mlbench)
library(pROC)
## Type 'citation("pROC")' for a citation.
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##     cov, smooth, var

## libraries for decision tree
library(rpart)
library(rpart.plot)

## libraries for random forest
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##     margin
## The following object is masked from 'package:dplyr':
##     combine
```

Code to clean the data:

```r
## Data pre-processing
data <- read.csv('Downloads/heart_disease.csv')
df <- data[, -1]
df[df == "?"] <- NA
df <- na.omit(df)
```
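The same three pre-processing steps (drop the ID column, recode `"?"` as missing, drop incomplete rows) can be sketched in Python with pandas. This is only an illustrative stand-in: the miniature inline CSV below is hypothetical, not the actual `heart_disease.csv` used above.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for heart_disease.csv: an ID column,
# two predictors (with "?" marking missing values), and the outcome.
csv_text = """ID,Age,Thalassemia,Diagnosis_Heart_Disease
1,63,3,0
2,49,?,1
3,55,7,1
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.iloc[:, 1:]            # drop the first (ID) column, like data[, -1]
df = df.replace("?", pd.NA)    # recode "?" as missing, like df[df == "?"] <- NA
df = df.dropna()               # drop incomplete rows, like na.omit(df)

print(len(df))  # 2 complete rows remain
```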
```r
## Converting numerical variables to factors
factors <- c("Chest_Pain_Type", "Fasting_Blood_Sugar", "Resting_ECG",
             "Exercise_Induced_Angina", "Peak_Exercise_ST_Segment",
             "Thalassemia", "Diagnosis_Heart_Disease")
df[factors] <- lapply(df[factors], factor)
```

2(a)

I have chosen the linear and the polynomial kernels to train the SVM classifiers. We can evaluate the classifiers' performance by looking at the testing accuracy, the ROC curve, and the AUC. The classifier with the higher accuracy is usually preferred, and here that is the linear one.

```r
set.seed(1) # to replicate the same results

## Selecting our input and output space
train_index <- createDataPartition(y = df$Diagnosis_Heart_Disease, p = 0.75, list = FALSE)
X_train <- df[train_index, -14]
y_train <- df[train_index, 14]
X_test <- df[-train_index, -14]
y_test <- df[-train_index, 14]

# Linear
svm_linear <- svm(y_train ~ ., data = X_train, kernel = "linear")
predictions_linear <- predict(svm_linear, X_test)
accuracy_linear <- sum(predictions_linear == y_test) / length(y_test)
pred_num_linear <- as.numeric(as.character(predictions_linear))
y_num_test <- as.numeric(as.character(y_test))
roc_linear <- roc(y_num_test, pred_num_linear)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_linear <- auc(roc_linear)

# Polynomial
svm_poly <- svm(y_train ~ ., data = X_train, kernel = "polynomial")
predictions_poly <- predict(svm_poly, X_test)
accuracy_poly <- sum(predictions_poly == y_test) / length(y_test)
pred_num_poly <- as.numeric(as.character(predictions_poly))
roc_poly <- roc(y_num_test, pred_num_poly)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_poly <- auc(roc_poly)

# Accuracy for both classifiers
print(paste("Accuracy (Linear Kernel):", accuracy_linear))
## [1] "Accuracy (Linear Kernel): 0.810810810810811"
print(paste("Accuracy (Polynomial Kernel):", accuracy_poly))
## [1] "Accuracy (Polynomial Kernel): 0.702702702702703"
```
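The same linear-vs-polynomial kernel comparison can be sketched in Python with scikit-learn. This is a minimal sketch on scikit-learn's bundled breast-cancer dataset (a stand-in, since the heart-disease file is not available here); exact accuracies will differ from the R results above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset; the homework uses heart_disease.csv instead.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

accuracies = {}
for kernel in ("linear", "poly"):
    # Feature scaling matters for SVMs; e1071::svm() scales by default,
    # so StandardScaler keeps the comparison analogous.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    accuracies[kernel] = clf.score(X_test, y_test)

print(accuracies)
```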
```r
# AUC for both classifiers
print(paste("AUC (Linear Kernel):", auc_linear))
## [1] "AUC (Linear Kernel): 0.811764705882353"
print(paste("AUC (Polynomial Kernel):", auc_poly))
## [1] "AUC (Polynomial Kernel): 0.676470588235294"

# Plot ROC curves
plot(roc_linear, col = "blue", main = "ROC Curves")
lines(roc_poly, col = "green")
legend("bottomright", legend = c("Linear Kernel", "Polynomial Kernel"),
       fill = c("blue", "green"))
```

[Figure: "ROC Curves" — sensitivity vs. specificity for the linear kernel (blue) and polynomial kernel (green)]
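One caveat worth noting: the ROC curves above are built from hard 0/1 class predictions rather than continuous scores, so each "curve" has a single operating point and its AUC equals (sensitivity + specificity) / 2. A short Python sketch on hypothetical data (using scikit-learn as a stand-in for pROC) shows the difference:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and a continuous classifier score.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
score  = np.array([0.1, 0.4, 0.35, 0.8, 0.45, 0.6, 0.7, 0.9])

# AUC from the continuous scores: sweeps every threshold.
auc_scores = roc_auc_score(y_true, score)

# AUC from hard 0/1 predictions (threshold at 0.5): a single operating
# point, equal to (sensitivity + specificity) / 2.
y_hat = (score >= 0.5).astype(int)
auc_labels = roc_auc_score(y_true, y_hat)

sens = (y_hat[y_true == 1] == 1).mean()   # 3/4
spec = (y_hat[y_true == 0] == 0).mean()   # 3/4
print(auc_scores, auc_labels, (sens + spec) / 2)
```

Passing the decision values from `predict(..., decision.values = TRUE)` into `roc()` would trace a full curve instead.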