Predicting Housing Median Prices. – The file BostonHousing.csv contains information on 506 census tracts in Boston, where for each tract multiple variables are recorded. The last column (CAT.MEDV) was derived from MEDV, such that it obtains the value 1 if MEDV > 30 and 0 otherwise. First, consider the goal of predicting the median value (MEDV) of a tract, given the information in the first 12 columns. Second, consider the goal of classifying the property using the last column of CAT.MEDV. Partition the data into training (60%) and validation (40%) sets. a1. Perform a knn prediction with all 12 predictors (columns 1 – 12) with MEDV (column 13) as the outcome variable. (Ignore the CAT.MEDV column in this step.) Try values of k from 1 to 10. Make sure to normalize the data (preprocess), and choose function knn() from the class package/library rather than FNN. [To make sure R is using class package (when both packages are loaded), use class::knn().] What is the best k? What does it mean?

Operations Research : Applications and Algorithms
4th Edition
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Wayne L. Winston
Chapter24: Forecasting Models
Section24.8: Multiple Regression
Problem 9P
icon
Related questions
Question
100%
Predicting Housing Median Prices. – The file BostonHousing.csv contains information on 506 census tracts in Boston, where for each tract multiple variables are recorded. The last column (CAT.MEDV) was derived from MEDV, such that it obtains the value 1 if MEDV > 30 and 0 otherwise. First, consider the goal of predicting the median value (MEDV) of a tract, given the information in the first 12 columns. Second, consider the goal of classifying the property using the last column of CAT.MEDV.
Partition the data into training (60%) and validation (40%) sets.
a1. Perform a knn prediction with all 12 predictors (columns 1 – 12) with MEDV (column 13) as the outcome variable. (Ignore the CAT.MEDV column in this step.) Try values of k from 1 to 10. Make sure to normalize the data (preprocess), and choose function knn() from the class package/library rather than FNN. [To make sure R is using class package (when both packages are loaded), use class::knn().] What is the best k? What does it mean?
a2. Perform a knn classification with all 12 predictors, trying various values of k from 1 to 10.
 
b. Predict the MEDV for a tract with the following information, using the best k:
CRIM = 0.2
ZN = 0
INDUS = 7
CHAS = 0
NOX = 0.538
RM = 6
AGE = 62
DIS = 4.7
RAD = 4
TAX = 307
PTRATIO = 21
LSTAT = 10
 
c. If we used the above k-NN algorithm to score the training data, what would be the error of the training set?
Expert Solution
trending now

Trending now

This is a popular solution!

steps

Step by step

Solved in 4 steps

Blurred answer
Knowledge Booster
Decision Tree
Learn more about
Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.
Similar questions
  • SEE MORE QUESTIONS
Recommended textbooks for you
Operations Research : Applications and Algorithms
Operations Research : Applications and Algorithms
Computer Science
ISBN:
9780534380588
Author:
Wayne L. Winston
Publisher:
Brooks Cole