MATH 533 (Applied Managerial Statistics) Project: AJ Davis Department Stores, Part C: Regression and Correlation Analysis
Using MINITAB, perform the regression and correlation analysis for the data on CREDIT BALANCE (Y) and SIZE (X) by answering the following.
1. Generate a scatterplot for CREDIT BALANCE vs. SIZE, including the graph of the "best fit" line. Interpret.
[Figure: Scatterplot of Credit Balance ($) vs Size, with fitted line. Vertical axis: Credit Balance ($), 2000 to 6000; horizontal axis: Size, 1 to 7.]
The scatterplot of Credit Balance ($) versus Size shows that the slope of the "best fit" line is upward (positive); this indicates that Credit Balance varies directly with Size. As Size increases, Credit Balance tends to increase, and vice versa.
Correct
Correct
MINITAB OUTPUT:
Predicted Values for New Observations
New Obs     Fit  SE Fit            95% CI            95% PI
      1  4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)
Values of Predictors for New Observations
New Obs  Size
      1  5.00
9. Using an interval, predict the credit balance for a customer that has a household size of 5. Interpret this interval.
The credit balance for a customer with a household size of 5 is expected to lie within the interval (3337.9, 5877.2). This is the 95% prediction interval for the credit balance of an individual customer with a household size of 5.
Correct
10. What can we say about the credit balance for a customer that has a household size of 10? Explain your answer.
We cannot say anything reliable about the credit balance for a customer with a household size of 10, because the maximum value of the predictor variable (Size) used to fit the regression model is only 7, which is well below 10. Applying the model at Size = 10 would be extrapolation beyond the observed data, so the model cannot be used to accurately estimate that customer's credit balance.
Correct
In an attempt to improve the model, we fit a multiple regression model predicting CREDIT BALANCE based on INCOME, SIZE and YEARS.
11. Using MINITAB run the
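The interval output for a household size of 5 quoted above can be cross-checked by hand. The sketch below is not part of the original project; it assumes the standard simple-regression relationships (CI half-width = t × SE Fit, PI half-width = t × sqrt(s² + SE Fit²)) and backs out the t multiplier and the residual standard error s from the printed numbers. The variable names are illustrative, and s is only an implied value, since the quoted output does not report it.

```python
import math

# Values copied from the MINITAB output quoted above (Size = 5).
fit = 4607.5
se_fit = 119.0
ci_low, ci_high = 4368.2, 4846.9   # 95% confidence interval for the mean
pi_low, pi_high = 3337.9, 5877.2   # 95% prediction interval for one customer

# Both intervals are centred on the fitted value, so take half the width.
ci_half = (ci_high - ci_low) / 2
pi_half = (pi_high - pi_low) / 2

# Assumed relationship: CI half-width = t * SE Fit.
t_mult = ci_half / se_fit

# Assumed relationship: PI half-width = t * sqrt(s^2 + SE Fit^2).
# Solving for s gives the residual standard error implied by the output.
se_pred = pi_half / t_mult
s_implied = math.sqrt(se_pred ** 2 - se_fit ** 2)

print(f"t multiplier implied by the CI: {t_mult:.3f}")
print(f"residual standard error implied by the PI: {s_implied:.1f}")
```

The implied multiplier comes out close to 2.01, which is what a 95% interval with roughly 48 degrees of freedom would use. The prediction interval is much wider than the confidence interval because it adds the scatter of individual customers around the line to the uncertainty in the fitted mean.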
What is the lifetime value of a typical customer in each of the four segments, in current dollar values? Compare these figures to the “Gross margin” figures in the original spreadsheet. What can you learn from this comparison?
To compute the 90% prediction interval for all trading days during the study period, the formula ( , ) can be used. Referring to the question, α equals 0.1 and α/2 equals 0.05.
5) Graph the equation you wrote in step four superimposed over the original data. Comment on how well or how poorly the equation fits the data.
Successive iterations of the analysis eliminated data points that were listed as “unusual observations,” or any data point with a large standardized residual. After 5 iterations, the analysis showed improved residual plots. Randomness in the versus-fits and versus-order plots suggests that the linear regression model is appropriate for the data; a straight line in the normal probability plot and a bell-shaped curve in the histogram indicate that the residuals are approximately normally distributed.
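The iterative clean-up described above can be sketched in a few lines. This is an illustration only, not the author's actual procedure: the cutoff of 2 for a "large" standardized residual, the function names, and the toy data are all assumptions, and the sketch handles only a single predictor.

```python
import math

def standardized_residuals(xs, ys):
    """Fit a least-squares line and return each residual divided by its
    estimated standard deviation, s * sqrt(1 - leverage)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    if s == 0:
        return [0.0] * n  # perfect fit: nothing can be flagged
    result = []
    for x, e in zip(xs, resid):
        leverage = 1 / n + (x - x_bar) ** 2 / sxx
        result.append(e / (s * math.sqrt(1 - leverage)))
    return result

def drop_unusual(xs, ys, cutoff=2.0, max_iter=5, min_points=4):
    """Refit repeatedly, dropping points whose |standardized residual|
    exceeds the (assumed) cutoff, for at most max_iter passes."""
    xs, ys = list(xs), list(ys)
    for _ in range(max_iter):
        if len(xs) < min_points:
            break
        std = standardized_residuals(xs, ys)
        keep = [i for i, r in enumerate(std) if abs(r) <= cutoff]
        if len(keep) == len(xs):
            break  # no unusual observations left
        xs = [xs[i] for i in keep]
        ys = [ys[i] for i in keep]
    return xs, ys

# Hypothetical toy data: roughly y = 2x + 1, with one planted outlier at x = 9.
toy_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 30.0, 21.1]
clean_x, clean_y = drop_unusual(toy_x, toy_y)
print(len(toy_x) - len(clean_x), "point(s) removed")
```

Each pass refits the line before flagging points, because removing one observation changes both the fitted line and the residual standard error used to scale the remaining residuals.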
Problem 2.6: In fitting a model to classify prospects as purchasers or non-purchasers, a certain company drew the training data from internal data that include demographic and purchase information. Future data to be classified will be lists purchased from other sources, with demographic (but not purchase) data included. It was found that “refund issued” was a useful
AJ DAVIS is a department store chain with many credit customers. A sample of 50 credit customers was selected, with data collected on location, income, credit balance, number of people in the household, and years lived in the house.
6. Why is the black line so much more variable than the red line? What's the difference between the data they show?
You put I_taxe into the regression equation, but I do not see a parameter estimate in Table 6.
This analysis would benefit from the addition of more variables, which would allow a more accurate study. These variables should also be broken down into several data elements. Some suggestions for variables for future analysis include the discount for the bill purchased and a variety of account demographics. These demographics might include credit status, geographic location, bankruptcy/foreclosure, net worth or net income. The study should also cover a longer time period. The additional variables would help identify the largest accounts that can be collected in the shortest time.
The scatter plot above shows that there is a correlation between the quiz results and the exam results. R² = 0.536, which indicates that about 54% of the variation in the quiz average is accounted for by the linear relationship with the exam results. In other words, about 46% of the variation is not explained by the least-squares regression line.
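The percentages quoted above follow directly from R². A minimal sketch of the arithmetic, assuming a simple linear regression with one predictor, so that the magnitude of the correlation coefficient is the square root of R² (the variable names are illustrative):

```python
import math

r_squared = 0.536  # value reported in the text

explained_pct = 100 * r_squared          # share of variation explained by the line
unexplained_pct = 100 * (1 - r_squared)  # share left in the residuals

# With a single predictor, |r| = sqrt(R^2); the sign of r matches the slope.
r_magnitude = math.sqrt(r_squared)

print(f"explained: {explained_pct:.1f}%, unexplained: {unexplained_pct:.1f}%")
print(f"|r| = {r_magnitude:.3f}")
```

A correlation of about 0.73 in magnitude is fairly strong, yet it still leaves nearly half of the variation unexplained, which is why R² is usually the more informative figure to report.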
Lending evaluations by Santander are based on the credit background of the person (or company) who wants to borrow. By developing a credit-approval system, Santander Consumer Finance improved its understanding of these clients through an online database, which allowed the use of real-time analysis in determining interest rates for its business interactions.
What is seen in the stem and leaf plot for the money variable (include the shape)? Explain your answer.
The results of the two test statistics differed at times, i.e., they listed two different curves as providing the “best” fit. In such cases, the final decision about the “best” fit was made based on a visual assessment of the figures.
The recommended decision tree model includes two variables: annual income and loans. Both are interval variables and represent the original observations. They were chosen for the final model because, after several trials, they proved to be the key variables in determining the rules within the decision trees.
It is a process of analyzing the relationships among the data from various perspectives and summarizing them into valuable information. It also helps banks look for hidden patterns in a group and discover unknown relationships in the data. These data mining techniques facilitate useful data interpretations that help the banking sector avoid customer attrition. An accurate, fraud-free prediction of credit approval is important to prospective homeowners, developers, investors, appraisers, tax assessors and other real estate market participants. People who are looking to make a new purchase tend to be more conservative with their budget and with acquiring loans from financial institutions. The credit function is central to any banking system under uncertain market conditions. The lack of a general credit review system and of precise methods in banks is an important reason why an expert support system is necessary.