2. With the growth of internet service providers, a researcher decides to examine whether there is a correlation between cost of internet service per month (rounded to the nearest dollar) and degree of customer satisfaction (on a scale of 1-10 with a 1 being not at all satisfied and a 10 being extremely satisfied). The researcher only includes programs with comparable types of services. A sample of the data is provided below.

dollars  satisfaction
11       6
18       8
17       10
15       4
9        9
5        6
12       3
19       5
22       2
25       10

Compute the correlation coefficient. Is this correlation statistically significant at 90%, 99%, 99.9% significance levels? Implement in Python.


Please help with a Python program. I am providing a program below, some parts of which you can reuse. Question number 2 is attached as an image.

# call the appropriate libraries
import numpy as np
from numpy import mean
from numpy import std
from numpy.random import randn
from numpy.random import seed
from matplotlib import pyplot
from numpy import cov
from scipy.stats import pearsonr
from scipy.stats import spearmanr
import scipy.stats as stats
import pandas as pd

# generate data
# seed random number generator
seed(1)
# prepare data
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)
# print mean and std
print('data1: mean=%.3f stdv=%.3f' % (mean(data1), std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (mean(data2), std(data2)))
# plot the sample data
pyplot.scatter(data1, data2)
pyplot.show()

# calculate covariance
covariance = cov(data1, data2)
print('The covariance matrix')
print(covariance)

# calculate Pearson's correlation
corr, _ = pearsonr(data1, data2)
print('Pearsons correlation: %.3f' % corr)
# calculate Spearman's correlation
corr, _ = spearmanr(data1, data2)
print('Spearmans correlation: %.3f' % corr)
# ---- chi-square test ----
# Input is a contingency table.
# Consider the different pets bought by men and women:
#         dog   cat   bird  total
# men     207   282   241   730
# women   234   242   232   708
# total   441   524   473   1438

# The aim of the test is to determine whether the two variables (gender and choice of pet) are related to each other.


from scipy.stats import chi2_contingency
print("CHI SQUARE TEST with HYPOTHESIS TESTING")
# defining the table
data = [[207, 282, 241], [234, 242, 232]]
stat, p, dof, expected = chi2_contingency(data)
# hypothesis testing
# interpret the p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('reject H0 - the variables are associated at the 95% confidence level')
else:
    print('fail to reject H0 - no evidence of association at the 95% confidence level')


np.random.seed(6)
# generate Poisson-distributed samples: loc=18 shifts the lowest possible value to 18,
# mu is the mean of the unshifted distribution, and size is the number of samples
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))

minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))

print( population_ages.mean(), ' is the population mean' )
print( minnesota_ages.mean(), ' is the sample mean' )
# We know that the sample and the population come from different distributions.
# Conduct a t-test at the 95% confidence level and see whether it correctly rejects
# the null hypothesis that the sample comes from the same distribution as the population.
s, p = stats.ttest_1samp(a = minnesota_ages,               # Sample data
                 popmean = population_ages.mean())
print(s, 'is the test statistic')

# interpret the t-statistic
if s >= 0:
   print('Sample mean is larger than the population mean')
else:
   print('Sample mean is smaller than the population mean')

# if the p-value is less than 0.05, reject the null hypothesis that the sample mean equals the population mean
if p < 0.05:
    print('This observation is statistically significant with 95% confidence.')
else:
    print('This observation is not statistically significant with 95% confidence.')



# perform min-max normalization

data = {'weight':[300, 250, 800],
        'price':[3, 2, 5]}
df = pd.DataFrame(data)

print(df)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(df)
print('Normalized data')
print(normalized_data)

# standardization converts the data to z-scores, so each column has mean 0 and standard deviation 1

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
standardized_data = scaler.fit_transform(df)
print('standardized data value')
print(standardized_data)

####normality check using Q-Q plot

np.random.seed(0)
data = np.random.normal(0,1, 1000)

import statsmodels.api as sm
import matplotlib.pyplot as plt

#create Q-Q plot with 45-degree line added to plot
fig = sm.qqplot(data, line='45')
plt.show()
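
For question 2, the `pearsonr` call from the program above can be reused on the ten data pairs. The following is a minimal sketch, not a definitive solution. It assumes that the two columns pair up row by row (the first dollar amount with the first satisfaction score, and so on), and that "90%, 99%, 99.9%" means confidence levels, so the p-value is compared against alpha = 0.10, 0.01, and 0.001. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Data as read from the question (assumed to pair up row by row)
dollars = np.array([11, 18, 17, 15, 9, 5, 12, 19, 22, 25])
satisfaction = np.array([6, 8, 10, 4, 9, 6, 3, 5, 2, 10])

# Pearson correlation coefficient and its two-sided p-value
r, p_value = stats.pearsonr(dollars, satisfaction)

# Cross-check by hand: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
n = len(dollars)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print('Pearson r = %.4f' % r)
print('t = %.4f with %d degrees of freedom' % (t_stat, n - 2))
print('p-value = %.4f (manual check: %.4f)' % (p_value, p_manual))

# Assumption: 90%, 99%, 99.9% confidence correspond to alpha = 0.10, 0.01, 0.001
results = {}
for confidence, alpha in [('90%', 0.10), ('99%', 0.01), ('99.9%', 0.001)]:
    results[confidence] = p_value < alpha
    verdict = 'significant' if results[confidence] else 'not significant'
    print('At the %s confidence level (alpha = %.3f): %s' % (confidence, alpha, verdict))
```

Under these assumptions the coefficient comes out at roughly 0.08, a very weak positive relationship, and the p-value is far above 0.10, so the correlation would not be statistically significant at any of the three levels. With only ten observations the test has little power, so this is weak evidence either way.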