Final Project
Step 1
Data set: College.csv
Statistics for a large number of US colleges from the 1995 issue of US News and
World Report. This dataset was taken from the StatLib library, which is
maintained at Carnegie Mellon University. It was used in the ASA Statistical
Graphics Section’s 1995 Data Analysis Exposition.
Project goal: using the College data set from the ISLR package, I want to
predict students' graduation rates from several factors (variables).
Step 2
Hypothesis-
The fraction of new students from the top 10% of their high school class
predicts a college's graduation rate better than the fraction of new students
from the top 25% of their high school class.
Null Hypothesis-
The fraction of new students from the top 10% of their high school class does
not predict a college's graduation rate better than the fraction of new
students from the top 25% of their high school class.
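As a sketch, assuming both input variables enter a single multiple linear regression fitted to the public schools (index $i$ runs over those colleges), the model and a standard per-coefficient test can be written as follows. Note that a per-coefficient test asks whether each variable is related to Grad.Rate with the other held fixed; on its own it does not compare the two predictors head to head.

```latex
\[
\text{Grad.Rate}_i = \beta_0 + \beta_1\,\text{Top10perc}_i + \beta_2\,\text{Top25perc}_i + \varepsilon_i
\]
\[
H_0:\ \beta_j = 0 \quad \text{vs.} \quad H_1:\ \beta_j \neq 0, \qquad
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}, \quad j \in \{1, 2\}
\]
```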
Step 3
R Code
I'm going to use only the public schools from the College data set. My input
variables are Top10perc and Top25perc (the percentages of new students from the
top 10% and top 25% of their high school class), and my output variable is
Grad.Rate, to determine whether my input variables in fact show a relationship
with graduation rate.
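A minimal sketch of the data preparation, assuming the ISLR package is installed (install.packages("ISLR")). In the ISLR College data, Private is a factor with levels No and Yes, so public schools are the rows with Private == "No"; the object name public is only an illustrative choice.

```r
# Requires the ISLR package: install.packages("ISLR")
library(ISLR)

# College: US colleges from the 1995 US News and World Report (shipped with ISLR)
data(College)

# Keep public schools only (Private == "No") and the three variables used here
public <- subset(College, Private == "No",
                 select = c(Top10perc, Top25perc, Grad.Rate))

# Quick look at the structure and summary statistics of the subset
str(public)
print(summary(public))
```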
The function I use is lm(), which fits linear models. It can be used to carry
out regression, single-stratum analysis of variance, and analysis of
covariance. I also use aov(), which fits an analysis of variance model by a
call to lm() for each stratum.
Fitting the model with the lm() function:
Using the anova() function to display the statistical results:
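The original code is not included in this text. A minimal sketch of what this step could look like, assuming both input variables go into one formula in the order Top10perc, then Top25perc; it shows lm(), anova() and aov() side by side and is not guaranteed to reproduce the exact numbers reported in Step 4.

```r
library(ISLR)

# Public schools only, with the input and output variables
public <- subset(College, Private == "No",
                 select = c(Top10perc, Top25perc, Grad.Rate))

# Multiple linear regression of graduation rate on both input variables
fit <- lm(Grad.Rate ~ Top10perc + Top25perc, data = public)

# t-test for each coefficient (H0: coefficient = 0, other predictor held fixed)
print(summary(fit))

# Sequential (Type I) ANOVA table: Top10perc first, then Top25perc given Top10perc
print(anova(fit))

# aov() fits the same model via lm(); its summary prints the same sequential table
fit_aov <- aov(Grad.Rate ~ Top10perc + Top25perc, data = public)
print(summary(fit_aov))
```

Because anova() tests terms sequentially, swapping the order of Top10perc and Top25perc in the formula changes the first row of the table. The p-value for the term entered last matches its t-test p-value from summary(fit).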
Step 4
After analyzing my hypothesis, I found that the fraction of students from the
top 10% of the class predicts the graduation rate with a p-value below the
0.05 significance level. The p-value I got for Top10perc is 0.0115. A small
p-value (typically ≤ 0.05) indicates strong evidence against the null
hypothesis, so I reject it. In contrast, the p-value for Top25perc is 0.1656.
A large p-value (> 0.05) indicates weak evidence against the null hypothesis,
so I fail to reject it.
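The decision rule above can be written out directly in R, using the two p-values reported in this step and a significance level of 0.05:

```r
# p-values as reported in Step 4
p_values <- c(Top10perc = 0.0115, Top25perc = 0.1656)
alpha <- 0.05  # significance level

# Reject H0 when p <= alpha, otherwise fail to reject
decision <- ifelse(p_values <= alpha, "reject H0", "fail to reject H0")
print(decision)
```

With a fitted model fit, the same named vector could be taken from the model instead of typed by hand, e.g. summary(fit)$coefficients[-1, "Pr(>|t|)"] (the -1 drops the intercept row).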
BOXPLOT:
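The original boxplot image is not reproduced here, so what it showed is unknown. As an assumption, one simple version draws side-by-side boxplots of the three variables used for the public schools; all three are percentages, so they share one axis.

```r
library(ISLR)

# Public schools only, with the input and output variables
public <- subset(College, Private == "No",
                 select = c(Top10perc, Top25perc, Grad.Rate))

# One box per column; boxplot() also returns the five-number summaries invisibly
bp <- boxplot(public,
              main = "Public colleges (ISLR College data)",
              ylab = "Percent")
print(bp$stats)
```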