Final Project

Step 1

Data set: College.csv. Statistics for a large number of US colleges from the 1995 issue of US News and World Report. This data set was taken from the StatLib library, which is maintained at Carnegie Mellon University. It was used in the ASA Statistical Graphics Section’s 1995 Data Analysis Exposition.

Project goal: using the College data set provided by the ISLR package, I want to determine how well students' graduation rates can be predicted from several factors (variables).

Step 2

Hypothesis:

The percentage of new students from the top 10% of their high school class predicts the graduation rate better than the percentage of new students from the top 25% of their high school class.

Null Hypothesis:

The percentage of new students from the top 10% of their high school class does not predict the graduation rate better than the percentage of new students from the top 25% of their high school class.

Step 3: R Code

I am going to use public schools only from the College data set. My input variables are Top10perc and Top25perc (the percentages of new students from the top 10% and the top 25% of their high school class), and my output variable is Grad.Rate (the graduation rate). The aim is to determine whether the input variables in fact show a relationship with the graduation rate.
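The selection described above could be sketched as follows. This is a minimal sketch, not necessarily the exact code used for this project: it assumes the ISLR package is installed and that public schools are the rows where the Private factor equals "No". The helper name public_only() is hypothetical.

```r
# Hypothetical helper: keep public schools and the three variables of interest.
# Assumes `df` has a Private column with values "No" (public) / "Yes" (private).
public_only <- function(df) {
  subset(df, Private == "No",
         select = c(Top10perc, Top25perc, Grad.Rate))
}

# Only runs if the ISLR package (which ships the College data) is available.
if (requireNamespace("ISLR", quietly = TRUE)) {
  public <- public_only(ISLR::College)
  str(public)      # structure of the reduced data frame
  summary(public)  # five-number summaries of the three variables
}
```

Restricting the columns early keeps later model formulas and plots limited to the variables named in the hypothesis.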

The function I use is lm(), which is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance. I also use aov(), which fits an analysis of variance model by a call to lm() for each stratum.

Using the lm() function
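A model of this kind might be fitted as in the sketch below. The formula with both predictors in a single model is an assumption, since the original call is not shown here, and fit_grad_rate() is a hypothetical helper name. The code assumes the ISLR package is installed.

```r
# Hypothetical helper: regress graduation rate on both class-rank percentages.
# Assumption: both predictors enter one model; the original call may differ.
fit_grad_rate <- function(df) {
  lm(Grad.Rate ~ Top10perc + Top25perc, data = df)
}

# Only runs if the ISLR package (which ships the College data) is available.
if (requireNamespace("ISLR", quietly = TRUE)) {
  public <- subset(ISLR::College, Private == "No")  # public schools only
  fit <- fit_grad_rate(public)
  print(summary(fit))  # coefficients with t-tests, R-squared, overall F-test
}
```

In the summary() output, each coefficient's p-value tests that predictor after adjusting for the other one, so it can differ from the p-value of a model that contains only a single predictor.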

Anova function to display the statistical results
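The ANOVA table for such a model could be produced as sketched here. This assumes the base R anova() function applied to an lm() fit with both predictors; the original call is not shown, so the p-values it prints are not guaranteed to match the ones reported in Step 4. The helper name anova_table() is hypothetical.

```r
# Hypothetical helper: ANOVA table for the two-predictor linear model.
anova_table <- function(df) {
  fit <- lm(Grad.Rate ~ Top10perc + Top25perc, data = df)
  # anova() on a single lm fit uses sequential (Type I) sums of squares:
  # each term is tested after the terms listed before it in the formula.
  anova(fit)
}

# Only runs if the ISLR package (which ships the College data) is available.
if (requireNamespace("ISLR", quietly = TRUE)) {
  public <- subset(ISLR::College, Private == "No")  # public schools only
  print(anova_table(public))
  # The same table via aov(), which calls lm() internally:
  print(summary(aov(Grad.Rate ~ Top10perc + Top25perc, data = public)))
}
```

Because the sums of squares are sequential, swapping the order of Top10perc and Top25perc in the formula changes the F-tests, which matters when the two predictors are correlated.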

Step 4

After testing my hypothesis, I found that the percentage of students from the top 10% of their high school class predicts the graduation rate, with a p-value below the significance level of 0.05. The p-value I got for Top10perc is 0.0115. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so I reject it. For Top25perc, the p-value is 0.1656. A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so I fail to reject it for that variable.

Graphs and boxplot: using the ggplot smooth function and a boxplot, I plotted the top 10 percent and top 25 percent variables to show the difference in how well each group of high school students predicts the graduation rate.
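Plots along these lines could be drawn as in the sketch below. It assumes the ggplot2 and ISLR packages are installed. The long-format reshaping, the faceted layout and the helper names to_long(), smooth_plot() and box_plot() are illustrative choices, not necessarily the ones used for the original figures.

```r
library(ggplot2)  # assumed to be installed

# Hypothetical helper: stack the two predictors into one long data frame.
to_long <- function(df) {
  rbind(
    data.frame(predictor = "Top10perc", percent = df$Top10perc,
               Grad.Rate = df$Grad.Rate),
    data.frame(predictor = "Top25perc", percent = df$Top25perc,
               Grad.Rate = df$Grad.Rate)
  )
}

# Scatter plot with a fitted straight line, one panel per predictor.
smooth_plot <- function(long) {
  ggplot(long, aes(x = percent, y = Grad.Rate)) +
    geom_point(alpha = 0.4) +
    geom_smooth(method = "lm", formula = y ~ x) +
    facet_wrap(~ predictor)
}

# Boxplot comparing the distributions of the two predictors.
box_plot <- function(long) {
  ggplot(long, aes(x = predictor, y = percent)) +
    geom_boxplot()
}

# Only runs if the ISLR package (which ships the College data) is available.
if (requireNamespace("ISLR", quietly = TRUE)) {
  long <- to_long(subset(ISLR::College, Private == "No"))
  print(smooth_plot(long))
  print(box_plot(long))
}
```

Placing the two predictors in side-by-side panels with a shared y-axis makes it easier to compare the slopes of the fitted lines directly.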