
Final Project





Step 1
Data set: College.csv. Statistics for a large number of US colleges from the 1995 issue of US News and World Report. This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University. It was used in the ASA Statistical Graphics Section’s 1995 Data Analysis Exposition.

Project goal: using the College data set provided by the ISLR package, I want to determine students' graduation rates based on several factors (variables).

Step 2
 Hypothesis-
 The fraction of new students from the top 10% of their high school class predicts the college graduation rate better than the fraction from the top 25% does.

Null Hypothesis-
 The fraction of new students from the top 10% of their high school class does not predict the college graduation rate better than the fraction from the top 25% does.

Step 3 R Code

I'm going to use only the public schools from the College data set. My input variables are Top10perc and Top25perc (the percentage of new students who came from the top 10% and top 25% of their high school class), and my output variable is Grad.Rate. The aim is to determine whether my input variables do in fact show a relationship with the graduation rate.
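Since the original code screenshots are not available here, the following is only a minimal sketch of how the subset described above could be built. It assumes the College data frame from the ISLR package, where the Private column is a factor with levels "Yes" and "No"; select_public() is a hypothetical helper name, not something from the original post.

```r
# Sketch: keep public colleges and the three variables used in this project.
# Assumption: `college` has the ISLR College columns Private, Top10perc,
# Top25perc and Grad.Rate. select_public() is a hypothetical helper.
select_public <- function(college) {
  keep_cols <- c("Top10perc", "Top25perc", "Grad.Rate")
  college[college$Private == "No", keep_cols]
}

# Only runs when the ISLR package is installed.
if (requireNamespace("ISLR", quietly = TRUE)) {
  public <- select_public(ISLR::College)
  str(public)      # structure of the subset
  summary(public)  # quick numeric summary of each variable
}
```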






















III. The function I use is lm(), which is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance, and analysis of covariance. I also use aov(), which fits an analysis of variance model by a call to lm for each stratum.

Using the lm() function

Using the ANOVA function to display the statistical results
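The model fit and ANOVA table might look roughly like the sketch below. The formula Grad.Rate ~ Top10perc + Top25perc is an assumption based on the variables named in this post, and fit_grad_model() is a hypothetical helper; the exact call used for the original results is not shown, so the output may not match the reported numbers.

```r
# Sketch: linear model of graduation rate on the two class-rank variables,
# fitted on public colleges only.
# Assumptions: the formula below and the helper name fit_grad_model().
fit_grad_model <- function(college) {
  public <- college[college$Private == "No", ]
  lm(Grad.Rate ~ Top10perc + Top25perc, data = public)
}

# Only runs when the ISLR package is installed.
if (requireNamespace("ISLR", quietly = TRUE)) {
  fit <- fit_grad_model(ISLR::College)
  print(summary(fit))  # coefficient table with t-tests and p-values
  print(anova(fit))    # sequential ANOVA table for the same model
}
```

Note that anova() on a single lm fit tests the terms sequentially, so the order of Top10perc and Top25perc in the formula affects the table.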











Step 4.
After testing my hypothesis, I found that the fraction of students from the top 10% of the class predicts the graduation rate, with a p-value below the significance level of 0.05. The p-value I got for Top10perc is 0.0115. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so I reject it. For Top25perc, the p-value is 0.1656. A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so I fail to reject it for that variable.
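The decision rule can be written out as a short sketch. The two p-values are the ones reported in this post; the variable names alpha, p_values and decision are only illustrative.

```r
# Sketch: apply the 0.05 significance rule to the p-values reported above.
alpha <- 0.05
p_values <- c(Top10perc = 0.0115, Top25perc = 0.1656)

# p <= alpha: reject the null hypothesis; otherwise fail to reject it.
decision <- ifelse(p_values <= alpha, "reject null", "fail to reject null")
print(decision)
```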



 Graphs and boxplot: using the ggplot smooth function and a boxplot, I show the difference between how the top 10 percent and the top 25 percent of high school students predict the graduation rate.
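A smoothed plot of this kind could be produced along the lines of the sketch below. It assumes the ggplot2 and ISLR packages are installed; the linear smoother (method = "lm"), the axis labels, and the helper name plot_grad_vs_top() are assumptions, since the original plotting code is not shown.

```r
# Sketch: graduation rate against each class-rank variable, with a fitted
# smooth line per variable, for public colleges only.
# Assumptions: linear smoother, labels, and the helper name are illustrative.
plot_grad_vs_top <- function(college) {
  public <- college[college$Private == "No", ]
  # Stack the two predictors into long format so each gets its own line.
  long <- rbind(
    data.frame(group = "Top10perc", percent = public$Top10perc,
               Grad.Rate = public$Grad.Rate),
    data.frame(group = "Top25perc", percent = public$Top25perc,
               Grad.Rate = public$Grad.Rate)
  )
  ggplot2::ggplot(long, ggplot2::aes(x = percent, y = Grad.Rate,
                                     colour = group)) +
    ggplot2::geom_point(alpha = 0.4) +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x) +
    ggplot2::labs(x = "Percent of new students", y = "Graduation rate")
}

# Only runs when both packages are installed.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ISLR", quietly = TRUE)) {
  print(plot_grad_vs_top(ISLR::College))
}
```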

























 BOXPLOT:
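One possible way to draw such a boxplot is sketched here with base R's boxplot(). It assumes the ISLR College data; the title, axis label, and helper name boxplot_top() are illustrative and not taken from the original figure.

```r
# Sketch: side-by-side boxplots of Top10perc and Top25perc for public colleges.
# Assumptions: labels and the helper name boxplot_top() are illustrative.
boxplot_top <- function(college, ...) {
  public <- college[college$Private == "No", c("Top10perc", "Top25perc")]
  # A data frame is drawn as one box per column.
  boxplot(public, main = "Public colleges",
          ylab = "Percent of new students", ...)
}

# Only runs when the ISLR package is installed.
if (requireNamespace("ISLR", quietly = TRUE)) {
  boxplot_top(ISLR::College)
}
```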
















