MASS Introductory Session ======================================================== This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages (click the **MD** toolbar button in R Studio for help on Markdown). When you click the **Knit HTML** button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. ```{r} # A command to make MASS datasets available. library(MASS) # generate 1000 pairs of normal variates x <- rnorm(1000) y <- rnorm(1000) # Histogram of a mixture of normal distributions. truehist(c(x,y+3),nbins=25) #2D density plot contour(dd <- kde2d(x,y)) #Pseudo-color plot image(dd) ``` ***** ```{r} # Make x = (1,1.5,2,...,19.5,20) and list it x <- seq(1,20,0.5) x # w will be used as a "weight" vector and to give the standard deviations of the errors. w <- 1 + x/2 y <- x + w*rnorm(x) # Make a data frame of the three columns named x, y, and w and look at it. Remove the original x, y and w. dum <- data.frame(x,y,w) dum rm(x, y, w) #fit a simple linear regression of y on a and look at the analysis fm <- lm(y ~ x, data = dum) summary(fm) # Since we know the standard deviations, we can do a weighted regression. fm1 <- lm(y ~ x, data = dum, weight = 1/w^2) summary(fm1) # Fit a smooth regression curve using a modern regression function. lrf <- loess(y ~ x, dum) #Make the columns in the data frame visible as variables. attach(dum) # Make a standard scatterplot. To this plot we will add the three regression lines (or curves) as well as the known true line. plot(x, y) # First add in the local regression curve using a spline interpolation between the calculated points. lines(spline(x, fitted(lrf)),col = 2) # Add in the true regression line (intercept 0, slope 1) with a different line type and color. abline(0, 1, lty = 3, col = 3) # Add in the unweighted regression line. abline() is able to extract the information it needs from the fitted regression object. abline(fm, col = 4) # Finally add in the weighted regression line, in line type 4. This one should be the most accurate estimate, but may not be, of course. abline(fm1, lty = 4, col = 5) # A standard regression diagnostic plot to check for heteroscedasticity, that is, for unequal variances. The data are generated from a heteroscedastic process, so can you see this from this plot? plot(fitted(fm), resid(fm), xlab = "Fitted Values", ylab = "Residuals") # A normal scores plot to check for skewness, kurtosis and outliers. (Note that the heteroscedasticity may show as apparentnon-normality.) qqnorm(resid(fm)) qqline(resid(fm)) ```