MASS Introductory Session
========================================================
This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages (click the **MD** toolbar button in R Studio for help on Markdown).
When you click the **Knit HTML** button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document.
```{r}
# A command to make MASS datasets available.
library(MASS)
# generate 1000 pairs of normal variates
x <- rnorm(1000)
y <- rnorm(1000)
# Histogram of a mixture of normal distributions.
truehist(c(x,y+3),nbins=25)
#2D density plot
contour(dd <- kde2d(x,y))
#Pseudo-color plot
image(dd)
```
*****
```{r}
# Make x = (1,1.5,2,...,19.5,20) and list it
x <- seq(1,20,0.5)
x
# w will be used as a "weight" vector and to give the standard deviations of the errors.
w <- 1 + x/2
y <- x + w*rnorm(x)
# Make a data frame of the three columns named x, y, and w and look at it. Remove the original x, y and w.
dum <- data.frame(x,y,w)
dum
rm(x, y, w)
#fit a simple linear regression of y on a and look at the analysis
fm <- lm(y ~ x, data = dum)
summary(fm)
# Since we know the standard deviations, we can do a weighted regression.
fm1 <- lm(y ~ x, data = dum,
weight = 1/w^2)
summary(fm1)
# Fit a smooth regression curve using a modern regression function.
lrf <- loess(y ~ x, dum)
#Make the columns in the data frame visible as variables.
attach(dum)
# Make a standard scatterplot. To this plot we will add the three regression lines (or curves) as well as the known true line.
plot(x, y)
# First add in the local regression curve using a spline interpolation between the calculated points.
lines(spline(x, fitted(lrf)),col = 2)
# Add in the true regression line (intercept 0, slope 1) with a different line type and color.
abline(0, 1, lty = 3, col = 3)
# Add in the unweighted regression line. abline() is able to extract the information it needs from the fitted regression object.
abline(fm, col = 4)
# Finally add in the weighted regression line, in line type 4. This one should be the most accurate estimate, but may not be, of course.
abline(fm1, lty = 4, col = 5)
# A standard regression diagnostic plot to check for heteroscedasticity, that is, for unequal variances. The data are generated from a heteroscedastic process, so can you see this from this plot?
plot(fitted(fm), resid(fm), xlab = "Fitted Values", ylab = "Residuals")
# A normal scores plot to check for skewness, kurtosis and outliers. (Note that the heteroscedasticity may show as apparentnon-normality.)
qqnorm(resid(fm))
qqline(resid(fm))
```