This is my modest space on the internet where I write about statistics and using R. I assume anyone reading this site is probably looking for help, hence the tagline “help with learning statistics.” I find that teaching statistical concepts helps me better understand the concepts myself. So this blog is mostly about me trying to solidify and expand my statistical knowledge. But if it helps others, so much the better!

Thanks,

Clay Ford

Matan Gilbert

Hello Mr. Ford,

I found your blog in a search for resources addressing the generation of (fake) data for analysis. Are you aware of any texts that treat this subject at an introductory level (including understanding and choosing appropriate probability distributions, etc.)?

Clay Ford (Post author)

You may want to check out the book An Introduction to Statistical Computing: A Simulation-based Approach by Voss. I have never read it and it looks expensive, but it appears it might be a resource for generating fake data. See chapter 2, Simulating Statistical Models.

Kyle

Hi Clay,

I found you after having read several articles you wrote on analyzing categorical data using R, posted on the UVa Library Services website. On two occasions now I was at a loss trying to understand what my professor (or Agresti) was talking about (or why it mattered). Then I read your articles working through basic analyses, and afterwards engaging with the finer details in my notes and the textbook made a lot more sense. Thanks for taking the time to write so clearly. I appreciate it.

Regards,

Kyle

Clay Ford (Post author)

Thank you for the kind words, Kyle. Glad I could help.

Steph

Hi Clay, I just found your tutorial on quantile regression in R (https://data.library.virginia.edu/getting-started-with-quantile-regression/) and wanted to thank you. It has been so useful! I’m excited that it led me to your blog, here, to learn more about some techniques that I’ve not yet used in R. Many thanks for such a clear and helpful overview. I really enjoy how you step through concepts logically and completely!

Clay Ford (Post author)

Thank you! It makes me happy that you found it useful.

ss

Hi Clay, I just saw your R code on GitHub about effects plots: https://github.com/clayford/effects_pkg. Both the R script and the R Markdown file give errors when I run the code. The R script fails because the babies data set is not available, and the R Markdown file gives several errors. Do you have the opportunity to update them? I would really appreciate it!

Clay Ford (Post author)

Hi. I just saw this comment. I have updated the GitHub repo so the Rmd file compiles and have added the babies data set. The effects package has been updated quite a bit since 2016 but the code still seems to work ok.

Thanks,

Clay

Kyle Grealis

Hi, Clay! I see that it’s been a while since a reply was posted here, so I’m hoping that you’ll still see this. I was working with the {pwr} package in R, was reading through the vignette, and I’m hoping it’s you that wrote it. If possible, I do have a couple of questions… at your convenience!

Thank you!

Kyle

Clay Ford (Post author)

Yeah, that was me. Happy to try and answer any questions you might have.

jp

So I was reading your article on the problems with R-squared. My question is: can an R-squared value be too high with linear data, or is that only a problem with non-linear data?

Clay Ford (Post author)

I’m not sure I follow your question. I’m also not sure what article you’re talking about.

jp

I was referring to the article “Is R-squared Useless?” The University of Virginia site attributes authorship to you. In it, you reviewed the arguments made by Cosma Shalizi against the R-squared metric. In particular, I was asking about the demonstration of how R-squared can be arbitrarily high even with an incorrect model. You used non-linear data to prove the point, but my question is whether the same issue can arise with linear data. Secondly, I wanted to know, in your opinion, how well adjusted R-squared values can compensate for these alleged failings.

Link to article in question

https://library.virginia.edu/data/articles/is-r-squared-useless

Clay Ford (Post author)

I forgot about that article. That was almost 9 years ago. Where did the time go?

Sure, I imagine the issue could arise with linear data. Fit a highly flexible non-linear model to linear data and you can get high R-squared values despite the model being wrong. That’s an example of overfitting the data. Adjusted R-squared in this case wouldn’t fix the fact that you fit the wrong model.
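Here is a minimal sketch of that point, written in Python using only the standard library. Everything in it is made up for illustration and none of it comes from the article: the simulated data, the seed, the choice of a degree-6 polynomial as the "flexible" model, and the helper names. It simulates data from a straight line, fits both a line and the polynomial, and compares in-sample R-squared and adjusted R-squared.

```python
import random

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.
    Adequate for this small illustration; not a general-purpose solver."""
    k = degree + 1
    # Build X'X and X'y for the design matrix with columns 1, x, x^2, ...
    A = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - s) / A[i][i]
    return coef

def predict(coef, xi):
    return sum(c * xi ** p for p, c in enumerate(coef))

def r_squared(x, y, coef):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, n_predictors):
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

random.seed(1)  # arbitrary seed, only for reproducibility
n = 15
x = [-1 + 2 * i / (n - 1) for i in range(n)]
y = [2 + 3 * xi + random.gauss(0, 1) for xi in x]  # the truth is a straight line

line = fit_polynomial(x, y, 1)      # the correct model
flexible = fit_polynomial(x, y, 6)  # a needlessly flexible (wrong) model

r2_line = r_squared(x, y, line)
r2_flex = r_squared(x, y, flexible)
print("line:     R2 =", round(r2_line, 3),
      " adj R2 =", round(adjusted_r_squared(r2_line, n, 1), 3))
print("flexible: R2 =", round(r2_flex, 3),
      " adj R2 =", round(adjusted_r_squared(r2_flex, n, 6), 3))
```

Because the straight line is a special case of the degree-6 polynomial, the flexible fit's in-sample R-squared cannot be lower than the line's, even though the line is the true model here. Adjusted R-squared applies a penalty for the extra terms, but either way it is a summary of fit to this sample and says nothing directly about whether the model's form is right.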