About

This is my modest space on the internet where I write about statistics and using R. I assume anyone reading this site is looking for help, hence the tagline “help with learning statistics.” I find that teaching statistical concepts helps me better understand the concepts myself. So this blog is mostly about me trying to solidify and expand my statistical knowledge. But if it helps others, so much the better!

Thanks,
Clay Ford

14 comments

Matan Gilbert says:

July 13, 2018 at 2:27 pm

Hello Mr. Ford,

I found your blog in a search for resources on generating (fake) data for analysis. Are you aware of any texts that treat this subject at an introductory level, including understanding and choosing appropriate probability distributions?

Clay Ford says:

July 13, 2018 at 2:41 pm

You may want to check out the book An Introduction to Statistical Computing: A Simulation-based Approach by Voss. I have never read it, and it looks expensive, but it appears it might be a resource for generating fake data. See chapter 2, Simulating Statistical Models.

Kyle says:

July 15, 2018 at 3:25 pm

Hi Clay,

I found you after reading several articles you wrote on analyzing categorical data using R, posted on the UVa Library Services website. On two occasions now I was at a loss trying to understand what my professor (or Agresti) was talking about (or why it mattered). Then I read your articles working through basic analyses, and afterwards engaging with the finer details in my notes and the textbook made a lot more sense. Thanks for taking the time to write so clearly. I appreciate it.

Regards,

Kyle

1. Clay Ford says:
  
  July 15, 2018 at 7:48 pm
  
  Thank you for the kind words, Kyle. Glad I could help.
  
Steph says:

May 6, 2019 at 12:32 pm

Hi Clay, I just found your tutorial on quantile regression in R (https://data.library.virginia.edu/getting-started-with-quantile-regression/) and wanted to thank you. It has been so useful! I’m excited that it led me to your blog, here, to learn more about some techniques that I’ve not yet used in R. Many thanks for such a clear and helpful overview — I really enjoy how you step through concepts logically and completely!

1. Clay Ford says:
  
  May 6, 2019 at 12:38 pm
  
  Thank you! It makes me happy that you found it useful.
  
ss says:

July 31, 2022 at 3:32 pm

Hi Clay, I just saw your R code on GitHub about effects plots: https://github.com/clayford/effects_pkg. Both the R script and the R Markdown file give errors when running the code. The R script gives an error because the babies data set is not available, and the R Markdown file gives several errors. Would you be able to update them? I would really appreciate it!

1. Clay Ford says:
  
  October 25, 2022 at 3:45 pm
  
  Hi. I just saw this comment. I have updated the GitHub repo so the Rmd file compiles and have added the babies data set. The effects package has been updated quite a bit since 2016, but the code still seems to work OK.
  
  Thanks,
  Clay
  
Kyle Grealis says:

November 10, 2023 at 1:28 am

Hi, Clay! I see that it’s been a while since a reply was posted here, so I’m hoping that you’ll still see this. I was working with the {pwr} package in R and reading through the vignette, and I’m hoping you’re the one who wrote it. If possible, I have a couple of questions… at your convenience!

Thank you!
Kyle

1. Clay Ford says:
  
  November 11, 2023 at 8:23 am
  
  Yeah, that was me. Happy to try and answer any questions you might have.
  
jp says:

May 20, 2024 at 11:13 pm

So I was reading your article on the problems with r2. My question is: can an r2 value be too high with linear data, or is that only a problem with non-linear data?

1. Clay Ford says:
  
  May 21, 2024 at 10:40 am
  
  I’m not sure I follow your question. I’m also not sure what article you’re talking about.
  
  1. jp says:
    
    May 21, 2024 at 1:22 pm
    
    I was referring to the article “Is R-squared Useless?” The University of Virginia blog attributes authorship to you. In it, you reviewed Cosma Shalizi’s arguments against the r2 metric. In particular, I was asking about the demonstration of how r2 can be arbitrarily high even with an incorrect model. You used non-linear data to prove the point, but my question is whether the same issue can arise with linear data. Second, in your opinion, how well can adjusted r2 compensate for these alleged failings?
    
    Link to article in question
    https://library.virginia.edu/data/articles/is-r-squared-useless
    
    1. Clay Ford says:
      
      May 22, 2024 at 6:46 am
      
      I forgot about that article. That was almost 9 years ago. Where did the time go?
      
      Sure, I imagine the issue could arise with linear data. Fit a highly flexible non-linear model to linear data and you can get high R squared values despite the model being wrong. That’s an example of overfitting the data. Adjusted R squared in this case wouldn’t fix the fact that you fit the wrong model.
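      To make the overfitting point concrete, here is a minimal sketch on simulated data. It is written in Python with NumPy rather than R purely for illustration. The true model (y = 2 + 3x plus normal noise), the sample size of 30, and the use of a degree-10 polynomial as the "highly flexible non-linear model" are all assumptions. `r_squared` and `adjusted_r_squared` are hypothetical helpers defined in the block.

```python
from numpy.polynomial import Polynomial
import numpy as np


def r_squared(y, fitted):
    """R squared computed as 1 - SSE / SST."""
    sse = np.sum((y - fitted) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


rng = np.random.default_rng(2024)
n = 30

# Assumed data-generating process: truly linear, y = 2 + 3x + noise.
x = rng.uniform(0, 1, n)
y = 2 + 3 * x + rng.normal(0, 0.5, n)

# Fresh data from the same linear process, kept inside the range of x
# so the comparison is not dominated by polynomial extrapolation.
x_new = rng.uniform(x.min(), x.max(), n)
y_new = 2 + 3 * x_new + rng.normal(0, 0.5, n)

results = {}
for degree in (1, 10):
    # Polynomial.fit rescales x internally, which keeps degree 10 numerically stable.
    model = Polynomial.fit(x, y, degree)
    r2 = r_squared(y, model(x))
    adj = adjusted_r_squared(r2, n, degree)
    r2_new = r_squared(y_new, model(x_new))
    results[degree] = (r2, adj, r2_new)
    print(f"degree {degree:2d}: R2 = {r2:.3f}, adjusted R2 = {adj:.3f}, "
          f"R2 on new data = {r2_new:.3f}")
```

      The degree-10 fit always matches or beats the straight line on in-sample R squared, because it contains the line as a special case. Adjusted R squared applies a penalty for the extra terms, but it is still computed on the same sample and cannot by itself tell you the polynomial is the wrong model. Comparing R squared on new data from the same process is one informal way to see the overfitting.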
      