{"id":714,"date":"2015-09-02T20:03:33","date_gmt":"2015-09-03T00:03:33","guid":{"rendered":"http:\/\/www.clayford.net\/statistics\/?p=714"},"modified":"2020-09-13T10:17:58","modified_gmt":"2020-09-13T14:17:58","slug":"creating-an-r-data-package","status":"publish","type":"post","link":"https:\/\/www.clayford.net\/statistics\/creating-an-r-data-package\/","title":{"rendered":"Creating an R data package"},"content":{"rendered":"<p>A few weeks ago I got back into an old textbook of mine, <em><a href=\"http:\/\/www.wright.edu\/~dvoss\/book\/DeanVoss.html\" target=\"_blank\" rel=\"noopener noreferrer\">Design and Analysis of Experiments<\/a><\/em> (Dean and Voss). One of the things I love about this book is the incredible number of actual experiments used to demonstrate concepts. And all the data are available from the <a href=\"http:\/\/www.wright.edu\/~dan.voss\/bookdata\/data.html\" target=\"_blank\" rel=\"noopener noreferrer\">author&#8217;s website<\/a>. My main interest in working through the book was to convert the many examples provided in SAS code into corresponding R code and to recreate the many plots. To do that meant having to download the data from the website and read it into R. Not that big of a deal, really, but I started thinking how nice it would be if the data were in an R package. Then I could just load the package and use the data at will. And I could document the data so I wouldn&#8217;t have to refer back to the book for variable definitions. And that&#8217;s what put me on the road to creating my first R data package, <a href=\"https:\/\/github.com\/clayford\/dvdata\" target=\"_blank\" rel=\"noopener noreferrer\"><code>dvdata<\/code><\/a>.<\/p>\n<p>Now you&#8217;ll notice that last link went to GitHub, instead of CRAN. That&#8217;s because I asked one of the authors after I built the package if he would mind me uploading it to CRAN. 
Unfortunately, he did mind, because it turns out he&#8217;s working on his own R package for the next edition of the book. I was a little bummed, because I really wanted it on CRAN for the warm feeling of authenticity. But I understand. And besides, the package still does what I wanted all along.  <\/p>\n<p>Now let&#8217;s talk about creating an R package. The very first thing you want to do is head over to Hadley Wickham&#8217;s <a href=\"http:\/\/r-pkgs.had.co.nz\/\" target=\"_blank\" rel=\"noopener noreferrer\">R Packages<\/a> site. He wrote a lovely book on how to create R packages and posted it for free. And because it&#8217;s online, it&#8217;s almost always up-to-date. Hadley gently walks you through the process of creating an R package using his <code>devtools<\/code> package. I found it very easy to follow and I can&#8217;t recommend it enough.<\/p>\n<p>What I want to do in this post is document in one place the basic steps to creating an R data package. All of these steps are in Hadley&#8217;s book, but they&#8217;re a little spread out due to the structure of the book, and because he covers a lot more than just making a simple data package.<\/p>\n<p>Before you start, follow the directions under <a href=\"http:\/\/r-pkgs.had.co.nz\/intro.html#intro-get\" target=\"_blank\" rel=\"noopener noreferrer\">Getting Started<\/a> in the Intro to Hadley&#8217;s book.<\/p>\n<p><strong>Steps to making an R data package<\/strong><\/p>\n<p>1. Come up with a name for your package and create a package in RStudio as described <a href=\"http:\/\/r-pkgs.had.co.nz\/package.html#getting-started\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>. This creates the smallest usable package. Let&#8217;s say you named your package &#8220;grrr&#8221;. On your computer you now have a directory called &#8220;grrr&#8221; which contains your package folders and files.<\/p>\n<p>2. Create two new directories: &#8220;data&#8221; and &#8220;data-raw&#8221; in your package directory. 
<\/p>\n<p>3. Go to the &#8220;R&#8221; directory in your package and delete the &#8220;hello.R&#8221; file.<\/p>\n<p>4. Start a new R script called, perhaps, &#8220;get_data.R&#8221; and save it to the &#8220;data-raw&#8221; directory. This will be the R script that reads in your data, wrangles it into shape and saves the data as an .RData file. You need to save the .RData objects into the &#8220;data&#8221; directory. The .RData objects are the data frames (or lists, or matrices, or vectors) your package will allow you to easily load. For examples, see Hadley&#8217;s R scripts in <a href=\"https:\/\/github.com\/hadley\/babynames\/tree\/master\/data-raw\" target=\"_blank\" rel=\"noopener noreferrer\">the &#8220;data-raw&#8221; directory of his <code>babynames<\/code> package<\/a>. <\/p>\n<p>5. When your data objects are done (i.e., the .RData files in your &#8220;data&#8221; directory) start an R script called &#8220;data.R&#8221; in the &#8220;R&#8221; directory. This is where you will compose the documentation for your data objects. Follow the directions <a href=\"http:\/\/r-pkgs.had.co.nz\/data.html#documenting-data\" target=\"_blank\" rel=\"noopener noreferrer\">in this section of Hadley&#8217;s book<\/a>.<\/p>\n<p>6. As you write your documentation, follow <a href=\"http:\/\/r-pkgs.had.co.nz\/man.html#man-workflow\" target=\"_blank\" rel=\"noopener noreferrer\">the documentation workflow Hadley outlines in this section<\/a>. Basically this involves submitting <code>devtools::document()<\/code> from the console and then previewing the documentation. Each time you submit <code>devtools::document()<\/code>, Rd files are generated in the &#8220;man&#8221; directory of your package. (If you don&#8217;t have a &#8220;man&#8221; directory, <code>devtools<\/code> creates one for you.) Do this until you are satisfied with your documentation.<\/p>\n<p>7. 
Update the DESCRIPTION file <a href=\"http:\/\/r-pkgs.had.co.nz\/description.html\" target=\"_blank\" rel=\"noopener noreferrer\">as explained in this section<\/a>. DESCRIPTION is just a text file.<\/p>\n<p>8. Add <code>^data-raw$<\/code> to the .Rbuildignore file. It too is just a text file. That keeps the &#8220;data-raw&#8221; folder from being included when the package is built.<\/p>\n<p>9. Build the package: Ctrl + Shift + B. Feel free to do this at any point as you&#8217;re working on your package.<\/p>\n<p>That about does it! If you want to submit to CRAN, then <a href=\"http:\/\/r-pkgs.had.co.nz\/release.html\" target=\"_blank\" rel=\"noopener noreferrer\">read Hadley&#8217;s Release chapter<\/a> very closely and follow it to the T. <\/p>\n<p>After creating <code>dvdata<\/code>, I created another data package called <code><a href=\"https:\/\/cran.r-project.org\/web\/packages\/valottery\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">valottery<\/a><\/code> that contains historical results of Virginia lottery drawings. 
This one I <em>did<\/em> get uploaded to CRAN.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few weeks ago I got back into an old textbook of mine, Design and Analysis of Experiments (Dean and&#8230; <a class=\"read-more\" href=\"https:\/\/www.clayford.net\/statistics\/creating-an-r-data-package\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[59,58,60],"class_list":["post-714","post","type-post","status-publish","format-standard","hentry","category-using-r","tag-dvdata","tag-r-packages","tag-valottery"],"_links":{"self":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/714","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/comments?post=714"}],"version-history":[{"count":5,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/714\/revisions"}],"predecessor-version":[{"id":803,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/714\/revisions\/803"}],"wp:attachment":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/media?parent=714"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/categories?post=714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/tags?post=714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}