Data Workflow Using R: Sogang University Workshop, 2023 Fall
(In Korean) A series of slides offering a principled crash course in R.
I. Beginner-friendly Learning Materials
- How to install R, RStudio, and R packages: a learnr tutorial.
- Wickham, Hadley, and Garrett Grolemund. R for Data Science. O’Reilly Media, Inc., 2016.
- fasteR: Fast Lane to Learning R! by Norm Matloff.
- You can use swirl to learn base R interactively. There are various courses that can be installed by swirl, including my swirl-tidy lesson that helps you learn tidyverse. For a quick installation guide, see here.
II. What If I Have Coding Questions? What If Something is Not Working?
Before you start Googling or seeking help:
- Take a deep breath and accept that learning to debug is very likely a painful, grueling process with a steep learning curve. You will probably go through many frustrating minutes, hours, or even days! It gets better, but only with some heavy investment.
- Create a Stack Overflow account.
- It will help you leave a trail of what worked for you, in terms of upvotes and bookmarks.
- You will learn how to ask a “good question.”
- You will become familiar with the concept of a minimal, reproducible example.
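To make the idea of a minimal, reproducible example concrete, here is a small sketch in base R. The problem it illustrates (a factor of numbers converting to unexpected values) and the object names are made up for illustration; the point is that the data are created inline and anyone can paste the code and see the same result.

```r
# A minimal, reproducible example: tiny inline data, no external files.
# Hypothetical problem: converting a factor of numbers gives odd values.
scores <- factor(c("10", "20", "30"))

# What I tried: this returns the internal level codes, not the labels.
wrong <- as.numeric(scores)

# What I expected: the numbers written in the labels.
right <- as.numeric(as.character(scores))

print(wrong)
print(right)
```

A question built around a snippet like this states what was tried, what came out, and what was expected, which is usually enough for someone else to diagnose the issue.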
Now that you’ve braced yourself,
- Google the error message. 90%+ of the time, the question has already been asked on Stack Overflow.
- [New!] Ask ChatGPT to provide documentation and example code.
- [New!] Similarly, RTutor.ai can help translate natural language into R scripts.
Note that AI-generated answers may not always be accurate, especially for complex queries. Use at your own peril.
III. Advanced Steps
- Comment generously! You will not be able to remember what you were doing without ample commenting, nor will other people be able to understand your code. Otherwise, you will struggle to make sense of your own old code.
- Never save/restore .RData. In fact, run usethis::use_blank_slate().
- Organize each data analysis as a project, use here::here() as opposed to setwd, and be mindful of the project-oriented workflow and reproducibility.
- Please use the styler package, which makes it very easy to style your code consistently.
- Using assertthat, insert unit tests/sanity checks about your dataset so that you catch mistakes quickly. For example, you could preemptively detect and warn about duplicates, missing values, wrong number of rows, proportions going below 0 or over 1, wrong class or implicit coercions, …
- Read R Inferno by Patrick Burns with summary slides by Maya Gans.
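The sanity checks mentioned above might look something like the following sketch. It assumes the assertthat package is installed; the `survey` data frame, its column names, and the expected row count are hypothetical, invented only to show the pattern.

```r
library(assertthat)

# Hypothetical toy dataset, made up for illustration only.
survey <- data.frame(
  id = c(1L, 2L, 3L, 4L),
  prop_yes = c(0.10, 0.55, 0.80, 1.00)
)

# Each check stops with an error as soon as an assumption is violated.
assert_that(nrow(survey) == 4)                  # expected number of rows
assert_that(anyDuplicated(survey$id) == 0)      # no duplicated ids
assert_that(noNA(survey$prop_yes))              # no missing values
assert_that(is.numeric(survey$prop_yes))        # expected class
assert_that(all(survey$prop_yes >= 0 & survey$prop_yes <= 1))  # valid proportions

# A failing check: a duplicated id is caught instead of passing silently.
bad <- rbind(survey, survey[1, ])
caught <- tryCatch(
  assert_that(anyDuplicated(bad$id) == 0),
  error = function(e) "duplicate ids found"
)
print(caught)
```

Placing checks like these right after each import or merge step means a broken assumption halts the script at the point where it occurs, rather than surfacing later as a puzzling result.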