Data Workflow Using R: Sogang University Workshop, 2023 Fall
(In Korean) A series of slides on a principled crash course on learning R.
I. Beginner-friendly Learning Materials
- How to install R, an R development environment (e.g., RStudio), and R packages: a learnr tutorial.
- Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. R for Data Science. 2nd ed. O’Reilly Media, Inc., 2023.
- fasteR: Fast Lane to Learning R! by Norm Matloff.
- You can use swirl to learn base R interactively. Various courses can be installed through swirl, including my swirl-tidy lesson, which helps you learn the tidyverse. For a quick installation guide, see here.
II. What If I Have Coding Questions? What If Something is Not Working?
Before you start seeking help:
- Take a deep breath and accept that learning to debug is, in all likelihood, a painful and grueling process with a steep learning curve. You will probably go through many frustrating minutes, hours, or even days! It does get better, but only after some heavy investment.
- (Outdated advice now, RIP Stack Overflow) Create a Stack Overflow account.
  - It will help you leave a trail of what worked for you, in terms of upvotes and bookmarks.
  - You will learn how to ask a “good question.”
  - You will become familiar with the concept of a minimal, reproducible example.
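To make the idea of a minimal, reproducible example concrete, here is a small sketch in base R. The data frame `scores` and its values are invented purely for illustration; the point is that the example builds its own tiny data inline, uses no external files, and ends with the exact call that misbehaves, so anyone can paste it into a fresh session and see the same result.

```r
# A minimal, reproducible example (illustrative data, not from any real study).
# 1. Build a tiny data set inline instead of referring to a file on your computer.
scores <- data.frame(
  id = c(1, 2, 3),
  score = c("10", "20", "thirty"), # the third value is deliberately not a number
  stringsAsFactors = FALSE
)

# 2. Reproduce the problem with as little code as possible.
# as.numeric() warns "NAs introduced by coercion" for the value "thirty".
parsed <- as.numeric(scores$score)

# 3. Show what you got versus what you expected.
mean(parsed)               # NA, because one value could not be converted
mean(parsed, na.rm = TRUE) # 15, the mean of the two values that did convert
```

When posting a question, it also helps to include the output of `sessionInfo()`, so readers know which R version and packages you are running.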
Now that you’ve braced yourself,
- Google the error message. 90%+ of the time, the question has already been asked on Stack Overflow.
- Ask an AI assistant such as Claude or ChatGPT. Give it the code and the error message; most handle R quite well. Modern IDEs now have AI assistants built in: GitHub Copilot integrates with RStudio and VS Code, for example.
Note that AI-generated answers may not always be accurate, especially for statistical methods. Verify outputs for correctness. In addition, relying fully on AI, especially for auto-completing tasks, may only make you irrelevant and artificially intelligent at best. (Ouch!) Use AI to learn, not to avoid learning.
III. Advanced Steps
- This is a good resource for those now familiar with R4DS: Wickham, Hadley. Advanced R. 2nd ed. Chapman and Hall/CRC, 2019.
- Comment generously! You will not be able to remember what you were doing without ample commenting, nor will other people be able to understand your code. It’ll also help your AI assistant understand your intent.
- Avoid workflows that rely on restoring old workspace state. Your analysis should run from a fresh session using only the files contained in the project. If you use RStudio, uncheck “Restore most recently opened project at startup,” “Restore previously open source documents at startup,” “Restore .RData into workspace at startup,” and “Always save history.” Finally, set “Save workspace to .RData on exit” to “Never.”
- Form each data analysis into a project, use `here::here()` as opposed to `setwd()`, and be mindful of the project-oriented workflow and reproducibility.
- Please use the `styler` package, which makes it very easy to style your code consistently.
- Insert unit tests/sanity checks throughout your workflow so you catch mistakes quickly. `stopifnot()` requires no additional packages, while packages such as `assertthat` provide more informative error messages. Preemptively detect duplicates, missing values, unexpected numbers of rows, proportions below 0 or above 1, incorrect classes, implicit coercions, and so on. A simple assertion can catch an (AI-introduced) bug before it affects your results.
- Read R Inferno by Patrick Burns with summary slides by Maya Gans.
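As a rough sketch of the sanity checks recommended above, the following base R snippet wraps several assertions in one helper. The data frame `survey`, the column names, and the function `check_survey()` are all hypothetical names made up for this illustration. The named-argument form of `stopifnot()`, which gives custom error messages, assumes R 4.0.0 or later.

```r
# Hypothetical data, invented for illustration only.
survey <- data.frame(
  id = c(101, 102, 103, 104),
  prop_yes = c(0.25, 0.80, 0.55, 1.00)
)

# Hypothetical helper: stops with a message at the first check that fails.
check_survey <- function(df, expected_rows) {
  stopifnot(
    "unexpected number of rows" = nrow(df) == expected_rows,
    "duplicated ids" = anyDuplicated(df$id) == 0,
    "missing values present" = !anyNA(df),
    "prop_yes must be numeric" = is.numeric(df$prop_yes),
    "prop_yes outside [0, 1]" = all(df$prop_yes >= 0 & df$prop_yes <= 1)
  )
  invisible(df) # return the data unchanged so the check can sit inside a pipeline
}

# Passes silently on clean data.
check_survey(survey, expected_rows = 4)

# A row with a repeated id and an impossible proportion is caught.
bad <- rbind(survey, data.frame(id = 101, prop_yes = 1.2))
outcome <- tryCatch(
  check_survey(bad, expected_rows = 5),
  error = function(e) conditionMessage(e)
)
print(outcome)
```

Placing a call like `check_survey()` right after every import, merge, or reshape step means a problem surfaces where it was introduced rather than several steps later in a model or table.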