Data Workflow Using R: Sogang University Workshop, 2023 Fall
(In Korean) A series of slides on a principled crash course on learning R.
I. Beginner-friendly Learning Materials
- How to install R, RStudio, and R packages: a learnr tutorial.
- Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. R for Data Science. 2nd ed. O’Reilly Media, Inc., 2023.
- fasteR: Fast Lane to Learning R! by Norm Matloff.
- You can use swirl to learn base R interactively. There are various courses that can be installed by swirl, including my swirl-tidy lesson that helps you learn tidyverse. For a quick installation guide, see here.
II. What If I Have Coding Questions? What If Something is Not Working?
Before you start seeking help:
- Take a deep breath and accept that learning to debug is very likely a painful, grueling process with a steep learning curve. You will likely have to go through many frustrating minutes, hours, or even days! It gets better, but only with heavy investment.
- (Slightly outdated advice now) Create a Stack Overflow account.
  - It will help you leave a trail of what worked for you, in terms of upvotes and bookmarks.
  - You will learn how to ask a “good question.”
  - You will become familiar with the concept of a minimal, reproducible example.
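A minimal, reproducible example pairs the smallest dataset that triggers the problem with the exact code that misbehaves. The sketch below is only an illustration: the data frame `scores` and its columns are made up for this purpose, and it relies on base R alone.

```r
# Hypothetical toy data, small enough to paste directly into a question.
scores <- data.frame(
  group = c("a", "a", "b"),
  value = c("1", "2", "3"), # stored as character, which causes the problem
  stringsAsFactors = FALSE
)

# The call that misbehaves: mean() warns and returns NA for character input.
result <- mean(scores$value)
print(result)

# dput() prints code that recreates the object, so others can copy it.
dput(scores)

# Report the R version and loaded packages along with the question.
print(sessionInfo())
```

Posting the `dput()` output together with the failing call lets a reader reproduce the issue without access to your files. The reprex package can also render a snippet like this with its output included, if you prefer a ready-to-paste format.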
Now that you’ve braced yourself,
- Google the error message. 90%+ of the time, the question has already been asked on Stack Overflow. Save yourself some tokens if you’re reliant on LLMs.
- Use an LLM assistant (Claude, ChatGPT, etc.) to help debug errors, explain code, or generate examples. Most handle R quite well.
- Modern IDEs now have AI assistants built in: GitHub Copilot integrates with RStudio and VSCode, for example.
Note that AI-generated answers may not always be accurate, especially for statistical methods. Verify outputs for correctness. In addition, relying fully on AI, especially for auto-completing tasks, may only make you artificially intelligent at best. (Ouch!)
III. Advanced Steps
- This is a good resource for those now familiar with R4DS: Wickham, Hadley. Advanced R. 2nd ed. Chapman and Hall/CRC, 2019.
- Comment generously! You will not be able to remember what you were doing without ample commenting, nor will other people be able to understand your code.
- Never save/restore .RData. In fact, uncheck “Restore most recently opened project at startup,” “Restore previously open source documents at startup,” “Restore .RData into workspace at startup,” and “Always save history.” Finally, set “Save workspace to .RData on exit” to “Never.”
- Form each data analysis into a project, use here::here() as opposed to setwd, and be mindful of the project-oriented workflow and reproducibility.
- Please use the styler package, which makes it very easy to style your code consistently.
- Using assertthat, insert unit tests/sanity checks for your dataset so that you catch mistakes quickly. For example, you could preemptively detect and warn about duplicates, missing values, wrong number of rows, proportions going below 0 or over 1, wrong class or implicit coercions, …
- Read R Inferno by Patrick Burns with summary slides by Maya Gans.
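The dataset sanity checks mentioned in the assertthat item above could look roughly like the following. This is a sketch under assumptions: the data frame `survey`, its columns `id` and `prop_yes`, and the helper `check_survey()` are hypothetical names chosen for illustration, and the assertthat package is assumed to be installed.

```r
library(assertthat)

# Hypothetical toy dataset; the column names are illustrative only.
survey <- data.frame(
  id = c(1L, 2L, 3L, 4L),
  prop_yes = c(0.10, 0.55, 0.90, 0.32)
)

# Hypothetical helper: stops with an error as soon as one check fails,
# and returns the data frame invisibly if every check passes.
check_survey <- function(df, expected_rows) {
  assert_that(is.data.frame(df))
  assert_that(nrow(df) == expected_rows,
              msg = "Unexpected number of rows")
  assert_that(anyDuplicated(df$id) == 0,
              msg = "Duplicate ids found")
  assert_that(!any(is.na(df)),
              msg = "Missing values found")
  assert_that(is.numeric(df$prop_yes),
              msg = "prop_yes is not numeric (possible implicit coercion)")
  assert_that(all(df$prop_yes >= 0 & df$prop_yes <= 1),
              msg = "Proportions fall outside [0, 1]")
  invisible(df)
}

check_survey(survey, expected_rows = 4)
```

Calling a helper like this right after reading or merging data means a duplicated id or an out-of-range proportion stops the script at once, rather than surfacing later as a puzzling result.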