Data Workflow Using R: Sogang University Workshop, 2023 Fall

(In Korean) A series of slides for a principled crash course in R.

  1. Defining the data workflow and basic R syntax
  2. Importing and transforming data with the Tidyverse
  3. R functions and functional programming

I. Beginner-friendly Learning Materials


II. What If I Have Coding Questions? What If Something is Not Working?

Before you start seeking help:

  1. Take a deep breath and accept that learning to debug is, in all likelihood, a painful, grueling process with a steep learning curve. You will probably spend many frustrating minutes, hours, or even days on it! It gets better, but only after some heavy investment.
  2. (Slightly outdated advice now) Create a Stack Overflow account.

Now that you’ve braced yourself,

  • Google the error message. 90%+ of the time, the question has already been asked on Stack Overflow. Save yourself some tokens if you’re reliant on LLMs.
  • Use an LLM assistant (Claude, ChatGPT, etc.) to help debug errors, explain code, or generate examples. Most handle R quite well.
  • Modern IDEs now have AI assistants built in; for example, GitHub Copilot integrates with RStudio and VS Code.

Note that AI-generated answers may not always be accurate, especially for statistical methods. Verify outputs for correctness. In addition, relying fully on AI, especially for auto-completing tasks, may only make you artificially intelligent at best. (Ouch!)


III. Advanced Steps

  • This is a good resource for those now familiar with R4DS (R for Data Science): Wickham, Hadley. Advanced R. 2nd ed. Chapman and Hall/CRC, 2019.
  • Comment generously! You will not be able to remember what you were doing without ample commenting, nor will other people be able to understand your code.
  • Never save/restore .RData. In fact, uncheck “Restore most recently opened project at startup,” “Restore previously open source documents at startup,” “Restore .RData into workspace at startup,” and “Always save history.” Finally, set “Save workspace to .RData on exit” to “Never.”
  • Organize each data analysis as a project, use here::here() instead of setwd(), and keep the project-oriented workflow and reproducibility in mind.
  • Please use the styler package, which makes it very easy to style your code consistently.
  • Using assertthat, insert unit tests/sanity checks for your dataset so that you catch mistakes quickly. For example, you could preemptively detect and warn about duplicates, missing values, wrong number of rows, proportions going below 0 or over 1, wrong class or implicit coercions, …
  • Read The R Inferno by Patrick Burns, along with the summary slides by Maya Gans.
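
As a rough illustration of the assertthat advice above, the sketch below bundles a few sanity checks into one function. The `survey` data frame, its columns `id` and `share`, and the helper `check_survey()` are all hypothetical names invented for this example. Only `assert_that()` (with its `msg` argument) and `noNA()` come from assertthat itself. Treat it as one possible shape for such checks, not a recommended standard.

```r
library(assertthat)

# Hypothetical toy dataset, invented purely for illustration
survey <- data.frame(
  id    = c(1L, 2L, 3L, 4L),
  share = c(0.10, 0.45, 0.80, 1.00)
)

# Hypothetical helper: stops with an informative error at the first failed check
check_survey <- function(df, expected_rows) {
  assert_that(is.data.frame(df))
  assert_that(nrow(df) == expected_rows,
              msg = "Unexpected number of rows")
  assert_that(anyDuplicated(df$id) == 0,
              msg = "Duplicate ids found")
  assert_that(noNA(df$share),
              msg = "Missing values in share")
  assert_that(is.numeric(df$share),
              msg = "share is not numeric")
  assert_that(all(df$share >= 0 & df$share <= 1),
              msg = "share contains proportions outside [0, 1]")
  invisible(df)
}

# The clean data passes silently
check_survey(survey, expected_rows = 4)

# A bad row (duplicate id, proportion above 1) is caught right away
bad <- rbind(survey, data.frame(id = 4L, share = 1.2))
result <- tryCatch(
  check_survey(bad, expected_rows = 5),
  error = function(e) conditionMessage(e)
)
print(result)
```

Running checks like these immediately after importing or merging data tends to surface problems near their source, rather than several transformation steps later.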