6 Version Control
Reading: 6 minute(s) at 200 WPM
Videos: 0 minute(s)
Objectives
Most of this section is either heavily inspired by Happy Git and GitHub for the useR (Bryan, Hester, and The Stat 545 TAs 2021) or directly links to that book.
- Recognize the benefits of using version control to improve your coding practices and workflow.
- Identify git/GitHub as a version control platform (and helper).
- Register for a GitHub account so you can begin applying version control practices to your workflow.
6.1 What is Version Control?
Version control is a system that allows you to (1) store your files in the cloud, (2) track changes in those files over time, and (3) share your files with others.
Learn more about version control
If you are unfamiliar with the idea of version control, this article describes what the principles of version control are.
6.2 Git
Git is a version control system - a structured way to track changes to files over the course of a project, which also makes it easier for multiple people to work on the same files at the same time.
Git manages a collection of files in a structured way - rather like “track changes” in Microsoft Word or version history in Dropbox, but much more powerful.
If you are working alone, you will benefit from adopting version control because it will remove the need to add _final.R or _final_finalforreal.qmd to the end of your file names. However, most of us work in collaboration with other people (or will have to work with others eventually), so one of the goals of this program is to teach you how to use git because it is a useful tool that will make you a better collaborator.
In data science programming, we use git for a similar, but slightly different purpose. We use it to keep track of changes not only to code files, but to data files, figures, reports, and other essential bits of information.
Git itself is nice enough, but where git really becomes amazing is when you combine it with GitHub - an online service that makes it easy to use git across many computers, share information with collaborators, publish to the web, and more. Git is great, but GitHub is … essential.
6.2.1 Git Basics
Git tracks changes to each file that it is told to monitor, and as the files change, you provide short labels describing what the changes were and why they exist (called “commits”). The log of these changes (along with the file history) is called your git commit history.
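To make the idea of commits concrete, here is a minimal sketch of what tracking a file looks like at the command line. The folder, the file name analysis.R, the commit messages, and the name and email are all made up for illustration; in class we will mostly do the same steps through RStudio.

```shell
#!/bin/sh
set -e

# Work in a throwaway folder so nothing on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identity used to label commits (placeholder values, stored only in this repo).
git config user.name "Example Student"
git config user.email "student@example.com"

# Create a file and tell git to monitor it.
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add first line of analysis script"

# Change the file, then record the change with a short label.
echo 'y <- x + 1' >> analysis.R
git add analysis.R
git commit --quiet -m "Compute y from x"

# Print the commit history, newest commit first.
git log --oneline
```

The last command prints one line per commit, each with a short identifier and the message you wrote, which is the commit history described above.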
When writing papers, this means you can cut material out freely, so long as the paper is being tracked by git - you can always go back and get that paragraph you cut out if you need to. You also don’t have to rename files - you can confidently save over your old files, so long as you remember to commit frequently.
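As a rough sketch of "going back to get that paragraph", the commands below cut a paragraph, commit the change, and then recover the earlier version. The file paper.txt and its contents are hypothetical, and HEAD~1 simply means "one commit before the current one".

```shell
#!/bin/sh
set -e

# Throwaway repository with placeholder identity values.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit: a draft with two paragraphs.
printf 'Intro paragraph.\nA paragraph I might regret cutting.\n' > paper.txt
git add paper.txt
git commit --quiet -m "Draft with both paragraphs"

# Save over the file with the second paragraph removed, then commit.
printf 'Intro paragraph.\n' > paper.txt
git commit --quiet -am "Cut second paragraph"

# Look at the file as it was one commit ago, without changing anything.
git show HEAD~1:paper.txt

# Or bring that older version back into the working folder.
git checkout HEAD~1 -- paper.txt
cat paper.txt
```

Nothing was lost by saving over the file, because the earlier version was committed first. That is why committing frequently matters.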
Essential Reading: Git
The git material in this chapter is just going to link directly to the book “Happy Git with R” by Jenny Bryan. It’s amazing, amusing, and generally well written. I’m not going to try to do better.
Go read Chapter 1, until it starts to become Greek (aka over your head).
Now that you have a general idea of how git works and why we might use it, let’s talk a bit about GitHub.
6.3 GitHub: Git on the Web
Git is a program that runs on your machine and keeps track of changes to files that you tell it to monitor. GitHub is a website that hosts people’s git repositories. You can use git without GitHub, but you can’t use GitHub without git.
If you want, you can hook Git up to GitHub, and make a copy of your local git repository that lives in the cloud. Then, if you configure things correctly, your local repository will talk to GitHub without too much trouble. Using GitHub with Git allows you to easily make a cloud backup of your important code, so that even if your computer suddenly catches fire, all of your important code files exist somewhere else.
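The sketch below shows the general shape of connecting a local repository to a remote copy and pushing to it. To keep it runnable without an account, it uses an empty repository in a second folder as a stand-in for GitHub (an assumption made purely for illustration); with GitHub, you would use the address shown on your repository's page in place of the folder path. All file and folder names are hypothetical.

```shell
#!/bin/sh
set -e

work="$(mktemp -d)"

# Stand-in for GitHub: an empty "bare" repository in another folder.
git init --quiet --bare "$work/remote.git"

# The local repository, with placeholder identity values.
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example Student"
git config user.email "student@example.com"

echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Tell the local repository where the remote copy lives.
# "origin" is the conventional name for the main remote.
git remote add origin "$work/remote.git"

# Send the current branch to the remote and remember the link for next time.
git push --quiet -u origin HEAD

# List what the remote now holds; it should show the same commit.
git ls-remote origin
```

After the push, the commit exists in two places, so losing the local folder would not lose the work.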
Remember: any data you don’t have in 3 different places is data you don’t care about.[1]
Save your login information!
Make sure you remember your username and password so you don’t have to try to hack into your own account during class this week.
Write your information down somewhere safe.
Optional: Install a git client
I personally like to use GitHub Desktop, which allows me to interact with Git using a point-and-click interface.
6.4 Using Version Control (with RStudio)
This course will briefly introduce working with GitHub, but will not provide you with extensive practice using version control. By using version control, you will learn better habits for programming, and you’ll get access to a platform for collaboration, hosting your work online, keeping track of features and necessary changes, and more.
In class this week, we will connect git/GitHub to RStudio so you can use version control for your code. We will then see what a typical git/GitHub workflow looks like.
Learn More
Extra Resources
Happy Git and GitHub for the useR - Guide to using git, R, and RStudio together. (Bryan, Hester, and The Stat 545 TAs 2021)
Crash course on git (30 minute YouTube video) (Traversy Media 2017)
Git and GitHub for poets YouTube playlist (this is supposed to be the best introduction to Git out there…) (The Coding Train 2016)
More advanced git concepts, in comic form, by Erika Heidi (Erika Heidi 2020)
References
1. Yes, I’m aware that this sounds paranoid. It’s been a very rare occasion that I’ve needed to restore something from another backup. You don’t want to take chances.