6  Version Control

Reading: 6 minute(s) at 200 WPM

Objectives

Most of this section is either heavily inspired by Happy Git and GitHub for the useR (Bryan, Hester, and The Stat 545 TAs 2021) or directly links to that book.

  • Recognize the benefits of using version control to improve your coding practices and workflow.
  • Identify git/GitHub as a version control platform (and helper).
  • Register for a GitHub account so you can begin applying version control practices to your workflow.


6.1 What is Version Control?

Version control is a system that allows you to (1) store your files in the cloud, (2) track changes in those files over time, and (3) share your files with others.

Learn more about version control

If you are unfamiliar with the idea of version control, this article describes what the principles of version control are.

6.2 Git

Git is a version control system - a structured way of tracking changes to files over the course of a project that can also make it easy for multiple people to work on the same files at the same time.

Version control is the answer to the file naming problem, in which every revision of a file gets saved under a new name ending in _final or _final_finalforreal.

Git manages a collection of files in a structured way - rather like “track changes” in Microsoft Word or version history in Dropbox, but much more powerful.

If you are working alone, you will benefit from adopting version control because it will remove the need to add _final.R or _final_finalforreal.qmd to the end of your file names. However, most of us work in collaboration with other people (or will have to work with others eventually), so one of the goals of this program is to teach you how to use git because it is a useful tool that will make you a better collaborator.

In data science programming, we use git for a similar, but slightly different purpose. We use it to keep track of changes not only to code files, but to data files, figures, reports, and other essential bits of information.

Git itself is nice enough, but where git really becomes amazing is when you combine it with GitHub - an online service that makes it easy to use git across many computers, share information with collaborators, publish to the web, and more. Git is great, but GitHub is … essential.

6.2.1 Git Basics

Comic (xkcd, “Git”). Person 1: “This is Git. It tracks collaborative work on projects through a beautiful distributed graph theory tree model.” Person 2: “Cool. How do we use it?” Person 1: “No idea. Just memorize these shell commands and type them to sync up. If you get errors, save your work elsewhere, delete the project, and download a fresh copy.”

Comic caption: “If that doesn’t fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of ‘It’s really pretty simple, just think of branches as…’ and eventually you’ll learn the commands that will fix everything.”

Git tracks changes to each file that it is told to monitor. As the files change, you record snapshots of them (called “commits”), each with a short message describing what the changes were and why they exist. The log of these commits (along with the file history) is called your git commit history.
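To make the idea of commits and commit history concrete, here is a minimal sketch using git on the command line. It assumes git is installed; the temporary directory, the file name notes.txt, and the "Demo User" identity are placeholders invented for this illustration, not part of the course materials.

```shell
# Work in a throwaway directory so no real project is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Turn the directory into a git repository.
git init --quiet

# Placeholder identity, stored for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create a file, tell git to monitor it, and record a first commit.
echo "First draft of the analysis." > notes.txt
git add notes.txt
git commit --quiet -m "Add first draft of notes"

# Change the file and record a second commit describing what changed.
echo "Second paragraph with results." >> notes.txt
git add notes.txt
git commit --quiet -m "Add results paragraph"

# Print the commit history, one line per commit, newest first.
git log --oneline
```

The last command should list both commit messages, newest first, each preceded by a short identifier that git generates.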

When writing papers, this means you can cut material out freely, so long as the paper is being tracked by git - you can always go back and get that paragraph you cut out if you need to. You also don’t have to rename files - you can confidently save over your old files, so long as you remember to commit frequently.
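As a rough sketch of what "going back" can look like, the commands below cut a paragraph from a file and then recover it from an earlier commit. It assumes git is installed; the file name paper.txt, its contents, and the "Demo User" identity are hypothetical and exist only for this example.

```shell
# Set up a throwaway repository with a placeholder identity.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: the paper contains a paragraph we will later cut.
printf 'Introduction.\nA paragraph we will cut later.\n' > paper.txt
git add paper.txt
git commit --quiet -m "Draft introduction and extra paragraph"

# Second commit: save over the file with the paragraph removed.
printf 'Introduction.\n' > paper.txt
git add paper.txt
git commit --quiet -m "Cut extra paragraph"

# Look at the file as it was one commit ago, without changing anything.
git show HEAD~1:paper.txt

# Or copy that earlier version back into the working directory.
git checkout HEAD~1 -- paper.txt
```

Here HEAD~1 means "one commit before the most recent one". In a longer history you would usually pick the commit from the log instead.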

Essential Reading: Git

The git material in this chapter is just going to link directly to the book “Happy Git and GitHub for the useR” by Jenny Bryan. It’s amazing, amusing, and generally well written. I’m not going to try to do better.

Go read Chapter 1 until it starts to become Greek (aka over your head).



Now that you have a general idea of how git works and why we might use it, let’s talk a bit about GitHub.

6.3 GitHub: Git on the Web

Git is a program that runs on your machine and keeps track of changes to files that you tell it to monitor. GitHub is a website that hosts people’s git repositories. You can use git without GitHub, but you can’t use GitHub without git.

If you want, you can hook Git up to GitHub, and make a copy of your local git repository that lives in the cloud. Then, if you configure things correctly, your local repository will talk to GitHub without too much trouble. Using GitHub with Git allows you to easily make a cloud backup of your important code, so that even if your computer suddenly catches fire, all of your important code files exist somewhere else.
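The relationship between a local repository and its copy elsewhere can be sketched without a GitHub account. In the example below, a "bare" repository in a temporary folder stands in for GitHub, which is an assumption made purely so the commands run offline. With a real GitHub repository, the remote address would be a URL of the form https://github.com/&lt;user&gt;/&lt;repo&gt;.git, and pushing would ask for your credentials. The directory names, analysis.R, and the "Demo User" identity are placeholders.

```shell
work_dir="$(mktemp -d)"

# A bare repository on disk plays the role of the GitHub copy in this sketch.
git init --quiet --bare "$work_dir/remote.git"

# The local repository: this is the one you would normally work in.
git init --quiet "$work_dir/project"
cd "$work_dir/project"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'print("hello")' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Tell the local repository where its remote copy lives, then send commits there.
git remote add origin "$work_dir/remote.git"
git push --quiet origin HEAD

# A fresh clone stands in for "a different computer" recovering the backup.
git clone --quiet "$work_dir/remote.git" "$work_dir/second-copy"
ls "$work_dir/second-copy"
```

The clone at the end is the point of the exercise: once commits have been pushed, the files can be rebuilt from the remote copy even if the original folder is lost.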

Remember: any data you don’t have in 3 different places is data you don’t care about.¹


Save your login information!

Make sure you remember your username and password so you don’t have to try to hack into your own account during class this week.

Write your information down somewhere safe.

Optional: Install a git client

Instructions

I personally like to use GitHub Desktop, which allows me to interact with Git using a point-and-click interface.

6.4 Using Version Control (with RStudio)

This course will briefly introduce working with GitHub, but will not provide you with extensive practice using version control. By using version control, you will learn better habits for programming, and you’ll get access to a platform for collaboration, hosting your work online, keeping track of features and necessary changes, and more.

In class this week, we will connect git/GitHub to RStudio so you can use version control for your code. We will then see what a typical git/GitHub workflow looks like.

Learn More

Extra Resources


References

Bryan, Jenny, Jim Hester, and The Stat 545 TAs. 2021. Happy Git and GitHub for the useR. https://happygitwithr.com/.
Heidi, Erika. 2020. “Stage. Commit. Push. A Git Story (Comic).” DEV Community. https://dev.to/erikaheidi/stage-commit-push-a-git-story-comic-a37.
The Coding Train. 2016. “Introduction: Git and GitHub for Poets.” https://www.youtube.com/watch?v=BCQHnlnPusY.
Traversy Media. 2017. Git & GitHub Crash Course For Beginners. https://www.youtube.com/watch?v=SWYqp7iY_Tc.
Wei, Jerry. 2019. “A Quick Guide to Using Command Line (Terminal).” Towards Data Science. https://towardsdatascience.com/a-quick-guide-to-using-command-line-terminal-96815b97b955.

  1. Yes, I’m aware that this sounds paranoid. I have only rarely needed to restore something from another backup, but you don’t want to take chances.