git to know git: an 8 minute introduction

By Amy Peterson

Using Git

Git is a version control system that allows you to track changes made to files while working on a project, either independently or in collaboration with others. It provides a way to save many different components of a project in progress, including the source code, but also the figures and data that the code produces. The importance of understanding and using Git lies in its ability to maintain an organized record of a project, also referred to as a repository or repo, as it evolves. While setting up and learning to use Git may seem intimidating, the majority of the work is in the initial setup.

GitHub

GitHub is one of the hosting services that provides an interface for using Git, and can be thought of as Dropbox for version control projects. It stores repositories managed with Git, and is an easy way to routinely back up your work as you make progress on a project. It is also helpful for tracking changes, demonstrating who contributed to which projects, when they contributed, and what their contributions were.

Why I Use Git

When I started as an Intern at the LIBD, I noticed how frequently GitHub was used. As I familiarized myself with some of the projects I would be working on, it became clear how much easier it was to use a system that could document project changes over time in a way that was widely accessible to contributors. Using GitHub also made it easier to revisit certain scripts or documents to determine what changes were made, when, and why they were needed. Having a detailed history of various project components is an easy way to ensure that contributors have information organized in the same way.

Beyond working on projects with collaborators, using GitHub is equally rewarding for individual projects. If you work on a project on one computer at work and need those updates to be accessible on a different computer at home, GitHub is a quick and easy way to keep the project synchronized across computers, so you are always working on the latest version.

Terms

commit: saves a snapshot of your changes to the local repository, either adding a new file or updating the existing version of that file; the changes reach GitHub once they are pushed

issue: option on GitHub that creates a list of action items for a repository, similar to a to-do list; tasks can be assigned to particular contributors; it is also possible to comment on tasks and to reference a particular task within a commit message by including # and the issue number

push: sends the commits made locally to the repository on GitHub

pull: downloads modified or newly added files, so the local directory matches the current repository on GitHub
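
As a small illustration of referencing an issue from a commit message, the sketch below makes a commit in a throwaway repository whose message mentions issue #12. The issue number, file name, and identity are placeholders chosen for the example; on GitHub, the #12 in the message would appear as a link to that issue.

```shell
# Throwaway repository so the example does not touch real work
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder email

echo 'plot(1:10)' > File1.R
git add File1.R
# "#12" is a hypothetical issue number
git commit -q -m "Fix axis labels, see #12"

# Show the message of the most recent commit
last_message="$(git log -1 --pretty=%s)"
echo "$last_message"
```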

Public v. Private Repositories

Repositories can be public or private. Public repositories are readable by everyone, but permission is still required to make edits by pushing commits. Private repositories are inaccessible and unreadable without permission; the repository owner controls who can read, edit, or administer the repository.

Features

Watch: Provides a way to receive notifications regarding all updates on a particular repository of interest.

Star: Marks a specific repository of interest, making it easier to refer back to it later. Differs from watch in that you do not receive notifications for repository updates.

Fork: Creates your own copy of a repository under your GitHub account. The copy exists separately from the original repo and reflects the repo as it was at the time of the fork, so you can make changes without affecting the original.

Initial Set-Up

Make an account on GitHub.

Mac

On newer Macs this should already be set up, but checking is easy!

## Open Terminal Application
which git # determine if Git is installed
git --version # lists current Git version installed

## If Git is not installed, running either of the following prompts macOS to offer an install
git --version
git config

Windows

Install Git for Windows

After Git Installation

## Open Terminal (Mac) or Git Bash (Windows)
## Enter the name and email associated with your GitHub account
git config --global user.name "Amy Peterson"
git config --global user.email "amy.peterson@jhu.edu"
git config --global --list # Lists global configuration options 

Setting Up a Repository

Identify a repository you want to contribute to, or create your own! Repositories can be created from the front page using “Start a project”, or by opening the repositories tab on your profile page and clicking the green “New” button.

Next, take the following steps

## Open Terminal (Mac) or Git Bash (Windows)
# Change directories so you are in the directory where you want to set up the repository
pwd # prints the name of the current directory
cd ~/Desktop # changes the current directory to Desktop
ls # lists the files and folders in the current directory
# On the repository page on GitHub, click the green "Clone or Download"
git clone git@GitHub.com:SampleLink.git # Paste link from GitHub to download the repository locally

Saving Your Work

The process of updating GitHub is as follows:

## Open Terminal (Mac) or Git Bash (Windows)
git add File1.R # stages the file, here File1.R, for the next commit
git commit -m "Example message" # commits the staged files with the message in quotation marks
git push # sends the commit to GitHub

# Once updates are pushed, other repository members need to do the following
git pull # updates local directory to reflect the changes made to GitHub
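
The full round trip can be tried without a GitHub account. The sketch below is only an illustration under stated assumptions: a local bare repository stands in for the repository on GitHub, and the folder names office and home, the file contents, and the identity are all made up for the example.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"      # stand-in for the GitHub repository

# First computer: clone, add a file, commit, push
# (Git warns that the cloned repository is empty; that is expected here)
git clone -q "$work/remote.git" "$work/office"
cd "$work/office"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder email
echo 'x <- 1' > File1.R
git add File1.R
git commit -q -m "Add File1.R"
git push -q -u origin HEAD                 # first push also sets the upstream branch

# Second computer: clone the repository now that it has content
git clone -q "$work/remote.git" "$work/home"

# First computer: make another change and push it
cd "$work/office"
echo 'y <- 2' >> File1.R
git add File1.R
git commit -q -m "Add y to File1.R"
git push -q

# Second computer: pull to catch up
cd "$work/home"
git pull -q
cat File1.R
```

After the pull, the copy of File1.R in the home folder contains both lines, matching what was pushed from the office folder.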

Useful at any time throughout the process of updating a repository, git status reports how your local directory differs from the most recent commit, and separates those differences into files that have been changed and files that are entirely new. For example, if File1.R and File2.pdf have been edited since they were last committed, they are listed as modified, while File3.R and File4.pdf, which are entirely new to the repository, are listed as untracked.
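
To see this for yourself, the following sketch recreates that situation in a throwaway repository. The file contents and the identity are placeholders, and the --short flag is used only to get a compact listing alongside the full report.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder email

# Commit two files so Git is tracking them
echo 'x <- 1' > File1.R
echo 'placeholder figure' > File2.pdf
git add File1.R File2.pdf
git commit -q -m "Add File1.R and File2.pdf"

# Change the tracked files and create two new, untracked ones
echo 'y <- 2' >> File1.R
echo 'updated figure' > File2.pdf
echo 'z <- 3' > File3.R
echo 'another figure' > File4.pdf

git status                                 # full, human-readable report
status_short="$(git status --short)"       # compact form of the same report
echo "$status_short"
```

In the compact listing, M marks a modified file and ?? marks an untracked one.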

Committing Folders

Folders associated with a project can also be committed to a repository on GitHub. Folders that are currently untracked will be listed in response to git status, and committing a folder to a repository will simultaneously commit all of its contents. This is particularly useful and efficient when creating a repository for an existing project.
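
A minimal sketch of committing a folder, using a throwaway repository; the folder name figures and the files inside it are invented for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder email

# A folder with two files in it
mkdir figures
echo 'plot one' > figures/plot1.pdf
echo 'plot two' > figures/plot2.pdf

git status --short        # the untracked folder appears as a single entry
git add figures           # stages every file inside the folder
git commit -q -m "Add figures folder"

# List the files Git is now tracking
tracked_files="$(git ls-files)"
echo "$tracked_files"
```

Note that Git tracks files rather than folders, so an empty folder will not appear in git status and cannot be committed on its own.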

Making Multiple Commits

Multiple commits can be made to group files before pushing them to GitHub. Each set of files you have added using git add will be grouped together as a single commit once you type git commit and enter the commit message you want associated with the files. Then, once all the commits you are ready to make are finished, use git push to save the commits to GitHub.
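
The sketch below groups three files into two commits and sends both with a single push. As an assumption for the example, a local bare repository stands in for GitHub, and the file names, messages, and identity are placeholders.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"      # stand-in for the GitHub repository
git clone -q "$work/remote.git" "$work/project"
cd "$work/project"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder email

echo 'x <- 1' > File1.R
echo 'placeholder figure' > File2.pdf
echo 'project notes' > notes.txt

# First commit groups the script with the figure it produces
git add File1.R File2.pdf
git commit -q -m "Add analysis script and figure"

# Second commit holds the notes on their own
git add notes.txt
git commit -q -m "Add project notes"

# One push sends both commits
git push -q -u origin HEAD

# Count the commits that arrived in the stand-in remote
remote_commits="$(git -C "$work/remote.git" rev-list --count --all)"
echo "$remote_commits"
```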

Starting a Repository for an Existing Project

There are only a few differences for setting up a repository for an existing project, compared to the steps previously described.

Most importantly, after setting up a new repository on GitHub, the next screen will list a number of options. If you are setting up a repository for an existing project, and hoping to commit locally saved files, you will first need to cd into the locally existing project folder. Then use the instructions that appear on the GitHub website under the header “create a new repository on the command line”. In my example, the repository is named “test”.
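
The commands GitHub lists under that header look roughly like the sketch below; the exact text on the page may differ, so follow what GitHub shows for your repository. To keep the example runnable without an account, a local bare repository stands in for the empty “test” repository on GitHub, and File1.R and the identity are placeholders.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/test.git"        # stand-in for the empty "test" repository on GitHub

# An existing project folder with a file already in it
mkdir "$work/test"
cd "$work/test"
echo 'x <- 1' > File1.R

# Steps modeled on "create a new repository on the command line"
echo "# test" >> README.md
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder email
git add README.md File1.R
git commit -q -m "first commit"
git branch -M main                         # name the branch main
git remote add origin "$work/test.git"     # on GitHub, use the link shown on the repository page
git push -q -u origin main

# List the files that reached the stand-in remote
pushed_files="$(git -C "$work/test.git" ls-tree --name-only main)"
echo "$pushed_files"
```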

Git Ignore

Git ignore files are important for both new and old project repositories. They are plain-text files that list which files or file types should be ignored, meaning they will not be included in the list git status provides to inform you of local files that are not currently tracked. Git ignore is important when creating a repository for an existing project, since there will be some existing local files that you will not want to include in the repository, for example, larger files that are not necessary to upload and keep in the repository long-term. With new project repositories, you do not need to start with an extensive git ignore file, but can edit it as the project evolves, since it will become clearer over time which file types you do not want to include in the repository.

## Open Terminal (Mac) or Git Bash (Windows)
touch .gitignore # Creates git ignore file
# Open the file to edit, then commit the file to your GitHub repository

In a git ignore file, an asterisk can be used to designate entire file types to ignore. For example, adding *.zip would ignore any zip files that are saved locally, so they no longer appear when using git status to determine the differences between local files and the repository.
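
A short sketch of what such a file might contain, written from the command line in a throwaway repository. The patterns are examples only, chosen as common candidates for an R project on a Mac; adjust them to your own project.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Write a small git ignore file; lines starting with # are comments
cat > .gitignore <<'EOF'
# compressed archives
*.zip
# R session files
.Rhistory
.RData
# macOS folder metadata
.DS_Store
EOF

# Create one file that matches a pattern and one that does not
touch data.zip File1.R

status_short="$(git status --short)"
echo "$status_short"
```

Here data.zip is left out of the listing because it matches *.zip, while File1.R and the .gitignore file itself are reported as untracked.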

Summary

The general steps for saving files from your local directory to GitHub are

git add -> git commit -> git push

Use git pull to download files from GitHub so that your local directory matches what exists in the repository on GitHub.

This post was written as a brief introduction to Git and GitHub, for individuals who are interested in incorporating Git into their work. It is by no means a comprehensive introduction. For more detailed information regarding GitHub, and using Git, Happy Git and GitHub for the useR is a great resource.

Hopefully this post was helpful as a brief introduction and a way to become more familiar with some of the basic concepts behind Git and GitHub. Feel free to leave questions or share your story in the comments!

Continuous rstats learning

We are researchers at the @LieberInstitute, blogging about R packages, how-to guides & occasionally our own open-source software (opinions r our own) #rstats