When an analyst installs a package that is not in base R, where does R call the package from?

R Packages

R is statistical software made up of many user-written packages. The base version of R that is downloaded allows the user to get started, but anyone performing data analysis will quickly exhaust the capabilities of base R and need to install additional packages. Here are some basic commands for managing R packages.

Which packages do I already have?

To see what packages are installed, use the installed.packages() command. This will return a matrix with a row for each package that has been installed. Below, we look at the first 5 rows of this matrix.

installed.packages()[1:5,]
        Package   LibPath                            Version  Priority
base    "base"    "C:/PROGRA~1/r/R-211~1.1/library"  "2.11.1" "base"
boot    "boot"    "C:/PROGRA~1/r/R-211~1.1/library"  "1.2-42" "recommended"
car     "car"     "C:/PROGRA~1/r/R-211~1.1/library"  "2.0-2"  NA
class   "class"   "C:/PROGRA~1/r/R-211~1.1/library"  "7.3-2"  "recommended"
cluster "cluster" "C:/PROGRA~1/r/R-211~1.1/library"  "1.12.3" "recommended"
        Depends                                                Imports LinkingTo
base    NA                                                     NA      NA
boot    "R (>= 2.9.0), graphics, stats"                        NA      NA
car     "R (>= 2.1.1), stats, graphics, MASS, nnet, survival"  NA      NA
class   "R (>= 2.5.0), stats, utils"                           "MASS"  NA
cluster "R (>= 2.9.0), stats, graphics, utils"                 NA      NA
        Suggests                                    Enhances OS_type License            Built
base    NA                                          NA       NA      "Part of R 2.11.1" "2.11.1"
boot    "survival"                                  NA       NA      "Unlimited"        "2.11.1"
car     "alr3, leaps, lmtest, sandwich, mgcv, rgl"  NA       NA      "GPL (>= 2)"       "2.11.1"
class   NA                                          NA       NA      "GPL-2 | GPL-3"    "2.11.1"
cluster NA                                          NA       NA      "GPL (>= 2)"       "2.11.1"

From this output, we will first focus on the Package and Priority columns. The Package column gives the name of the package and the Priority column indicates what is needed to use functions from the package.

  • If Priority is "base", then the package ships as part of R itself. Most of these packages (for example base, stats, graphics, and utils) are attached automatically when R starts, so their functions are available upon opening R; a few, such as grid and parallel, still have to be loaded with the library command.
  • If Priority is "recommended", then the package was installed with base R, but not loaded. Before using the commands from this package, the user will have to load it with the library command, e.g., library(boot).
  • If Priority is NA, then the package was installed by the user, but not loaded. Before using the commands from this package, the user will have to load it with the library command, e.g., library(car).
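The Priority column can also be queried directly instead of read off the printed matrix. The following is a minimal sketch, assuming a standard R installation; the variable names are illustrative only. It lists the base-priority packages and then works out which of them are not attached in the current session.

```r
# Names of all installed packages whose Priority is "base"
base_pkgs <- rownames(installed.packages(priority = "base"))

# Packages currently attached to the search path (strip the "package:" prefix)
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Base-priority packages that are installed but not attached right now
not_attached <- setdiff(base_pkgs, attached)

print(base_pkgs)
print(not_attached)
```

Anything that shows up in not_attached is installed with R but still needs a library() call before its functions can be used by bare name.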

Have I installed a specific package?

Sometimes, you might want to know if you have already installed a specific package. Let’s say we want to check if we have installed the package "boot". Instead of checking the entire list of installed packages, we can do the following.

a <- installed.packages()
packages <- a[,1]
is.element("boot", packages)
[1] TRUE
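Two shorter ways of asking the same question are sketched below. The example uses stats rather than boot only because stats is present in every R installation, and notARealPackage123 is a made-up name chosen to show the negative case. Note that requireNamespace() also loads the package's namespace when it succeeds, although it does not attach it.

```r
# Option 1: look the name up among the installed package names
is_installed <- "stats" %in% rownames(installed.packages())

# Option 2: ask R whether the package's namespace can be loaded;
# this returns TRUE or FALSE instead of raising an error
has_stats <- requireNamespace("stats", quietly = TRUE)
has_fake  <- requireNamespace("notARealPackage123", quietly = TRUE)

print(c(is_installed, has_stats, has_fake))
```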

How can I add or delete packages?

Any package that does not appear in the installed packages matrix must be installed and loaded before its functions can be used. A package can be installed using install.packages("<package name>"). A package can be removed using remove.packages("<package name>").
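A common pattern is to install only what is missing. The sketch below assumes a hypothetical helper named missing_packages(); it is not part of R. The wanted vector deliberately lists packages that ship with R, so running the example downloads nothing.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet installed
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

wanted <- c("stats", "utils")
to_install <- missing_packages(wanted)

# Install only what is missing; nothing is downloaded when the vector is empty
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Replacing the contents of wanted with the packages an analysis actually needs gives a script that can be re-run safely on any machine.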

What packages are available?

The list of available R packages is constantly growing. The current list can be obtained using available.packages(). This returns a matrix with a row for each package.

p <- available.packages()
dim(p)
[1] 2553   12
p[1:5,]
           Package      Version Priority Depends                                                                                                 Imports
ACCLMA     "ACCLMA"     "1.0"   NA       NA                                                                                                      NA
ADGofTest  "ADGofTest"  "0.1"   NA       NA                                                                                                      NA
AER        "AER"        "1.1-7" NA       "R (>= 2.5.0), stats, car (>= 2.0-1), Formula (>= 0.2-0), lmtest, sandwich, strucchange, survival, zoo" "stats"
AGSDest    "AGSDest"    "1.0"   NA       "ldbounds"                                                                                              NA
AICcmodavg "AICcmodavg" "1.11"  NA       NA                                                                                                      NA
           LinkingTo
ACCLMA     NA
ADGofTest  NA
AER        NA
AGSDest    NA
AICcmodavg NA
           Suggests
ACCLMA     NA
ADGofTest  NA
AER        "boot, dynlm, effects, foreign, ineq, KernSmooth, lattice, MASS, mlogit, nlme, nnet, np, plm, pscl, quantreg, ROCR, sampleSelection, scatterplot3d, systemfit, rgl, truncreg, tseries, urca"
AGSDest    NA
AICcmodavg "lme4, MASS, nlme, nnet"
           Enhances OS_type License       File Repository
ACCLMA     NA       NA      "GPL-2"       NA   "http://cran.stat.ucla.edu/bin/windows/contrib/2.11"
ADGofTest  NA       NA      "GPL"         NA   "http://cran.stat.ucla.edu/bin/windows/contrib/2.11"
AER        NA       NA      "GPL-2"       NA   "http://cran.stat.ucla.edu/bin/windows/contrib/2.11"
AGSDest    NA       NA      "GPL (>= 2)"  NA   "http://cran.stat.ucla.edu/bin/windows/contrib/2.11"
AICcmodavg NA       NA      "GPL (>= 2 )" NA   "http://cran.stat.ucla.edu/bin/windows/contrib/2.11"

These first five (of 2,553) available packages illustrate that the package names are often acronyms and rarely reveal what the package functions do. A list of the packages available through CRAN including a short package description can be found at CRAN’s Contributed Packages page.

What functions and datasets are in a package?

It is easy to access some quick documentation for a package from R with the help command. This opens an R window with package information followed by a list of functions and datasets.

help(package="MASS")

Once a package is loaded, the help command can also be used with all functions and datasets listed here, e.g. help(Null).
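The same information can be pulled into R objects rather than a help window. This is a rough sketch that uses the stats and datasets packages, chosen only because they are attached in a default R session; the variable names are illustrative.

```r
# Exported objects of an attached package (stats is attached by default)
stats_objects <- ls("package:stats")
print(head(stats_objects))

# Datasets shipped with a package: the returned object has a `results`
# matrix with one row per dataset
ds <- data(package = "datasets")$results
print(head(ds[, c("Item", "Title")]))
```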

Approximate time: 25 min

Learning Objectives

  • Explain different ways to install external R packages
  • Demonstrate how to load a library and how to find functions specific to a package

Packages and Libraries

Packages are collections of R functions, data, and compiled code in a well-defined format, created to add specific functionality. There are 10,000+ user-contributed packages, and the number is growing.

There is a set of standard (or base) packages which are considered part of the R source code and are automatically available as part of your R installation. Base packages contain the basic functions that allow R to work, and enable standard statistical and graphical functions on datasets; for example, all of the functions that we have been using so far in our examples.

The directories in R where the packages are stored are called the libraries. The terms package and library are sometimes used synonymously and there has been discussion amongst the community to resolve this. It is somewhat counter-intuitive to load a package using the library() function and so you can see how confusion can arise.
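To see where these library directories are on your own machine, something like the following can be run in the console. It is a small sketch using the stats package, chosen only because it is always installed; the paths printed will differ from one computer to the next.

```r
# Library directories R searches, in order
lib_dirs <- .libPaths()
print(lib_dirs)

# Directory from which a specific installed package is loaded
stats_dir <- find.package("stats")
print(stats_dir)

# The library that holds the package is the parent directory
print(dirname(stats_dir))
```

When install.packages() is called without a lib argument, the new package is placed in the first directory listed by .libPaths(), and library() later looks it up in those same directories.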

You can check what libraries are loaded in your current R session by typing into the console:

sessionInfo() #Print version information about R, the OS and attached or loaded packages
# OR
search() #Gives a list of attached packages

Previously we have introduced you to functions from the standard base packages. However, the more you work with R, the more you will come to realize that there is a cornucopia of R packages that offer a wide variety of functionality. Additional packages must be installed before they can be used. Many packages can be installed from the CRAN or Bioconductor repositories.

  • Package names are case sensitive!
  • At any point (especially if you’ve used R/Bioconductor in the past), R may ask in the console whether you want to update any old packages, with the prompt “Update all/some/none? [a/s/n]:”. If you see this, type “a” at the prompt and hit Enter to update all old packages. Updating packages can sometimes take a while to run. If you are short on time, you can choose “n” and proceed. Without updating, you run the risk of conflicts between your old packages and the ones from your updated R version later down the road.
  • If you see a message in your console along the lines of “binary version available but the source version is later”, followed by a question, “Do you want to install from sources the package which needs compilation? y/n”, type n for no, and hit enter.

Package installation from CRAN

CRAN is a repository where the latest downloads of R (and legacy versions) are found in addition to source code for thousands of different user contributed R packages.

Packages for R can be installed from the CRAN package repository using the install.packages function. This function will download the source code from one of the CRAN mirrors and install the package (and any dependencies) locally on your computer.

An example is given below for the ggplot2 package that will be required for some plots we will create later on. Run this code to install ggplot2.

install.packages("ggplot2")

Alternatively, packages can also be installed from Bioconductor, another repository of packages which provides tools for the analysis and comprehension of high-throughput genomic data. These packages include (but are not limited to) tools for performing statistical analysis, annotation packages, and packages for accessing public datasets.

There are many packages that are available in CRAN and Bioconductor, but there are also packages that are specific to one repository. Generally, you can find out this information with a Google search or by trial and error.

To install from Bioconductor, you will first need to install BiocManager. This only needs to be done once ever for your R installation.

# DO NOT RUN THIS!
install.packages("BiocManager")

Now you can use the install() function from the BiocManager package to install a package by providing the name in quotations.

Here we have the code to install ggplot2, through Bioconductor:

# DO NOT RUN THIS!
BiocManager::install("ggplot2")

The code above may not be familiar to you - it is essentially using a new operator, a double colon :: to execute a function from a particular package. This is the syntax: package::function_name().
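The double colon works with any installed package, not just BiocManager. As a quick illustration that needs nothing extra installed, the sketch below calls one function from stats and one from tools, both of which ship with R; the file name report.csv is made up for the example.

```r
# Call a function by its fully qualified name
m <- stats::median(c(1, 3, 5))

# tools is installed with R but not attached by default, so file_ext()
# is only reachable by bare name after library(tools); :: avoids that step
ext <- tools::file_ext("report.csv")

print(m)
print(ext)
```

Using :: loads the package's namespace behind the scenes but does not attach it, so the rest of the session's search path is left unchanged.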

Package installation from source

Finally, R packages can also be installed from source. This is useful when you do not have an internet connection (and have the source files locally), since the other two methods are retrieving the source files from remote sites.

To install from source, we use the same install.packages function but we have additional arguments that provide specifications to change from defaults:

# DO NOT RUN THIS!
install.packages("~/Downloads/ggplot2_1.0.1.tar.gz", type="source", repos=NULL)

Loading libraries

Once you have the package installed, you can load the library into your R session for use with library(ggplot2). Any of the functions that are specific to that package will then be available for you to use by simply calling the function as you would for any of the base functions. Note that quotations are not required here.

You can also check what is loaded in your current environment by using sessionInfo() or search() and you should see your package listed as:

other attached packages: [1] ggplot2_2.0.0

In this case there are several other packages that were also loaded along with ggplot2.

We only need to install a package once on our computer. However, to use the package, we need to load the library every time we start a new R/RStudio environment. You can think of this as installing a bulb versus turning on the light.
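The difference between installed and loaded can be observed directly. The sketch below uses the splines package purely as a stand-in, on the assumption that it ships with R and is not attached in a fresh session, so no install step is needed; with ggplot2 the same pattern would be one install.packages("ggplot2") call followed by library(ggplot2) in every new session.

```r
# splines is already installed with R (the bulb is in place)
before <- "package:splines" %in% search()

# Attach it for this session only (turn on the light)
library(splines)

after <- "package:splines" %in% search()

print(c(before = before, after = after))
```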

Analogy and image credit to Dianne Cook of Monash University.

Finding functions specific to a package

If this is your first time using ggplot2, how do you know where to start and what functions are available to you? One way to find out is by using the Packages tab in RStudio. If you click on the tab, you will see a list of all the packages that you have installed. For those libraries that you have loaded, you will see a blue checkmark in the box next to the name. Scroll down to ggplot2 in your list:

If your library is successfully loaded, you will see the box checked. Now, if you click on ggplot2, RStudio will open up the help pages and you can scroll through.

An alternative is to find the help manual online, which can be less technical and sometimes easier to follow. For example, this website is much more comprehensive for ggplot2 and is the result of a Google search. Many of the Bioconductor packages also have very helpful vignettes that include comprehensive tutorials with mock data that you can work with.

If you can’t find what you are looking for, you can use the rdocumentation.org website, which searches through the help files across all available packages.

Exercise

The ggplot2 package is part of the tidyverse suite of integrated packages, which were designed to work together to make common data science operations more user-friendly. We will be using the tidyverse suite in later lessons, so let’s install it. NOTE: This suite of packages is only available in CRAN.

This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

  • The materials used in this lesson are adapted from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).