RStudio

Introduction

RStudio is an integrated development environment for the R programming language, with limited support for other languages (including Python, bash, and SQL). It provides a graphical environment for importing data in a number of formats (including CSV, Excel spreadsheets, SAS, and SPSS); manipulating, analyzing, and visualizing data; version control with git or SVN; a graphical R package manager for point-and-click search, installation, and removal of R packages from R's substantial ecosystem (including the Bioconductor repository, which provides almost 1500 software tools “for the analysis and comprehension of high-throughput genomic data”); and many other features.

RStudio Server is a client/server version of RStudio that runs on a remote server and is accessed via the client’s web browser. A graphical file manager allows file upload/download from hpc-class via web browser.

IMPORTANT: This guide was written using the hpc-class cluster. If you are using a different cluster, replace hpc-class with the name of your cluster. For example, on the condo cluster, replace hpc-class.its.iastate.edu with condo2017.its.iastate.edu. The RStudio container is not currently available on the Nova cluster.

ISU Options for RStudio

ISU users can use RStudio in one of the following ways:

  1. To run RStudio and access data on your local workstation, download the open source RStudio Desktop.
  2. To run RStudio Server on hpc-class and access data stored there, follow the directions in this guide.

RStudio Server on hpc-class

RStudio Server is currently available on hpc-class using a Docker image (imported into Singularity) provided by the Rocker project. The geospatial image includes not only geospatial libraries but also LaTeX / publishing libraries and Tidyverse data science libraries. Other R packages can be installed into your home directory from within RStudio.

Running RStudio Server on hpc-class allows ISU users to access any data on hpc-class that they can access from the command line (SSH). To use RStudio Server on hpc-class, a user submits a SLURM job script. This allows RStudio Server to run on any available hpc-class compute resources (including large-memory nodes). A default job script that should suffice for most users is provided.

After a user is done using RStudio Server, they should save their work in RStudio and then stop RStudio Server by cancelling the job with the SLURM scancel command.

A few notes:

  1. RStudio terminal (bash command shell): because RStudio Server runs in a container with a Debian base image, you won’t be able to access software environment modules (e.g., those you would normally see after logging into hpc-class and issuing the module list command), as those are installed on the (CentOS) host.
  2. Data access: your home directory is mounted inside the RStudio Server container, and HPC Group has configured Singularity to mount the /project directory. $TMPDIR (which on a compute node is per-job local scratch on the node’s direct-attached storage and is deleted at the end of the SLURM job) is mounted inside the container at /tmp. If you have any questions, email hpc-help@iastate.edu.
  3. Software installation: The provided SLURM job script creates a ~/.Renviron file in your home directory that allows RStudio to install additional R packages into your home directory (the container image is immutable). Installing many R packages may push your home directory over its default 10G soft quota.

Starting RStudio Server

  1. Log into hpc-class via SSH (see the Quick Start Guide for instructions).
  2. Submit the RStudio SLURM job script with the following command:
    > sbatch /shared/hpc/containers/Rstudio/4.0.0/rstudio.job
    (Optional) By default, this SLURM job is limited to 4 hours of run time, 1 processor core, and 6600 MB of memory. To customize, see the section Requesting Additional Compute Resources below.
  3. After the job has started, view the “$HOME/rstudio-JOBID.out” file for login information (where JOBID is the SLURM job ID reported by the sbatch command).

    > module load singularity
    > sbatch /shared/hpc/containers/Rstudio/4.0.0/rstudio.job
    Submitted batch job 214664
    > cat ~/rstudio-214664.out
    ...
    [Picture]
  4. Point your web browser to the listed hostname / port, then enter your ISU user name and the temporary password (valid for this job only; in this example, 9BtKF4VRuOO+BsvpDjav).
    [Picture]
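
The submit-then-read sequence in steps 2 and 3 can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of the provided job script: the function names job_id_from_sbatch and rstudio_out_file are hypothetical, and the file name is assumed to follow exactly the $HOME/rstudio-JOBID.out pattern described in step 3.

```shell
# Hypothetical helper (not part of SLURM): print the last space-separated
# field of sbatch's confirmation line, e.g. "Submitted batch job 214664".
job_id_from_sbatch() {
    line="$1"
    printf '%s\n' "${line##* }"
}

# Hypothetical helper: build the login-information file name for a job ID,
# assuming the $HOME/rstudio-JOBID.out pattern from step 3.
rstudio_out_file() {
    printf '%s/rstudio-%s.out\n' "$HOME" "$1"
}

# Usage on the cluster (left as comments so the helpers can be sourced anywhere):
#   line=$(sbatch /shared/hpc/containers/Rstudio/4.0.0/rstudio.job)
#   jobid=$(job_id_from_sbatch "$line")
#   cat "$(rstudio_out_file "$jobid")"
```

The output file only appears once the job has started, so the final cat may need to be repeated if the job is still waiting in the queue.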

Stopping RStudio Server

  1. Click the Quit Session (“power”) button in the top-right corner of the RStudio window (see picture below), or select "File > Quit Session..."
    [Picture][Picture]
  2. After the “R Session has Ended” window appears, cancel the SLURM job from the hpc-class command line. E.g., if the job ID is 214664:
    > scancel -f 214664
    Be sure to specify the scancel -f / --full option as demonstrated above.
  3. (If using SSH Port Forwarding instead of VPN) Close the terminal / PuTTY window in which the SSH tunnel was established.
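
If the job ID has been forgotten, squeue -u $USER lists your queued and running jobs. The following is a minimal sketch of a guard around the scancel -f command from step 2. The name stop_rstudio_job is hypothetical, and the check simply refuses anything that is not a plain numeric job ID.

```shell
# Hypothetical wrapper: cancel an RStudio Server job with scancel -f,
# but only if the argument looks like a plain numeric job ID.
stop_rstudio_job() {
    case "$1" in
        '' | *[!0-9]*)
            echo "usage: stop_rstudio_job JOBID (digits only)" >&2
            return 1
            ;;
    esac
    # -f / --full is the option this guide asks for when stopping the job.
    scancel -f "$1"
}

# Usage on the cluster (commented out; requires SLURM):
#   squeue -u "$USER"        # look up the job ID
#   stop_rstudio_job 214664
```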

macOS / Linux / Windows Users

  1. Open a new macOS/Linux terminal window or a new Windows PowerShell/Command Prompt window and enter the SSH command listed in the job script output file. In this example:
    > ssh -N -L 8787:hpc-class14:51530 jane.user@hpc-class.its.iastate.edu
    There will be no output after logging in. Keep the window / SSH tunnel open for the duration of the RStudio session.
  2. Point your browser to http://localhost:8787. Enter your ISU user name and the one-time password listed in the job script output file.
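
To make the parts of the SSH command in step 1 easier to read, the sketch below assembles it from its three variable pieces. In ssh, -N means no remote command is run (the connection only carries the tunnel), and -L 8787:NODE:PORT forwards local port 8787 to PORT on the compute node. The helper name print_tunnel_cmd is hypothetical, and the node, port, and user name are the example values from this guide; use the ones in your own job script output file.

```shell
# Hypothetical helper: print (not run) the SSH tunnel command.
# Arguments: compute node, remote port, ISU user name.
print_tunnel_cmd() {
    node="$1"
    port="$2"
    user="$3"
    printf 'ssh -N -L 8787:%s:%s %s@hpc-class.its.iastate.edu\n' \
        "$node" "$port" "$user"
}

# Example values taken from step 1 above.
print_tunnel_cmd hpc-class14 51530 jane.user
```

Printing the command rather than running it keeps the sketch safe to try anywhere; the printed line can be pasted into a terminal once the values have been checked.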

PuTTY Users

NOTE: Connecting from a terminal is recommended, but you can still use PuTTY if you prefer.

The following silent video is a media alternative for the text in steps 1-4 below: rstudio-from-putty-port-forward.mp4

  1. Open a new PuTTY window
  2. In Session > Host Name, enter: hpc-class.its.iastate.edu [Picture]
  3. In the category Connection > SSH > Tunnels, enter 8787 in Source Port and, in Destination, the hostname:port listed in the job script output; click “Add”, then click “Open”.
    [Picture][Picture][Picture]
  4. Point your browser to http://localhost:8787 (same as the images above). Enter your ISU user name and the one-time password listed in the job script output file.

    [Picture]

Chrome browser users

Video showing how to connect to hpc-class via SSH using the Chrome Secure Shell App: chrome-ssh.mp4

Requesting Additional Compute Resources

The default job resources (4-hour time limit, 1 processor core, 6600 MB memory) may be customized by either of the following:

  • sbatch command-line options, e.g., to specify an 8-hour wall time limit, 16 GB of memory, and 2 processor cores (= 4 hardware threads, hence --cpus-per-task=4):
    sbatch --time=08:00:00 --mem=16G --cpus-per-task=4 /shared/hpc/containers/Rstudio/4.0.0/rstudio.job
  • Copying the job script to a directory you have write access to and modifying the appropriate SLURM #SBATCH directives.
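
For the second option, a personal copy of the job script might carry directives like the ones below. This is a sketch only: the contents of the provided rstudio.job are not reproduced in this guide, so which #SBATCH lines it already contains is an assumption. The values mirror the sbatch command-line example above, and the destination path ~/rstudio.job is hypothetical.

```shell
# Copy the provided script somewhere writable first (destination is hypothetical):
#   cp /shared/hpc/containers/Rstudio/4.0.0/rstudio.job ~/rstudio.job
#
# Then edit (or add) the directives near the top of the copy:
#SBATCH --time=08:00:00      # 8-hour wall time limit
#SBATCH --mem=16G            # 16 GB of memory
#SBATCH --cpus-per-task=4    # 4 hardware threads (2 processor cores)
#
# Submit the edited copy instead of the shared script:
#   sbatch ~/rstudio.job
```

Options given on the sbatch command line take precedence over #SBATCH directives in the script, so the two approaches can be combined when only one value needs a temporary change.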