RStudio

Introduction

RStudio is an integrated development environment (IDE) for the R programming language, with limited support for other languages (including Python, bash, and SQL). It provides a powerful graphical environment for importing data in many formats (including CSV, Excel spreadsheets, SAS, and SPSS), for manipulating, analyzing, and visualizing data, and for version control with Git or SVN. It also includes a graphical package manager for point-and-click searching, installation, and removal of packages from R's large ecosystem (including the Bioconductor repository, which provides almost 1500 software tools “for the analysis and comprehension of high-throughput genomic data”), along with many other features.

RStudio Server is a client/server version of RStudio that runs on a remote server and is accessed through the client’s web browser. A graphical file manager lets you upload files to and download files from hpc-class in the web browser.

IMPORTANT: This guide was written for the hpc-class cluster. If you are using a different cluster, replace hpc-class with the name of your cluster. For example, on the Condo cluster, replace hpc-class.its.iastate.edu with condo2017.its.iastate.edu. The RStudio container is not currently available on the Nova cluster.

ISU Options for RStudio

ISU users can use RStudio in one of the following ways:

  1. Preferred: Access RStudio via Open OnDemand
  2. To run RStudio and access data on your local workstation, download the open source RStudio Desktop.
  3. To run RStudio Server on hpc-class and access data stored there, follow the directions in this guide.

RStudio Server on hpc-class

RStudio Server is currently available on hpc-class using a Docker image (imported into Singularity) provided by the Rocker project. The geospatial image includes not only geospatial libraries but also LaTeX/publishing libraries and the Tidyverse data science packages. Other R packages can easily be installed into your home directory from within RStudio.

Running RStudio Server on hpc-class allows ISU users to access any data on hpc-class that they can access from the command line (SSH). To use RStudio Server on hpc-class, a user submits a SLURM job script. This allows RStudio Server to run on any available hpc-class compute resources (including large-memory nodes). A default job script that should suffice for most users is provided.

When a user is done with RStudio Server, they should save their work in RStudio and then stop RStudio Server by cancelling the job with the SLURM scancel command.

A few notes:

  1. RStudio terminal (bash command shell): because RStudio Server runs in a container with a Debian base image, you won’t be able to access the software environment modules (e.g., those you would normally see when logging into hpc-class and running the module avail command), as those are installed on the (CentOS) host.
  2. Data access: your home directory is mounted inside the RStudio Server container, and the HPC Group has configured Singularity to mount the /project directory. $TMPDIR (which on a compute node is per-job local scratch space on the node’s direct-attached storage and is deleted at the end of the SLURM job) is mounted inside the container at /tmp. If you have any questions, email hpc-help@iastate.edu.
  3. Software installation: the provided SLURM job script creates a ~/.Renviron file in your home directory that allows RStudio to install additional R packages into your home directory (the container image is immutable). Installing many R packages may cause your home directory to exceed its default 10G soft quota.
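Because installed packages count against the home-directory quota, it can help to check how much space your R library directory uses. The sketch below is a minimal, hypothetical example: the helper names lib_usage_kb and check_lib_quota and the library path ~/R are assumptions (check ~/.Renviron for the path the job script actually configures), and it measures disk usage with du rather than querying the quota system.

```shell
#!/usr/bin/env bash
# Sketch: check how much space a (hypothetical) R library directory uses.

# Print the disk usage of DIR in KiB (du -k reports 1024-byte units).
lib_usage_kb() {
  du -sk "$1" | cut -f1
}

# Print DIR's usage; warn on stderr and return 1 if it exceeds LIMIT_KB KiB.
# Returns 2 if DIR does not exist.
check_lib_quota() {
  local dir=$1 limit_kb=$2 used
  if [ ! -d "$dir" ]; then
    echo "No such directory: $dir" >&2
    return 2
  fi
  used=$(lib_usage_kb "$dir")
  if [ "$used" -gt "$limit_kb" ]; then
    echo "WARNING: $dir uses $used KiB, over the $limit_kb KiB threshold" >&2
    return 1
  fi
  echo "$dir uses $used KiB"
}

# Example (hypothetical path; ~10G expressed in KiB):
#   check_lib_quota "$HOME/R" $((10 * 1024 * 1024))
```

Because du counts allocated blocks, the figure is approximate and may differ slightly from what the quota system reports.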

Starting RStudio Server

  1. Log into hpc-class via SSH (see the Quick Start Guide for instructions).
  2. Submit the RStudio SLURM job script with the following command:
    > sbatch /shared/hpc/containers/Rstudio/4.0.0/rstudio.job
    (Optional) By default, this SLURM job has a 4-hour time limit, 1 processor core, and 6600 MB of memory. To customize these, see the section Requesting Additional Compute Resources below.
  3. After the job has started, view the “$HOME/rstudio-JOBID.out” file for login information (where JOBID is the SLURM job ID reported by the sbatch command).

    > module load singularity
    > sbatch /shared/hpc/containers/Rstudio/4.0.0/rstudio.job
    Submitted batch job 214664
    > cat ~/rstudio-214664.out
    ...
    [Picture]
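  4. Point your web browser to the listed hostname and port, then enter your ISU user name and the temporary password (valid for this job only; in this example, 9BtKF4VRuOO+BsvpDjav). If you are not connected through the VPN, use SSH port forwarding instead (see the macOS / Linux / Windows Users and PuTTY Users sections below).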
    [Picture]
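Steps 2 and 3 can be scripted from the hpc-class command line. The following is a minimal sketch, not part of the provided tooling: the helper names job_id_from_sbatch and wait_for_file are hypothetical, and it assumes the output file is written to $HOME/rstudio-JOBID.out as described above. A non-empty file may not yet contain all of the login information, so re-run cat if it looks incomplete.

```shell
#!/usr/bin/env bash
# Sketch: capture the RStudio job ID and wait for its output file.

# Read sbatch's "Submitted batch job NNN" message on stdin and print NNN.
job_id_from_sbatch() {
  awk '/^Submitted batch job/ { print $NF }'
}

# Wait until FILE exists and is non-empty, polling every 5 seconds.
# Returns 1 if TIMEOUT seconds (default 300) pass first.
wait_for_file() {
  local file=$1 timeout=${2:-300} waited=0
  until [ -s "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for $file" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# Example usage on hpc-class:
#   jobid=$(sbatch /shared/hpc/containers/Rstudio/4.0.0/rstudio.job | job_id_from_sbatch)
#   wait_for_file "$HOME/rstudio-$jobid.out" && cat "$HOME/rstudio-$jobid.out"
```

Alternatively, sbatch's --parsable option prints just the job ID on a single-cluster setup, so jobid=$(sbatch --parsable ...) avoids the parsing step.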

Stopping RStudio Server

  1. Click the Quit Session (“power”) button in the top-right corner of the RStudio window (see picture below), or select “File > Quit Session...”.
    [Picture] [Picture]
  2. After the “R Session has Ended” window appears, cancel the SLURM job from the hpc-class command line. E.g., if the job ID is 214664:
    > scancel -f 214664
    Be sure to specify the scancel -f / --full option as demonstrated above.
  3. (If using SSH Port Forwarding instead of VPN) Close the terminal / PuTTY window in which the SSH tunnel was established.
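If you no longer have the job ID handy, it is part of the output file name ($HOME/rstudio-JOBID.out). The hypothetical helper below lists the job IDs recovered from those file names. SLURM does not delete output files when a job ends, so confirm which job is still running with squeue -u "$USER" before calling scancel.

```shell
#!/usr/bin/env bash
# Sketch: list job IDs from rstudio-JOBID.out files in DIR (default: $HOME).
rstudio_job_ids() {
  local dir=${1:-$HOME} f id
  for f in "$dir"/rstudio-*.out; do
    [ -e "$f" ] || continue   # the glob matched nothing
    id=${f##*/rstudio-}       # strip the directory and "rstudio-" prefix
    id=${id%.out}             # strip the ".out" suffix
    echo "$id"
  done
}

# Example usage on hpc-class:
#   rstudio_job_ids            # candidate job IDs
#   squeue -u "$USER"          # which of them are still running
#   scancel -f 214664          # cancel the running one
```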

macOS / Linux / Windows Users

  1. To set up SSH port forwarding (an alternative to the VPN), open a new macOS/Linux terminal window or a new Windows PowerShell/Command Prompt window and enter the SSH command listed in the job script output file. In this example:
    > ssh -N -L 8787:hpc-class14:51530 jane.user@hpc-class.its.iastate.edu
    There will be no output after logging in. Keep the window / SSH tunnel open for the duration of the RStudio session.
  2. Point your browser to http://localhost:8787. Enter your ISU user name and the one-time password listed in the job script output file.
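In the tunnel command, -L takes the form LOCAL_PORT:DEST_HOST:DEST_PORT: connections to LOCAL_PORT on your own machine are forwarded through the login node to DEST_HOST:DEST_PORT (the compute node and port running RStudio Server), and -N tells ssh not to run a remote command. The hypothetical helper below only builds that command string from the values in the job output file; its defaults (local port 8787, login host hpc-class.its.iastate.edu) come from the example above. If port 8787 is already in use on your machine, pass a different local port and browse to http://localhost: followed by that port instead.

```shell
#!/usr/bin/env bash
# Sketch: build the SSH port-forwarding command for an RStudio Server job.
#   $1  destination host:port from the job output (e.g. hpc-class14:51530)
#   $2  ISU user name
#   $3  local port on your machine (default 8787)
#   $4  login host (default hpc-class.its.iastate.edu)
rstudio_tunnel_cmd() {
  local dest=$1 user=$2 lport=${3:-8787} login=${4:-hpc-class.its.iastate.edu}
  echo "ssh -N -L ${lport}:${dest} ${user}@${login}"
}

# Example: same command as above, then a variant on local port 8888.
#   rstudio_tunnel_cmd hpc-class14:51530 jane.user
#   rstudio_tunnel_cmd hpc-class14:51530 jane.user 8888
```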

PuTTY Users

NOTE: We recommend connecting from a terminal, but you can still use PuTTY if you prefer.

The following silent video is a media alternative for the text in steps 1-4 below: rstudio-from-putty-port-forward.mp4

  1. Open a new PuTTY window
  2. In Session > Host Name, enter: hpc-class.its.iastate.edu [Picture]
  3. In the category Connection > SSH > Tunnels, enter 8787 as the Source port and the hostname:port listed in the job script output as the Destination, click “Add”, then click “Open”.
    [Picture] [Picture] [Picture]
  4. Point your browser to http://localhost:8787 (same as the images above). Enter your ISU user name and the one-time password listed in the job script output file.

    [Picture]

Chrome browser users

Video showing how to ssh to hpc-class using the Chrome Secure Shell App: chrome-ssh.mp4

Requesting Additional Compute Resources

The default job resources (4-hour time limit, 1 processor core, 6600 MB of memory) may be customized by:

  • sbatch command-line options (which take precedence over the #SBATCH directives in the job script), e.g., to specify an 8-hour wall time limit, 16 GB of memory, and 2 processor cores (= 4 hardware threads):
    sbatch --time=08:00:00 --mem=16G --cpus-per-task=4 /shared/hpc/containers/Rstudio/4.0.0/rstudio.job
  • Copying the job script to a directory you have write access to, modifying the appropriate SLURM #SBATCH directives, and submitting the modified copy with sbatch.
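The second approach can be scripted. The sketch below is an assumption-laden example, not part of the provided job script: set_sbatch_option is a hypothetical helper, and it assumes the script writes its directives in the long #SBATCH --option=value form (it will not match short forms such as -t, or --time 4:00:00 with a space). If a directive is missing, it inserts one right after the first line (the #! line), because sbatch stops reading #SBATCH directives at the first executable command.

```shell
#!/usr/bin/env bash
# Sketch: set (or add) an "#SBATCH --OPTION=VALUE" directive in a job script.
#   $1  job script to edit in place
#   $2  long option name without dashes, e.g. time, mem, cpus-per-task
#   $3  new value, e.g. 08:00:00, 16G, 4
set_sbatch_option() {
  local file=$1 opt=$2 val=$3 tmp
  local line="#SBATCH --${opt}=${val}"
  tmp=$(mktemp) || return 1
  if grep -q "^#SBATCH --${opt}=" "$file"; then
    # Replace every existing directive for this option.
    awk -v prefix="#SBATCH --${opt}=" -v line="$line" \
      'index($0, prefix) == 1 { print line; next } { print }' "$file" > "$tmp"
  else
    # Insert a new directive right after line 1 (the #! line).
    awk -v line="$line" '{ print } NR == 1 { print line }' "$file" > "$tmp"
  fi
  # Copy back instead of mv so the original file's permissions are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example usage on hpc-class:
#   cp /shared/hpc/containers/Rstudio/4.0.0/rstudio.job ~/rstudio-big.job
#   set_sbatch_option ~/rstudio-big.job time 08:00:00
#   set_sbatch_option ~/rstudio-big.job mem 16G
#   set_sbatch_option ~/rstudio-big.job cpus-per-task 4
#   sbatch ~/rstudio-big.job
```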