Using RStudio on Amazon AWS

Using RStudio is usually great fun. Long-running tasks, however, such as fitting complicated mixed-effects models, can take the fun out of it: if a job takes several hours to complete, RStudio is basically frozen for that whole time and cannot be used for anything else [1]. Also, you cannot shut down your machine as long as the analysis is still running.

Amazon AWS can be an elegant solution to this problem. The service makes it possible to create powerful computing instances with many CPUs that run your computing jobs on a dedicated server. While this is happening, you can use your computer for other things or even switch it off. A version of RStudio (RStudio Server, to be precise) is accessible via browser, so you can check on your jobs at any time from wherever you have internet access.

Getting Started

To use AWS, you first have to create an account: just go to https://aws.amazon.com and follow the instructions. Once you have an account, you can create new computing instances by going to “Services” (top menu) –> “EC2” (under “Computing”) and clicking the button “Launch Instance.” Please note that different instances come at different prices: usually, the more CPUs and the more storage you select, the more expensive it will be. [2] Also note that you will be charged as long as your instance is running. When your work is done, make sure to stop the instance. You do not have to terminate it, though: a stopped instance only keeps its storage, so unless you are using huge amounts of storage, it is usually sufficient to just stop it. One last remark on pricing: do not repeatedly start and stop instances because, at the time of writing, you get billed for one hour every time you (re)start an instance.
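To see why frequent restarts add up, here is a small worked example. It is only a sketch that assumes the per-start hourly rounding described above; the helper name billed_hours is made up for this illustration, and the actual billing rules are on the pricing page [2].

```shell
# Count billed hours when every session is rounded up to a full hour.
# Arguments: the length of each session in minutes.
billed_hours() {
  total=0
  for minutes in "$@"; do
    total=$(( total + (minutes + 59) / 60 ))
  done
  echo "$total"
}

billed_hours 10 10 10   # three short sessions: prints 3
billed_hours 30         # one session of the same total length: prints 1
```

Under this assumption, three ten-minute sessions cost three times as much as one uninterrupted half hour.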

Does It Make Sense To Use Many CPUs?

Usually it is enough to create an instance that runs standard R with just one CPU, since standard R by default uses only one CPU no matter how many you provide. If your task can be parallelized, however, it might make sense to use Microsoft R instead, which uses n/2 of the available CPUs. Using the apply function to modify a big data frame is an example of a task that can be parallelized because the individual iterations are independent of each other. Fitting mixed-effects models, on the other hand, cannot be parallelized because the iterations depend on each other. Therefore, creating instances with multiple CPUs and using Microsoft R only makes sense if your tasks can be parallelized. In all other cases, use standard R. Fortunately, there is a shortcut for setting up an instance with standard R, which I will describe in the next section. Setting up Microsoft R is a little more complicated and will be tackled later.

If the job in question can be parallelized, it makes sense to utilize multiple cores. If not, multiple cores cannot be used.
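What "independent iterations" means can be illustrated outside of R as well. The following is a minimal sketch, assuming a shell with getconf, seq, and an xargs that supports -P (GNU and BSD xargs both do); the function name parallel_squares is made up for this illustration. Each number is squared in its own process, which only works because no iteration needs the result of another one.

```shell
# Number of CPU cores the operating system reports as online.
cores=$(getconf _NPROCESSORS_ONLN)

# Square the numbers 1..N, running up to $cores independent jobs at once.
# The jobs may finish in any order, so the results are sorted afterwards.
parallel_squares() {
  seq 1 "$1" | xargs -n 1 -P "$cores" sh -c 'echo $(( $1 * $1 ))' _ | sort -n
}

parallel_squares 4
```

A task in which step k needs the output of step k-1 could not be split up like this, no matter how many cores the instance has.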

Creating An Instance with Standard R

This is fairly easy because there is a website that provides ready-made Amazon Machine Images (AMIs). Go to http://www.louisaslett.com/RStudio_AMI/, scroll down to “AMI Release History,” and in the topmost row of the table choose the link that belongs to a location near you. After that, select an operating system (I recommend Ubuntu) and choose an instance type (the c3 instances are optimized for computing, so I usually choose one of them). Before you launch the instance, make sure to edit the security group settings. The following inbound ports have to be open for RStudio Server to work properly: 22 (SSH), 80 (HTTP), 443 (HTTPS), and 8787 (RStudio Server). After you have edited the security options, click “Launch.” A popup window will prompt you to select or create a new private key. Follow the instructions and remember the location where you save it because you will need it later on if you want to transfer files to your instance or install software.
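The same four inbound rules can also be created from the command line instead of the web interface. This is a sketch, assuming the AWS CLI is installed and configured; the security group ID and the CIDR range are placeholders, and the helper name print_ingress_commands is hypothetical. It only prints the commands so that you can review them before running anything.

```shell
# Ports that have to be open: SSH, HTTP, HTTPS, and RStudio Server.
required_ports="22 80 443 8787"

# Print one AWS CLI command per required port.
# $1: security group ID (placeholder), $2: allowed source range in CIDR notation.
print_ingress_commands() {
  for port in $required_ports; do
    echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $port --cidr $2"
  done
}

# sg-EXAMPLE is a placeholder; 0.0.0.0/0 allows connections from every address.
print_ingress_commands "sg-EXAMPLE" "0.0.0.0/0"
```

Allowing every address is convenient but broad; restricting the range for port 22 to your own IP address is a common way to tighten this.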

If you select the instance that you have just created in the Instances menu, you will see information about it at the bottom of the page. The most important piece of information is the Public DNS. Paste the Public DNS into your browser and you should see an RStudio login window. Log in with username rstudio and password rstudio.

This is basically everything you need to know to get started with R on Amazon AWS. If you want to learn how to transfer files to the AWS instance (which is useful for working with corpus data, for example) or how to install Microsoft R on AWS, keep reading.

Transferring Files to the AWS Instance

There are multiple ways to transfer files to an AWS instance. The easiest one is through RStudio itself—just use the Upload button, which you can find in the bottom-right quadrant under Files; this allows you to upload datasets or RData files, for example. However, you can only upload one file at a time, and if you are dealing with many files, this is rather impractical. You can also use ssh to transfer multiple files, but a more practical alternative is using a client like FileZilla. [3]

To connect to your Amazon AWS instance with FileZilla, you first have to load the private key that you created before: go to the FileZilla settings, navigate to Connection –> SFTP, and add the key file. To connect to your instance, go to File –> Site Manager and enter the following information: Host: your Public DNS; Protocol: SFTP – SSH File Transfer Protocol; Logon Type: Interactive; User: ubuntu. You do not need to specify a password because you have already loaded the private key. After you are connected, you can use drag and drop to transfer your files.
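For the ssh route mentioned above, scp can copy a whole folder in one go. This is a minimal sketch, assuming scp is available on your machine; Key.pem, Public-DNS, and the folder name corpus are placeholders, and upload_dir is a hypothetical helper. With DRY_RUN set, the command is only printed.

```shell
# Copy a local directory to the home directory of user ubuntu on the instance.
# $1: private key file, $2: local directory, $3: Public DNS of the instance.
# If DRY_RUN is set to a non-empty value, the command is printed, not executed.
upload_dir() {
  ${DRY_RUN:+echo} scp -i "$1" -r "$2" "ubuntu@$3:/home/ubuntu/"
}

# Placeholders: replace Key.pem, corpus, and Public-DNS with your own values.
DRY_RUN=1 upload_dir "Key.pem" "corpus" "Public-DNS"
```

Leaving out DRY_RUN=1 runs the actual transfer.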

Setting up Microsoft R

If you want to use Microsoft R instead, you cannot use the AMI shortcut from above. First, create a new instance by clicking the “Launch Instance” button in the Amazon menu, then select an operating system (I recommend Ubuntu), and choose an instance type. Remember to edit the security options as described above. Next, you have to connect to the instance via ssh to install Microsoft R and RStudio Server.

To connect to the instance via ssh, type ssh -i "Key.pem" ubuntu@Public-DNS, in which Key.pem designates the location of your private key and Public-DNS is the public DNS address of the instance you have just created (remember, this address is available from the Instances menu within the Amazon interface).
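ssh refuses to use a private key file that other users can read, so it can help to restrict the key's permissions before the first connection. A sketch, with Key.pem and Public-DNS as the same placeholders as above and secure_key as a hypothetical helper name:

```shell
# Make the private key readable by its owner only (mode 400).
secure_key() {
  chmod 400 "$1"
}

# Usage on your own machine (placeholders, so left commented out):
# secure_key "Key.pem"
# ssh -i "Key.pem" ubuntu@Public-DNS
```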

Next, install Microsoft R. Visit the Microsoft R download page [4] and copy the address of the latest Ubuntu version to your clipboard. Then navigate to the /home directory in your ssh session and type sudo wget "..." (substitute ... with the address in your clipboard). This will download the program. Then unpack the file by typing sudo tar -xf "..." (here, ... designates the file that you have just downloaded). Finally, navigate to the newly created folder and install the software by typing sudo ./install.sh.

After Microsoft R has been installed, you can prepare the RStudio Server installation. In order for everything to work later on, you have to run a couple of commands in the terminal. First, create an account with which you can log into RStudio later. The following commands create a new user (name: rstudio), create a folder with the same name in the home directory, set a password for the user (you will be prompted to type it; I also used rstudio), and give everyone full read, write, and execute permissions on that folder. Feel free to change the username and password if you wish.

sudo useradd rstudio
sudo mkdir /home/rstudio
sudo passwd rstudio
sudo chmod -R 0777 /home/rstudio

Next, the operating system should be updated. This has proven to prevent errors later on.

sudo apt-get update
sudo apt-get upgrade

Now we can finally install RStudio Server:

sudo apt-get install gdebi-core
sudo wget "https://download2.rstudio.org/rstudio-server-0.99.903-amd64.deb"
sudo gdebi rstudio-server-0.99.903-amd64.deb

I also recommend you install these compilers and build tools, which will enable you to install R packages that have to be compiled from source:

sudo apt-get install g++
sudo apt-get install make
sudo apt-get install gfortran

Alright. If you have followed the above steps, your AWS instance with Microsoft R should be up and running. You can access RStudio by pasting the Public DNS into your browser, but make sure to append :8787 to the address, which designates the port of RStudio Server. When you see the login dialog, enter the account details you specified earlier (I set up user rstudio with password rstudio).
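As a small convenience, the address can be assembled in the shell. This is only a sketch: rstudio_url is a hypothetical helper, Public-DNS is the same placeholder as before, and plain http is an assumption because no certificate was set up in the steps above.

```shell
# Build the RStudio Server address from the instance's public DNS name.
# 8787 is the RStudio Server port named in the text.
rstudio_url() {
  printf 'http://%s:8787\n' "$1"
}

# Public-DNS is a placeholder for the address shown in the Instances menu.
rstudio_url "Public-DNS"

# On the instance itself, the state of the service can be checked with:
# sudo rstudio-server status
```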


  1. This can be overcome to some extent by creating a new instance of RStudio. On a Mac, this can be done by typing open -n /Applications/RStudio.app in the terminal.
  2. An overview of the prices can be found on https://aws.amazon.com/ec2/pricing/on-demand/.
  3. FileZilla can be downloaded from https://filezilla-project.org/.
  4. Microsoft R can be downloaded from https://mran.revolutionanalytics.com/download/.
