Getting started with RStudio on Amazon SageMaker
Many R users are introduced to RStudio through the open-source RStudio Desktop IDE. This desktop program provides a gateway into the powerful world of R programming. With RStudio Desktop, you can gain first-hand experience using R to perform common data science and statistical tasks like visualizing and modeling data.
At some point, however, it’s inevitable that you will encounter an analysis or data set that exceeds the capabilities of your laptop or desktop. In these scenarios, the easiest option is often to perform the given analysis in a larger and more capable environment, like a more powerful desktop computer or even a Linux server. However, buying a larger local workstation is often expensive and impractical, and provisioning an entire server environment from scratch can be complex and time-consuming.
One simple option for scaling up is RStudio on Amazon SageMaker, which enables R users to start coding in AWS with a few clicks, without needing to configure a server environment. This post will walk through how to get started quickly with RStudio on Amazon SageMaker. And if you want help, don’t hesitate to book a call with our team.
Obtain a license
Because RStudio on Amazon SageMaker uses Posit Workbench, it requires a professional license. There are many ways to obtain a license, but if you are looking to experiment with this solution, the easiest way is through the AWS Marketplace. The listing Posit Platform (RStudio on SageMaker) supports a free trial that can be used to get started.
Once the contract is created, Posit will grant a license to your account in the AWS License Manager.
Set up a domain
With a license in place, you can now create a SageMaker domain with access to RStudio. This requires a few specific steps that are outlined here. Once the domain is set up, the user you created will be able to launch RStudio.
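If you would rather script the domain setup than click through the console, the request can be sketched as a plain Python dictionary. This is a minimal sketch, not the documented procedure: the helper name `build_rstudio_domain_request` and the example ARN, VPC, and subnet values are hypothetical, and the field names reflect our understanding of the SageMaker `CreateDomain` API, so check them against the current AWS documentation before relying on them.

```python
from typing import Any, Dict, Sequence


def build_rstudio_domain_request(
    domain_name: str,
    execution_role_arn: str,
    vpc_id: str,
    subnet_ids: Sequence[str],
) -> Dict[str, Any]:
    """Build keyword arguments for a SageMaker CreateDomain call with RStudio enabled.

    Assumption: one IAM role is reused as both the domain execution role
    and the default user execution role. Your setup may separate them.
    """
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Domain-level RStudio settings.
        "DomainSettings": {
            "RStudioServerProDomainSettings": {
                "DomainExecutionRoleArn": execution_role_arn,
            }
        },
        # Defaults applied to user profiles created in this domain.
        "DefaultUserSettings": {
            "ExecutionRole": execution_role_arn,
            "RStudioServerProAppSettings": {
                "AccessStatus": "ENABLED",
                "UserGroup": "R_STUDIO_USER",
            },
        },
    }


# Hypothetical placeholder values, for illustration only.
request = build_rstudio_domain_request(
    domain_name="rstudio-demo",
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleRStudioRole",
    vpc_id="vpc-0example",
    subnet_ids=["subnet-0example"],
)
```

With boto3 installed and AWS credentials configured, a dictionary like this could be passed along as `boto3.client("sagemaker").create_domain(**request)`. Keeping the request as data makes it easy to review before anything is created in your account.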
Your first session
When you start RStudio from SageMaker, you make two significant choices: the compute instance type that runs the session and the Docker image that the session is based on.
Choosing the instance type allows you to select a compute instance with the CPU and RAM that best meet the requirements of your expected workload. For example, an ml.t3.medium offers 2 vCPUs and 4 GB of RAM, which offers no advantage over a typical modern laptop. However, an ml.r5.24xlarge offers 96 vCPUs and 768 GB of RAM, which is a significant upgrade over anything you are likely to have access to locally. This ability to choose the instance type on a per-session basis allows you to select an environment that can meet the needs of a given project. Keep in mind that each of these instances incurs costs for the duration of the session. Details about the different instance types available for RStudio sessions on SageMaker can be found here.
SageMaker provides a default session image that is sufficient for the purposes of this post. SageMaker also supports custom Docker images, which give you additional flexibility over the compute environment that sessions run on. If you’re interested in creating your own Docker image for use within SageMaker, you can find details here.
If you’re coming from a desktop installation of RStudio, you’ll likely have local files that you want to access from your SageMaker environment. There are two approaches you can use.
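To make the sizing decision concrete, here is a minimal sketch of picking the smallest instance that satisfies a workload's requirements. The `InstanceType` class and `smallest_fit` helper are hypothetical names used for illustration, and the catalogue contains only the two instance types quoted above, not the full list SageMaker offers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gb: int


# Only the two instance types mentioned in this post; the real list is much longer.
CATALOGUE = (
    InstanceType("ml.t3.medium", vcpus=2, memory_gb=4),
    InstanceType("ml.r5.24xlarge", vcpus=96, memory_gb=768),
)


def smallest_fit(
    min_vcpus: int,
    min_memory_gb: int,
    catalogue: Sequence[InstanceType] = CATALOGUE,
) -> Optional[InstanceType]:
    """Return the smallest instance meeting both requirements, or None if none does."""
    candidates = [
        inst
        for inst in catalogue
        if inst.vcpus >= min_vcpus and inst.memory_gb >= min_memory_gb
    ]
    if not candidates:
        return None
    # Prefer the least memory, then the fewest vCPUs, as a rough proxy for cost.
    return min(candidates, key=lambda inst: (inst.memory_gb, inst.vcpus))


light = smallest_fit(min_vcpus=2, min_memory_gb=4)
heavy = smallest_fit(min_vcpus=8, min_memory_gb=64)
too_big = smallest_fit(min_vcpus=128, min_memory_gb=1024)
```

Here `light` resolves to the ml.t3.medium, `heavy` to the ml.r5.24xlarge, and `too_big` to `None`. Because sessions are billed while they run, choosing the smallest instance that fits is a reasonable default, and you can move to a larger one for the sessions that need it.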
If you’re familiar with a version control tool like Git, you can use it to copy files into your SageMaker environment. One simple way to do this is to create a new RStudio project from Version Control.
This will clone a remote repository and give you access to all the files stored there.
The other option is to directly upload files from your local workstation via the RStudio UI.
Because of how SageMaker operates, either of these approaches makes your files accessible not only from the current session, but also from any subsequent sessions you start on SageMaker.
Now that you have access to the files you need, you can begin to take advantage of the increased compute and memory capabilities of SageMaker sessions. You’ll find that workloads that pushed the limits of your local workstation are completed with ease in the flexible sessions provided by SageMaker.
Learn more
If you’re interested in learning more about RStudio on Amazon SageMaker, save the date: on June 6, 2023, there will be a live demonstration and Q&A about the integration. If you’re interested in an enterprise license for using RStudio on SageMaker within your organization, please reach out to Posit Sales.