Using rna-STAR on an Amazon EC2 instance

7.2 years ago

marypthomas88 • 0

This question is for anyone here who has experience running STAR on EC2 instances.

We are using STAR (2.5.2a_modified) to align reads from a targeted library against a small pseudo-genome of 20,000 50-mers on an Amazon EC2 instance (i3.8xlarge, optimized for I/O, 32 virtual cores because AWS uses hyperthreading, and 244 GB RAM). The analyzer results are validated, and our next step is to optimize the application by reducing run time, either by increasing Nthreads or by incorporating the shared memory feature.

Does anyone understand how EC2 hyperthreading impacts the STAR shared memory model? Would we gain any advantage from the genomeLoad options, which would require us to modify the system shared memory settings? On EC2, does changing the shared memory limits (in /etc/sysctl.conf) affect the rest of the system, and does it have any effect on STAR?
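For reference, the limits in question can be inspected before changing anything. The sketch below is a minimal example in Python, assuming a Linux instance: it reads the standard `kernel.shmmax` (largest single segment, in bytes) and `kernel.shmall` (system-wide total, in pages) values from `/proc/sys/kernel` and checks whether an index of a given size would fit under both. The function names and the 1 GiB index size are illustrative assumptions, not values taken from STAR or from our setup.

```python
import mmap


def fits_in_shared_memory(index_bytes, shmmax_bytes, shmall_pages, page_size):
    """Return True if a single segment of index_bytes fits under both limits."""
    # shmmax caps one segment, in bytes.
    fits_one_segment = index_bytes <= shmmax_bytes
    # shmall caps all shared memory on the system, in pages.
    fits_system_total = index_bytes <= shmall_pages * page_size
    return fits_one_segment and fits_system_total


def read_kernel_limit(name):
    """Read an integer such as 'shmmax' from /proc/sys/kernel; None if unavailable."""
    try:
        with open(f"/proc/sys/kernel/{name}") as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    shmmax = read_kernel_limit("shmmax")
    shmall = read_kernel_limit("shmall")
    index_bytes = 1 * 1024**3  # hypothetical index size (1 GiB), not a measured value
    if shmmax is None or shmall is None:
        print("kernel.shmmax / kernel.shmall not readable on this system")
    else:
        print("kernel.shmmax (bytes):", shmmax)
        print("kernel.shmall (pages):", shmall)
        print("index fits:",
              fits_in_shared_memory(index_bytes, shmmax, shmall, mmap.PAGESIZE))
```

Values set in /etc/sysctl.conf show up here once they have been applied (for example with `sysctl -p` or after a reboot), so the same check can be run before and after a change.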

Thanks in advance.

sequencing rna-seq EC2 rna-STAR • 3.0k views


From what I know, the shared memory feature exists so that multiple separate STAR jobs can access the same loaded genome index. So if you have a relatively small amount of memory and you want to run multiple jobs, it will reduce your memory footprint.

How many separate STAR jobs are you running? Your pseudo-genome looks small enough that you can afford to load a separate index into memory with every job, especially since you have 244 GB.
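To make the two setups concrete, here is a rough Python sketch that only builds the command lines; it does not run STAR. The options `--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outFileNamePrefix` and `--genomeLoad` are standard STAR options, while the helper name `star_command` and all paths are made up for illustration.

```python
import shlex

# Values accepted by STAR's --genomeLoad option.
VALID_GENOME_LOAD = {
    "NoSharedMemory", "LoadAndKeep", "LoadAndRemove", "LoadAndExit", "Remove",
}


def star_command(genome_dir, fastq, out_prefix, threads, genome_load="NoSharedMemory"):
    """Build a STAR argument list (hypothetical helper; nothing is executed)."""
    if genome_load not in VALID_GENOME_LOAD:
        raise ValueError(f"unknown --genomeLoad value: {genome_load}")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", out_prefix,
        "--genomeLoad", genome_load,
    ]


if __name__ == "__main__":
    # Each job loads its own private copy of the index (no sysctl changes needed).
    private = star_command("/data/pseudo_index", "sample1.fastq", "out/sample1_", 7)
    # Jobs attach to one copy kept in shared memory until it is removed.
    shared = star_command("/data/pseudo_index", "sample1.fastq", "out/sample1_", 7,
                          genome_load="LoadAndKeep")
    print(shlex.join(private))
    print(shlex.join(shared))
```

With `LoadAndKeep` the index stays in shared memory after the job ends, so it has to be released explicitly later with a run that uses `--genomeLoad Remove`. For an index this small, the private-copy form is probably the simpler choice.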

7.2 years ago by Damian Kao • 16k

Thanks for the feedback. Yes, our pseudo-genome seems small and we assumed we could load it with no problem, but on our previous machine (16 cores) we were seeing memory segmentation faults, and we are trying to understand why. While the instance has 244 GB of RAM, I am not sure it is available to all cores; we also have many other types of jobs running on the instance. One of my basic questions is how the STAR OpenMP shared memory model would run on EC2, which uses hyperthreading. If I ask for 8 cores, I actually get 4 physical cores, each running two hyperthreads. I am wondering if perhaps I need to recompile STAR on the instance?

Our data runs can have as many as 400 fastq files, and right now we run two STAR apps on 7 cores each (for a total of 14 cores). We just moved to a new instance with 32 virtual cores, so we are hoping to see better performance.
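To check the virtual versus physical core count on the instance itself, something like the following Python sketch could be used. It assumes a Linux x86 instance whose `/proc/cpuinfo` lists `physical id` and `core id` for every logical processor; the function name is my own, and on systems without those fields it simply reports 0.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = None
    core_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
        elif not key:
            # A blank line ends the entry for one logical processor.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = None
            core_id = None
    # Flush the last entry if the text did not end with a blank line.
    if physical_id is not None and core_id is not None:
        cores.add((physical_id, core_id))
    return len(cores)


if __name__ == "__main__":
    print("logical CPUs (hyperthreads):", os.cpu_count())
    try:
        with open("/proc/cpuinfo") as handle:
            print("physical cores:", count_physical_cores(handle.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

If the logical count is twice the physical count, two hyperthreads share each core, which matches the 8 virtual cores to 4 physical cores case described above.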
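As a back-of-the-envelope check on what the move to 32 virtual cores might buy us, here is a small Python sketch of the arithmetic. It assumes one STAR run per fastq file and a fixed thread count per run; the function name and the 4 virtual cores held back for our other jobs are assumptions for illustration, not measured settings.

```python
import math


def plan_star_jobs(n_files, total_vcpus, threads_per_job, reserved_vcpus=0):
    """Return (concurrent_jobs, rounds) for one STAR run per file."""
    usable = total_vcpus - reserved_vcpus
    if n_files <= 0 or threads_per_job <= 0 or usable < threads_per_job:
        raise ValueError("need at least one file and enough vCPUs for one job")
    # Never start more jobs than there are files to process.
    concurrent_jobs = min(usable // threads_per_job, n_files)
    # Number of back-to-back batches needed to get through all files.
    rounds = math.ceil(n_files / concurrent_jobs)
    return concurrent_jobs, rounds


if __name__ == "__main__":
    # Current setup: 14 cores, two jobs of 7 threads each.
    print(plan_star_jobs(400, 14, 7))      # (2, 200)
    # New instance: 32 vCPUs, 4 held back for other work (hypothetical).
    print(plan_star_jobs(400, 32, 7, 4))   # (4, 100)
```

This only counts batches; because two hyperthreads share one physical core, doubling the number of concurrent jobs will not necessarily halve the wall-clock time.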

7.2 years ago by marypthomas88 • 0