Question: What is the best practice for BAM sorting?
3.9 years ago by
Kirill260
Australia
Kirill260 wrote:

Hi guys,

I am aligning RNA-seq data using STAR and will need sorted BAM files in later steps. I was wondering what people suggest for BAM sorting? The options that I have considered/used include:

  • STAR --outSAMtype BAM SortedByCoordinate - but this crashes because it runs out of memory (we only have 128GB of RAM on our server). This happens when I'm merging lanes of paired-end reads in STAR, e.g. STAR --readFilesIn read1_lane1_r1.fq,read1_lane2_r1.fq read1_lane1_r2.fq,read1_lane2_r2.fq. This is fixable by limiting the sorting RAM with STAR --limitBAMsortRAM

OR

  • simply output unsorted BAM files from STAR (which I am doing now) and use samtools sort -o outSorted.bam inFromSTAR.bam

OR 

  • Are there any other suggestions for the best practice to get sorted BAM files?
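
For reference, the first option with a cap on sorting memory might look roughly like the sketch below. It is an illustration, not a tested recipe: the index directory, thread count, FASTQ names and the 60 GiB limit are placeholder assumptions, and the `run` helper is a hypothetical dry-run wrapper that only prints the command unless DRY_RUN=0 is set. Note that `--limitBAMsortRAM` expects a value in bytes.

```shell
set -eu

# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Placeholder values (assumptions, not taken from the post).
GENOME_DIR="star_index"
THREADS=8
SORT_RAM_BYTES=$((60 * 1024 * 1024 * 1024))   # 60 GiB expressed in bytes

# Lanes are comma-separated within a mate; the two mates are space-separated.
R1="read1_lane1_r1.fq,read1_lane2_r1.fq"
R2="read1_lane1_r2.fq,read1_lane2_r2.fq"

# Align and coordinate-sort inside STAR, with the BAM sorting RAM capped.
star_sorted() {
  run STAR --runThreadN "$THREADS" --genomeDir "$GENOME_DIR" \
    --readFilesIn "$R1" "$R2" \
    --outSAMtype BAM SortedByCoordinate \
    --limitBAMsortRAM "$SORT_RAM_BYTES"
}

star_sorted
```

With `--outSAMtype BAM Unsorted` instead, STAR skips its own sorting step and the `--limitBAMsortRAM` setting no longer matters.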

Are there performance differences between STAR and samtools, for example? And are the sorting algorithms the same or similar across the different tools?

Cheers, 

rna-seq star samtools forum • 3.8k views
3.9 years ago by
poisonAlien2.7k
Asgard
poisonAlien2.7k wrote:

The choice of sorting tool is a matter of preference. STAR, samtools sort, Picard SortSam and sambamba sort all do the same thing. (samtools sort and sambamba are multithreaded and work much faster, and sambamba indexing runs at lightning speed.) But count tools such as featureCounts run much faster on an unsorted BAM file: assigning 60 million reads to genes takes about 5 minutes for an unsorted BAM file versus about 60 minutes for a position-sorted one, with most of that time spent on format conversion.
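
To make the comparison concrete, here is a rough sketch of equivalent coordinate-sort invocations for three of these tools. The file names, thread count and the location of `picard.jar` are placeholder assumptions, and `run` is a hypothetical dry-run helper that prints each command unless DRY_RUN=0 is set.

```shell
set -eu

# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

IN="Aligned.out.bam"   # placeholder: an unsorted BAM from the aligner
THREADS=8              # placeholder thread count

# Coordinate-sort the same input with the tool named in $1.
sort_with() {
  case "$1" in
    samtools) run samtools sort -@ "$THREADS" -o sorted.samtools.bam "$IN" ;;
    sambamba) run sambamba sort -t "$THREADS" -o sorted.sambamba.bam "$IN" ;;
    picard)   run java -jar picard.jar SortSam I="$IN" O=sorted.picard.bam SORT_ORDER=coordinate ;;
    *)        echo "unknown tool: $1" >&2; return 1 ;;
  esac
}

for tool in samtools sambamba picard; do
  sort_with "$tool"
done
```

Timing each call on the same input (for example with `time`) is a simple way to check the speed difference on your own hardware.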

Maybe go with the second option: output unsorted BAM files, get counts with featureCounts against the reference GTF (if that is what you are up to), then sort with any of the tools above and remove the unsorted file if you want to save disk space.
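
A minimal sketch of that workflow, assuming paired-end reads and STAR's default unsorted output name. The GTF path, output names and thread count are placeholders, and `run` is a hypothetical dry-run helper that prints each step unless DRY_RUN=0 is set, so nothing is deleted by accident.

```shell
set -eu

# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

THREADS=8                     # placeholder thread count
GTF="annotation.gtf"          # placeholder annotation file
UNSORTED="Aligned.out.bam"    # STAR output name with --outSAMtype BAM Unsorted
SORTED="Aligned.sorted.bam"   # placeholder name for the sorted BAM

count_then_sort() {
  # 1. Count on the unsorted BAM. -p marks the input as paired-end
  #    (recent Subread releases also need --countReadPairs to count fragments).
  run featureCounts -T "$THREADS" -p -a "$GTF" -o counts.txt "$UNSORTED"
  # 2. Coordinate-sort for downstream tools that need it.
  run samtools sort -@ "$THREADS" -o "$SORTED" "$UNSORTED"
  # 3. Index the sorted BAM.
  run samtools index "$SORTED"
  # 4. Remove the unsorted file to save disk space.
  run rm "$UNSORTED"
}

count_then_sort
```

Because the script uses `set -e`, a failed sort stops execution before the `rm` step when run with DRY_RUN=0.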


Great, thanks for the answer. I know that featureCounts is faster than htseq-count. Would you say the same holds for htseq-count, i.e. that it runs faster on an unsorted BAM? Thanks

Reply written 3.9 years ago by Kirill260