--donotsort option in featureCounts

5.6 years ago

Adeler001

Hello, can someone please explain the --donotsort option in featureCounts? Below is how the manual defines it, but I still don't understand what it does exactly.

--donotsort: If specified, paired end reads will not be re-ordered even if reads from the same pair were found not to be next to each other in the input.

You should only use the --donotsort option if your files are already sorted so that the two reads of each pair are next to each other (i.e. name-sorted). Otherwise it will lead to erroneous results.

See this note from Wei Shi (author of featureCounts):

We also added a new argument "--donotsort" to featureCounts to allow users to turn off the read sorting procedure. However, care must be taken for using this argument because the read counting result might be misleading if read pairs were not properly sorted.

5.6 years ago by GenoMax
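One way to see what order a BAM file claims to be in is the SO tag on its @HD header line (SO:coordinate or SO:queryname in the SAM specification), for example in the output of `samtools view -H file.bam`. Below is a minimal sketch; the function name `declared_sort_order` is hypothetical, and the toy header stands in for real samtools output. The tag only records what the writing tool declared; it does not prove that mates are actually adjacent.

```shell
# Hypothetical helper: print the sort order declared by the SO tag of the @HD line
# of a SAM header read from stdin. Prints "unknown" if there is no @HD line or no SO tag.
declared_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    END { if (!found) print "unknown" }
  '
}

# Toy header resembling a coordinate-sorted BAM's header.
printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n' | declared_sort_order
# prints: coordinate
```

On a real file this would be something like `samtools view -H D4775_R312_BAligned.sortedByCoord.out.bam | declared_sort_order`. A file from STAR's coordinate-sorted output would be expected to report coordinate, which is not the mates-adjacent order that --donotsort relies on.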

OK, thank you ATpoint and GenoMax for your responses. I really appreciate it.

5.6 years ago by Adeler001

ATpoint · Accepted Answer · 2019-12-13

In a coordinate-sorted BAM file, the two mates of a pair (the reads that come from the same sequenced fragment) are typically not adjacent to each other. featureCounts, however, needs them to be adjacent (i.e. the file should be name-sorted) for proper paired-end quantification. Therefore, if you request paired-end counting, it will by default re-sort the BAM files before quantification. You can disable that manually with this option, but I see no reason to actually do so.
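A toy illustration of the point above, using plain-text stand-ins for BAM records (only the read name in column 1 matters here). The helper `report_mate_adjacency` and the records are hypothetical: when two fragments overlap, coordinate order interleaves their mates, while grouping the same records by name puts each pair on consecutive lines.

```shell
# Hypothetical helper: for every read name seen a second time, report whether that
# record sits on the line directly after the first record with the same name.
# Input: tab-separated, SAM-like records; only column 1 (read name) is used.
report_mate_adjacency() {
  awk -F'\t' '
    {
      if ($1 in first_line)
        print $1 "\t" ((NR == first_line[$1] + 1) ? "adjacent" : "separated")
      first_line[$1] = NR
    }
  '
}

# Toy coordinate-ordered records (name, chromosome, position) of two overlapping fragments.
printf 'r1\tchr1\t100\nr2\tchr1\t150\nr1\tchr1\t300\nr2\tchr1\t350\n' | report_mate_adjacency
# prints: r1 separated, r2 separated (tab-separated, one per line)

# The same records grouped by read name, as in a name-sorted file.
printf 'r1\tchr1\t100\nr1\tchr1\t300\nr2\tchr1\t150\nr2\tchr1\t350\n' | report_mate_adjacency
# prints: r1 adjacent, r2 adjacent (tab-separated, one per line)
```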

1) Hello ATpoint, how would I specify *paired-end* quantification? 2) So what you're saying is that --donotsort disables the sorting of the BAM files prior to quantification?

5.6 years ago by Adeler001

Use -p, which counts fragments instead of individual reads. Paired-end reads represent the two ends of a fragment.

~~If the files are not coordinate-sorted, it would likely take longer to do the counts. You could time it with and without --donotsort.~~

5.6 years ago by GenoMax

Hello ATpoint, so if I use the --donotsort option, it would only change the run time? It won't change the read counts?

5.6 years ago by Adeler001

Yes, it probably changes the counts, because read pairs are not properly counted. featureCounts is blazingly fast, so better use it without experimental changes to get proper results. Let it run overnight in the worst case ;-)

5.6 years ago by ATpoint

Thanks for answering my question. I used -p and --donotsort together in the same command line. Would adding -p cancel out --donotsort, since you said that with paired-end counting (-p) featureCounts will by default sort the BAM files prior to quantification?

Here's my script:

cd /hpf/projects/Dheon/sequencing_repository/raw_fastq/RNA_TOM/bam

/hpf/tools/centos6/subread/1.5.3/bin/featureCounts \
  -a /hpf/projects/Dheon/sequencing_repository/raw_fastq/RNA_TOM/bam/gencode.v19.annotation.gtf \
  -o family_13_01_RNA-seq.counts \
  -g gene_name -p -s 2 -C --donotsort \
  D4775_R312_BAligned.sortedByCoord.out.bam \
  D4754_R311_BAligned.sortedByCoord.out.bam \
  D4776_R310Aligned.sortedByCoord.out.bam \
  D4777_R307Aligned.sortedByCoord.out.bam \
  D4778_R308BAligned.sortedByCoord.out.bam \
  D4828_R309_BAligned.sortedByCoord.out.bam

written 5.6 years ago by Adeler001, updated by ATpoint

I did some quick testing with and without --donotsort (with -p in both cases), and the results are different, so -p does not cancel out --donotsort. I assume that without sorting, each read whose mate is not right next to it is counted independently. I strongly suggest not experimenting here and simply using the default. The tool is really fast, so be smart (and safe) and stick with the default sorting. By the way, I removed the SLURM headers from your comment to shorten the post and improve readability (they do not add to the problem here).

5.6 years ago by ATpoint
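ATpoint's guess about why the counts differ can be made concrete with a deliberately simplified model. This is an assumption for illustration, not featureCounts' actual pairing logic: records are read in file order, a record whose mate is on the very next line forms one fragment with it, and any other record is counted on its own. The function `count_units` and the toy records are hypothetical.

```shell
# Toy model (NOT featureCounts' real algorithm): consume records in file order.
# A record followed immediately by its mate (same read name in column 1) counts as
# one fragment together with it; any other record counts as a single read.
count_units() {
  awk -F'\t' '
    {
      if (have_pending && $1 == pending) {
        fragments++                      # mate is on the very next line
        have_pending = 0
      } else {
        if (have_pending) singles++      # previous record had no adjacent mate
        pending = $1
        have_pending = 1
      }
    }
    END {
      if (have_pending) singles++        # last record left without a mate
      printf "fragments=%d singles=%d\n", fragments, singles
    }
  '
}

# Same four toy records (name, chromosome, position) in two different orders.
printf 'r1\tchr1\t100\nr2\tchr1\t150\nr1\tchr1\t300\nr2\tchr1\t350\n' | count_units
# prints: fragments=0 singles=4   (coordinate order, mates interleaved)
printf 'r1\tchr1\t100\nr1\tchr1\t300\nr2\tchr1\t150\nr2\tchr1\t350\n' | count_units
# prints: fragments=2 singles=0   (grouped by name)
```

In this model the same records yield two fragments or four single reads depending only on their order, which is consistent with (though not proof of) the different totals observed with and without --donotsort.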

Hello ATpoint and GenoMax, thank you for your responses. The reason I wanted to add the --donotsort option is that I was afraid featureCounts would not be able to properly account for read pairs whose mates map far apart, for example because they span a duplication or deletion, or because they are chimeric.

5.6 years ago by Adeler001

It sorts by read name, not by coordinate, so it does not matter where and how the mates map: they end up next to each other either way.

5.6 years ago by ATpoint
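A toy sketch of why mapping position does not matter for name sorting: here frag2 is a hypothetical chimeric pair with one mate on chr1 and the other on chr5, far apart in coordinate order. A stable Unix `sort` on column 1 stands in for real name sorting (e.g. `samtools sort -n`, which orders names somewhat differently but likewise groups records by read name); the `name_sort` function and the records are hypothetical.

```shell
# Toy stand-in for name sorting: stable sort on column 1 (read name) only,
# byte-wise comparison so the result does not depend on the locale.
name_sort() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -s
}

# Toy records in coordinate order (name, chromosome, position).
# frag2 is "chimeric": one mate on chr1, the other on chr5.
printf 'frag2\tchr1\t500\nfrag1\tchr1\t1000\nfrag1\tchr1\t1200\nfrag2\tchr5\t90000\n' | name_sort
# prints (tab-separated):
# frag1 chr1 1000
# frag1 chr1 1200
# frag2 chr1 500
# frag2 chr5 90000
```

The two frag2 records were the first and last lines of the input but end up adjacent, regardless of the chromosomes they map to.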