Question: The --donotsort option in featureCounts
Adeler001 wrote (5 weeks ago):

Hello, can someone please explain the --donotsort option in featureCounts? Below is how the manual defines it, but I still don't understand what exactly it does.

--donotsort : If specified, paired end reads will not be re-ordered even if reads from the same pair were found not to be next to each other in the input


You should only use the --donotsort option if your files are already sorted so that both reads of each pair are next to each other (that is, name-sorted). Otherwise it will lead to erroneous results.

See this from Wei Shi (author of featureCounts).

We also added a new argument "--donotsort" to featureCounts to allow users to turn off the read sorting procedure. However, care must be taken for using this argument because the read counting result might be misleading if read pairs were not properly sorted.

modified 5 weeks ago • written 5 weeks ago by genomax

OK, thank you ATpoint and genomax for your responses. I really appreciate it.

written 5 weeks ago by Adeler001

If an answer/comment was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

modified 5 weeks ago • written 5 weeks ago by genomax
ATpoint (Germany) wrote (5 weeks ago):

In a coordinate-sorted BAM file, the two reads of a pair (the mates that come from the same sequenced fragment) are typically not adjacent to each other. featureCounts, however, requires them to be adjacent (that is, the file must be name-sorted) for proper paired-end quantification. Therefore, if you specify paired-end counting, it will by default sort the BAM files by read name prior to quantification. You can manually disable that with this option, but I see no reason to do so.
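
A toy illustration of the adjacency point above, using plain text records rather than real BAM files. The read names, positions and the `adjacent_pairs` helper are all made up for the example, and this is only a sketch of the idea, not what featureCounts does internally.

```shell
# Toy records: read name, mate number, chromosome, position (all hypothetical).
reads='fragA 1 chr1 100
fragB 1 chr1 150
fragA 2 chr1 300
fragB 2 chr1 350'

# Count how many records are immediately preceded by a record with the same name.
adjacent_pairs() {
    awk '$1 == prev { n++ } { prev = $1 } END { print n + 0 }'
}

# Coordinate order: the mates of fragA and fragB end up interleaved.
coord_adjacent=$(printf '%s\n' "$reads" | sort -k3,3 -k4,4n | adjacent_pairs)

# Name order: every mate sits directly next to its partner.
name_adjacent=$(printf '%s\n' "$reads" | sort -k1,1 -k2,2n | adjacent_pairs)

echo "adjacent pairs, coordinate-sorted: $coord_adjacent"
echo "adjacent pairs, name-sorted: $name_adjacent"
```

With these four records, the coordinate-sorted order has no mate next to its partner, while the name-sorted order has both pairs adjacent.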

written 5 weeks ago by ATpoint

1) Hello ATpoint, how would I specify paired-end quantification? 2) So what you're saying is that --donotsort disables the sorting of the BAM files prior to quantification?

written 5 weeks ago by Adeler001

-p counts fragments instead of reads. Paired-end reads represent the two ends of a fragment.

If the files are not co-ordinate sorted it would likely take longer to do the counts. You could time it with and without --donotsort.

modified 5 weeks ago • written 5 weeks ago by genomax

Hello ATpoint, so if I use the --donotsort option, would it just change the run time? It won't change the read counts?

written 5 weeks ago by Adeler001

Yes, it probably changes the counts, because read pairs are not counted properly when the mates are not next to each other. featureCounts is blazingly fast, so it is better to use it without experimental changes and get proper results. Let it run overnight in the worst case ;-)

written 5 weeks ago by ATpoint

Thanks for answering my question. I used -p and --donotsort together in the same command line. Would adding -p cancel out --donotsort, since you said that with paired-end counting (-p) it will by default sort the BAM files prior to quantification?

Here is my script:

cd /hpf/projects/Dheon/sequencing_repository/raw_fastq/RNA_TOM/bam

/hpf/tools/centos6/subread/1.5.3/bin/featureCounts -a /hpf/projects/Dheon/sequencing_repository/raw_fastq/RNA_TOM/bam/gencode.v19.annotation.gtf -o family_13_01_RNA-seq.counts -g gene_name -p -s 2 -C --donotsort  D4775_R312_BAligned.sortedByCoord.out.bam D4754_R311_BAligned.sortedByCoord.out.bam D4776_R310Aligned.sortedByCoord.out.bam D4777_R307Aligned.sortedByCoord.out.bam D4778_R308BAligned.sortedByCoord.out.bam D4828_R309_BAligned.sortedByCoord.out.bam
modified 5 weeks ago by ATpoint • written 5 weeks ago by Adeler001

I did some quick testing with and without --donotsort, with -p in both cases, and the results are different. I assume that without sorting, each read that does not have its mate right next to it is counted independently. I strongly suggest not experimenting here and simply using the default. The tool is really fast, so be smart (and safe) and stick with the default sorting. By the way, I removed the SLURM headers from your comment to shorten the post and improve readability (as they do not add to the problem here).
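
A comparison like the one described above could be sketched roughly as follows. The file names `annotation.gtf` and `sample.sortedByCoord.bam` and the helper `count_columns` are hypothetical placeholders. The sketch also assumes the usual featureCounts output layout: a leading `#` line recording the command, then six annotation columns followed by one count column per BAM file. Only flags already mentioned in this thread are used.

```shell
# Hypothetical file names; replace with your own annotation and BAM file.
gtf=annotation.gtf
bam=sample.sortedByCoord.bam

# Keep gene ID plus count columns and drop the '#' line that records the command
# (assumes six annotation columns precede the counts).
count_columns() {
    grep -v '^#' "$1" | cut -f1,7-
}

# Only run if featureCounts and both input files are actually available.
if command -v featureCounts >/dev/null 2>&1 && [ -f "$gtf" ] && [ -f "$bam" ]; then
    # Default behaviour: mates are re-ordered when they are not adjacent.
    featureCounts -p -a "$gtf" -o default.counts "$bam"
    # Same call with the re-ordering step switched off.
    featureCounts -p --donotsort -a "$gtf" -o donotsort.counts "$bam"

    count_columns default.counts > default.tsv
    count_columns donotsort.counts > donotsort.tsv
    if cmp -s default.tsv donotsort.tsv; then
        echo "counts identical"
    else
        echo "counts differ"
    fi
else
    echo "featureCounts or input files not found; nothing run"
fi
```

The `#` line has to be stripped before comparing because it contains the command line, which differs between the two runs even when the counts do not.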

written 5 weeks ago by ATpoint

Hello ATpoint and genomax, thank you for your responses. The reason I wanted to add the --donotsort option is that I was afraid featureCounts would not be able to properly account for paired reads that do not map close to each other, either because they are part of a duplication or deletion or because they are chimeric reads.

written 5 weeks ago by Adeler001

It sorts by name, not by coordinate, so it does not matter where and how the reads map.
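
To make this concrete with toy data (hypothetical read names and positions, plain text instead of BAM): a pair whose mates map to different chromosomes still ends up adjacent once the records are ordered by read name, because the sort key never looks at the mapping position.

```shell
# Toy records: read name, mate number, chromosome, position (all hypothetical).
# fragX stands in for a chimeric pair whose mates map to different chromosomes.
reads='fragX 1 chr1 100
fragY 1 chr1 200
fragY 2 chr1 400
fragX 2 chr5 90000'

# Sort on read name, then mate number; chromosome and position are ignored.
name_sorted=$(printf '%s\n' "$reads" | sort -k1,1 -k2,2n)
printf '%s\n' "$name_sorted"

# Name and chromosome of the first two records after name-sorting.
first_two=$(printf '%s\n' "$name_sorted" | head -n 2 | cut -d' ' -f1,3 | tr '\n' ';')
echo "$first_two"
```

The first two records after sorting are both mates of fragX, one on chr1 and one on chr5.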

written 5 weeks ago by ATpoint