featurecounts vs samtools view
Entering edit mode
5.5 years ago
tonja.r ▴ 490

I needed to check the results coming from featureCounts from subread. In fact, samtools outputted less reads then featurecounts did.

I run samtools as follows:

samtools view -c -q 1 -F 4 -L matrix_gene.bed file.bam

And featureCounts:

featureCounts --minReadOverlap 50 -s 1 -Q 1 -T 12 -a matrix_gene.saf -F SAF -O -o $OUT/total_counts.txt file.bam
featureCounts --minReadOverlap 50 -s 2 -Q 1 -T 12 -a matrix_gene.saf -F SAF -O -o $OUT/total_counts.txt file.bam

(minReadOverlap = 50 because the length of my reads is 50, -s 1 for stranded and -s 2 for reversely stranded, -Q mapping quality and -O for multiple overlap (as samtools outputs alignment multiple times if they overlap multiple regions)

samtools results:

featureCounts -s 1: 1661976
featureCounts -s 2: 1662917
So, the  number of stranded and reverse stranded reads: 3324893

Why do the results differ if I specified the same parameters in both methods?

PS: I did not find what overlap between a read and a region samtools allows, but if the overlap less than the read length, samtools should have more reads than featureCounts.



software error • 2.0k views
Entering edit mode
5.5 years ago

The answer to this type of questions usually very simple - the two tools actually do different things. So the discrepancy is unlikely to be caused by software errors.

Of course why do they do different things - now that is harder to answer. Make sure to investigate the preconceptions and tacit assumptions that you have about the parameters and what the tools actually they do.

For example a read may overlap with features on both positive and negative strands - in which case it will be double counted. Etc.


Login before adding your answer.

Traffic: 1831 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6