Question

FeatureCounts fails to count fragments instead of reads

0

Entering edit mode

6.0 years ago

Wietje ▴ 240

Hello fellow Biostars,

I recently stumbled across the phenomenon that FeatureCounts fails to count fragments instead of reads, despite the command option -p. This only seems to happen when the input BAM file starts with several multialigning reads and contains the tag NH:i:10. The problem then disappears when I remove reads from the first 50-100 lines, if they contain the beforementioned tag.

Do you know why FeatureCounts switches from fragment to the default read count if the beginning of a file has too many reads that align multiple times, e.g. 10x? Does the programme perform some sort of initial checkup?

(I assume that the library preparation had something to do with the abundancy of multialigning reads - I work with RNAseq data from a global RNA library preparation protocol (Illumina Ribo Zero Gold).)

Looking forward to your enlightening answers! :-)

RNA-Seq FeatureCounts Fragment Counting reads • 2.5k views

ADD COMMENT • link updated 6.0 years ago by JJ ▴ 670 • written 6.0 years ago by Wietje ▴ 240

0

Entering edit mode

Are the mate reads of these multimappers still in the BAM file?

ADD REPLY • link 6.0 years ago by ATpoint 81k

0

Entering edit mode

Yes, I only filtered for "listed first in the file", something like this:

samtools view -h input.bam | awk -F"\t" '{if(NR>3319 && NR<3369 && $22!="NH:i:1" || NR>3319 && NR<3369 && $23!="NH:i:1");  else print $0}'|samtools view -bS - > new.bam

The BAM header length is 3319, that's where the record numbers in the AWK command come from; the NH-tag is usually listed in column 22 or 23. I basically checked between lines 3319 and 3369 and left the mates untouched.

ADD REPLY • link 6.0 years ago by Wietje ▴ 240

score 0 · Answer 1 · 2018-04-19

0

Entering edit mode

6.0 years ago

JJ ▴ 670

The programme performs some sort of initial check:

featureCounts automatically detects if your bam files contain SE or PE reads. You can provide all your bam files to featureCounts in one go and set isPairedEnd=TRUE.

see here: https://support.bioconductor.org/p/107882/#108028

ADD COMMENT • link 6.0 years ago by JJ ▴ 670

0

Entering edit mode

Thank you for the link, that confirms part of my idea. I would however assume that featureCounts does only take the first X reads into account when concluding on SE or PE reads, right?

ADD REPLY • link 6.0 years ago by Wietje ▴ 240

0

Entering edit mode

I would assume that it does actually only check the first X reads and not all of them - would take to long probably - but I would assume more than just 100. But I don't really understand what multi alignment has to do with it. Are these first ones single reads?

ADD REPLY • link 6.0 years ago by JJ ▴ 670

0

Entering edit mode

Not necessarily, there are also unpaired reads in the file, but the multialigning ones happen to be paired most of the time as far as I can see.

ADD REPLY • link 6.0 years ago by Wietje ▴ 240