1
0
Entering edit mode
3.6 years ago
Wietje ▴ 220

Hello fellow Biostars,

I recently stumbled across the phenomenon that FeatureCounts fails to count fragments instead of reads, despite the command option -p. This only seems to happen when the input BAM file starts with several multialigning reads and contains the tag NH:i:10. The problem then disappears when I remove reads from the first 50-100 lines, if they contain the beforementioned tag.

Do you know why FeatureCounts switches from fragment to the default read count if the beginning of a file has too many reads that align multiple times, e.g. 10x? Does the programme perform some sort of initial checkup?

(I assume that the library preparation had something to do with the abundancy of multialigning reads - I work with RNAseq data from a global RNA library preparation protocol (Illumina Ribo Zero Gold).)

RNA-Seq FeatureCounts Fragment Counting reads • 1.6k views
0
Entering edit mode

Are the mate reads of these multimappers still in the BAM file?

0
Entering edit mode

Yes, I only filtered for "listed first in the file", something like this:

samtools view -h input.bam | awk -F"\t" '{if(NR>3319 && NR<3369 && $22!="NH:i:1" || NR>3319 && NR<3369 &&$23!="NH:i:1");  else print \$0}'|samtools view -bS - > new.bam


The BAM header length is 3319, that's where the record numbers in the AWK command come from; the NH-tag is usually listed in column 22 or 23. I basically checked between lines 3319 and 3369 and left the mates untouched.

0
Entering edit mode
3.6 years ago
JJ ▴ 590

The programme performs some sort of initial check:

featureCounts automatically detects if your bam files contain SE or PE reads. You can provide all your bam files to featureCounts in one go and set isPairedEnd=TRUE.

0
Entering edit mode

Thank you for the link, that confirms part of my idea. I would however assume that featureCounts does only take the first X reads into account when concluding on SE or PE reads, right?

0
Entering edit mode

I would assume that it does actually only check the first X reads and not all of them - would take to long probably - but I would assume more than just 100. But I don't really understand what multi alignment has to do with it. Are these first ones single reads?

0
Entering edit mode

Not necessarily, there are also unpaired reads in the file, but the multialigning ones happen to be paired most of the time as far as I can see.