Question: FeatureCounts fails to count fragments instead of reads
0
gravatar for Wietje
4 months ago by
Wietje150
Wietje150 wrote:

Hello fellow Biostars,

I recently stumbled across the phenomenon that FeatureCounts fails to count fragments instead of reads, despite the command option -p. This only seems to happen when the input BAM file starts with several multialigning reads and contains the tag NH:i:10. The problem then disappears when I remove reads from the first 50-100 lines, if they contain the beforementioned tag.

Do you know why FeatureCounts switches from fragment to the default read count if the beginning of a file has too many reads that align multiple times, e.g. 10x? Does the programme perform some sort of initial checkup?

(I assume that the library preparation had something to do with the abundancy of multialigning reads - I work with RNAseq data from a global RNA library preparation protocol (Illumina Ribo Zero Gold).)

Looking forward to your enlightening answers! :-)

ADD COMMENTlink modified 4 months ago by JJ320 • written 4 months ago by Wietje150

Are the mate reads of these multimappers still in the BAM file?

ADD REPLYlink written 4 months ago by ATpoint6.6k

Yes, I only filtered for "listed first in the file", something like this:

samtools view -h input.bam | awk -F"\t" '{if(NR>3319 && NR<3369 && $22!="NH:i:1" || NR>3319 && NR<3369 && $23!="NH:i:1");  else print $0}'|samtools view -bS - > new.bam

The BAM header length is 3319, that's where the record numbers in the AWK command come from; the NH-tag is usually listed in column 22 or 23. I basically checked between lines 3319 and 3369 and left the mates untouched.

ADD REPLYlink written 4 months ago by Wietje150
0
gravatar for JJ
4 months ago by
JJ320
JJ320 wrote:

The programme performs some sort of initial check:

featureCounts automatically detects if your bam files contain SE or PE reads. You can provide all your bam files to featureCounts in one go and set isPairedEnd=TRUE.

see here: https://support.bioconductor.org/p/107882/#108028

ADD COMMENTlink written 4 months ago by JJ320

Thank you for the link, that confirms part of my idea. I would however assume that featureCounts does only take the first X reads into account when concluding on SE or PE reads, right?

ADD REPLYlink written 4 months ago by Wietje150

I would assume that it does actually only check the first X reads and not all of them - would take to long probably - but I would assume more than just 100. But I don't really understand what multi alignment has to do with it. Are these first ones single reads?

ADD REPLYlink modified 4 months ago • written 4 months ago by JJ320

Not necessarily, there are also unpaired reads in the file, but the multialigning ones happen to be paired most of the time as far as I can see.

ADD REPLYlink written 4 months ago by Wietje150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1511 users visited in the last hour