Question

featureCounts reports more fragments than input read pairs

5.7 years ago

ewre ▴ 250

Hi all, I have a question about the output of featureCounts (from the Subread package). The total number of my input read pairs is 47168870 (reported by FastQC and STAR):

>  Number of input reads |  47168870
>                       Average input read length | 152
>                                     UNIQUE READS:
>                    Uniquely mapped reads number | 37604677
>                         Uniquely mapped reads % | 79.72%
>                           Average mapped length | 149.39
>                        Number of splices: Total | 16519867
>             Number of splices: Annotated (sjdb) | 0

But the total number of fragments reported by featureCounts is 82845035, almost twice the number of input read pairs. The number of SAM alignment pairs reported by htseq-count is 81037190.

> (featurecount): Total fragments : 82845035                            
>   Successfully assigned fragments : 32386307 (39.1%)

(htseq-count)

> 81000000 SAM alignment record pairs processed. Warning: Mate pairing
> was ambiguous for 965089 records; mate key for first such record: 
> 81037190 SAM alignment pairs processed.

The GFF I used for counting is gencode.v24.annotation.gff3.

My question is: what is the definition of a fragment in the featureCounts report, and why are there more fragments than input read pairs? In my understanding, each read pair represents one fragment, so the total number of fragments and the total number of read pairs should be equal.
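As a quick sanity check on the figures quoted above, here is a minimal sketch of the arithmetic. The variable names are made up for illustration; the numbers are copied from the STAR, featureCounts and htseq-count output in this post. It only shows the size of the gap and does not by itself say where the extra fragments come from.

```python
# Figures copied from the logs quoted in this post.
input_pairs = 47_168_870              # STAR: Number of input reads
uniquely_mapped = 37_604_677          # STAR: Uniquely mapped reads number
featurecounts_fragments = 82_845_035  # featureCounts: Total fragments
htseq_pairs = 81_037_190              # htseq-count: SAM alignment pairs processed

# How many times larger is the featureCounts total than the input?
ratio = featurecounts_fragments / input_pairs

# Fragments that cannot be explained by "one fragment per input pair".
excess = featurecounts_fragments - input_pairs

# Input pairs that STAR did not report as uniquely mapped
# (multi-mapped, unmapped, and so on).
not_unique = input_pairs - uniquely_mapped

print(f"fragments / input pairs: {ratio:.2f}")
print(f"excess fragments:        {excess}")
print(f"pairs not uniquely mapped: {not_unique}")
print(f"unique fraction:         {uniquely_mapped / input_pairs:.2%}")
```

If every input pair appeared exactly once in the BAM file, the excess would be zero, so the counting tools are seeing more alignment pairs than there were sequenced pairs.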

Thank you in advance for your time.

ewre

featurecounts RNA-Seq pair-end htseq-count • 2.9k views

updated 5.6 years ago by Biostar 20 • written 5.7 years ago by ewre ▴ 250

Did you use the -p option to count fragments instead of reads?

5.7 years ago by GenoMax 141k

Yes: featureCounts -p -s 2 -T 5 -a gencode.v24.annotation.gff3 -t exon -g gene_id -o sample.out sample.bam

5.7 years ago by ewre ▴ 250

Ewre, I am in the same situation as you. If you found the answer to your question, could you kindly share it?

5.6 years ago by cyd • 0

If you extract the read names of all the reads that have mapped and then run them through sort | uniq, how many do you get? Is there a chance that reads are aligning to multiple positions (multi-mapping)?
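The check suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: `summarize_sam` and the toy records are hypothetical, the input is plain SAM text (for example the output of `samtools view`), and only the standard SAM FLAG bits 0x4 (unmapped) and 0x100 (secondary alignment) are inspected.

```python
from collections import Counter

def summarize_sam(lines):
    """Count mapped alignment records and distinct read names in SAM text."""
    records_per_name = Counter()
    secondary = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:      # this read is unmapped
            continue
        if flag & 0x100:    # secondary alignment (extra location)
            secondary += 1
        records_per_name[name] += 1
    return {
        "mapped_records": sum(records_per_name.values()),
        "unique_names": len(records_per_name),
        "secondary_records": secondary,
        # For paired-end data, more than two records per name hints at
        # multi-mapping (or supplementary/chimeric alignments).
        "names_with_more_than_two_records": sum(
            1 for n in records_per_name.values() if n > 2
        ),
    }

# Toy paired-end records (hypothetical): r1 maps once, r2 maps to two
# locations, r3 is unmapped.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t255\t10M\t=\t200\t110\t*\t*",
    "r1\t147\tchr1\t200\t255\t10M\t=\t100\t-110\t*\t*",
    "r2\t99\tchr1\t300\t3\t10M\t=\t400\t110\t*\t*",
    "r2\t147\tchr1\t400\t3\t10M\t=\t300\t-110\t*\t*",
    "r2\t355\tchr2\t500\t3\t10M\t=\t600\t110\t*\t*",
    "r2\t403\tchr2\t600\t3\t10M\t=\t500\t-110\t*\t*",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
summary = summarize_sam(toy_sam)
print(summary)
```

On real data the same function could be fed `sys.stdin` with SAM text piped in from `samtools view sample.bam`. If the number of unique names is close to the number of input read pairs while the record count is much higher, multi-mapping alignments are a likely explanation for the inflated totals.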

5.6 years ago by Matteo Schiavinato ★ 3.6k