Losing half of read counts

1

Entering edit mode

7.5 years ago

gudraephouto ▴ 10

After counting reads, among 20 million reads for each sample, about only 10 million of reads are counted. Is it logical? This is my pipeline:

* 1-Fastq to bam (HiSat, hg38):

20027937 reads; of these:

20027937 (100.00%) were unpaired; of these:

3671767 (18.33%) aligned 0 times

11149914 (55.67%) aligned exactly 1 time

5206256 (25.99%) aligned >1 times

81.67% overall alignment rate

[bam_sort_core] merging from 12

* 2-Bam to read counts (Htseq, ensemble V25)

__no_feature    1817050

__ambiguous 1350078

__too_low_aQual 1378909

__not_aligned   3671767

__alignment_not_unique  3308485

read counts • 1.6k views

ADD COMMENT • link updated 7.5 years ago by GenoMax 141k • written 7.5 years ago by gudraephouto ▴ 10

1

Entering edit mode

I'm confused, all your reads are accounted for no?

3671767 + 11149914 + 5206256 = 20027937

Do you mean why do only 11149914 (55%) align exactly 1 time?

ADD REPLY • link 7.5 years ago by John 13k

0

Entering edit mode

Not accounted but counted as in HTseq count?

HTseq-count result = 11526289

ADD REPLY • link 7.5 years ago by GenoMax 141k

0

Entering edit mode

No htseq count result is 10522274

ADD REPLY • link 7.5 years ago by gudraephouto ▴ 10

0

Entering edit mode

No, I mean why htseq could count only half of my reads: total number of read counts=10522274

ADD REPLY • link 7.5 years ago by gudraephouto ▴ 10

1

Entering edit mode

There is no an "assigned" category in your HTseqCount output (no_feature, ambiguous, too_low_aQual, not_aligned alignment_not_unique).

So the difference is made of the reads that are actually assigned to genes.

ADD REPLY • link 7.5 years ago by Carlo Yague 8.6k

0

Entering edit mode

Exactly... My question is if it is alright that only 10522274 of 20027937 are assigned to genes. Isn't that low for further analysis?

ADD REPLY • link 7.5 years ago by gudraephouto ▴ 10

3

Entering edit mode

Not necessarily. Since you have a number of reads (25%) that are multimapping they are not being counted.

ADD REPLY • link 7.5 years ago by GenoMax 141k

Login before adding your answer.