Losing half of read counts
0
1
Entering edit mode
7.5 years ago
gudraephouto ▴ 10

After counting reads, among 20 million reads for each sample, about only 10 million of reads are counted. Is it logical? This is my pipeline:

* 1-Fastq to bam (HiSat, hg38):

20027937 reads; of these:

20027937 (100.00%) were unpaired; of these:

3671767 (18.33%) aligned 0 times

11149914 (55.67%) aligned exactly 1 time

5206256 (25.99%) aligned >1 times

81.67% overall alignment rate

[bam_sort_core] merging from 12

* 2-Bam to read counts (Htseq, ensemble V25)

__no_feature    1817050

__ambiguous 1350078

__too_low_aQual 1378909

__not_aligned   3671767

__alignment_not_unique  3308485
read counts • 1.6k views
ADD COMMENT
1
Entering edit mode

I'm confused, all your reads are accounted for no?

3671767 + 11149914 + 5206256 = 20027937

Do you mean why do only 11149914 (55%) align exactly 1 time?

ADD REPLY
0
Entering edit mode

Not accounted but counted as in HTseq count?

HTseq-count result = 11526289

ADD REPLY
0
Entering edit mode

No htseq count result is 10522274

ADD REPLY
0
Entering edit mode

No, I mean why htseq could count only half of my reads: total number of read counts=10522274

ADD REPLY
1
Entering edit mode

There is no an "assigned" category in your HTseqCount output (no_feature, ambiguous, too_low_aQual, not_aligned alignment_not_unique).

So the difference is made of the reads that are actually assigned to genes.

ADD REPLY
0
Entering edit mode

Exactly... My question is if it is alright that only 10522274 of 20027937 are assigned to genes. Isn't that low for further analysis?

ADD REPLY
3
Entering edit mode

Not necessarily. Since you have a number of reads (25%) that are multimapping they are not being counted.

ADD REPLY

Login before adding your answer.

Traffic: 1819 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6