Tophat - Htseq Count? How Can I Improve The Mapping Percentage For Single End Reads
10.4 years ago
k.nirmalraman ★ 1.1k

Dear All,

Probably this is a novice question.

I have 100 bp single-end reads from Illumina TruSeq sequencing. I am using the mouse genome build mm9 from the TopHat index and annotation downloads.

The library contains 16 million reads.

My tophat command is as follows:

tophat -p 4 -N 3 --read-gap-length 3 --read-edit-dist 3 --output-dir <path> <genome_path> path/to/input.fasta

HTSeq Count

python -m HTSeq.scripts.count -m intersection-nonempty -s no -i gene_id -t exon accepted_hits.sam /mm9/genes.gtf > counts.txt

After HTSeq Count

 no_feature          4973501
 ambiguous           125622
 too_low_aQual       0
 not_aligned         0
 alignment_not_unique        5620063

These are the statistics from the HTSeq output. Is it normal to see such high numbers for no_feature and alignment_not_unique with single-end sequencing? Is there anything I can do to improve these statistics?

As an extension of the above question: what happens if reads that do not align uniquely are counted for both genes in a differential expression analysis? Does this introduce any bias?

Thanks in advance!

htseq tophat rnaseq • 6.4k views

That's a pretty high level of no_feature for the mouse genome. Did you purify for anything unusual at some point? Having around 10% no_feature isn't unheard of, but over 25% is kind of over the top. You might want to look where some of those no_feature reads are aligning. Perhaps you have a bunch of DNA contamination or just really high amounts of pre-mRNAs?
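For reference, the proportions can be worked out from the numbers posted in the question. The sketch below is only a rough check: it assumes the library size is the 16 million reads quoted above, and `percent_of_library` is a helper name made up for this illustration. Note that if multi-mapped reads are reported as several alignment records, alignment_not_unique may overstate the number of affected reads.

```python
# Counts copied from the htseq-count summary posted in the question.
summary = {
    "no_feature": 4973501,
    "ambiguous": 125622,
    "too_low_aQual": 0,
    "not_aligned": 0,
    "alignment_not_unique": 5620063,
}

# Assumption: the library size is the "16 million reads" quoted in the question.
TOTAL_READS = 16_000_000


def percent_of_library(count, total=TOTAL_READS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {name: percent_of_library(n) for name, n in summary.items()}

for name, pct in percentages.items():
    print(f"{name:<22}{pct:>6.1f}%")
```

Under that assumption, no_feature comes out at roughly 31% of the library, which is where the "over 25%" remark comes from.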


Just to agree with @dpryan, take a look at your data in a browser. You'll likely learn a lot, particularly if you are relatively new to these data.
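Before opening a browser, a quick tally of where the no_feature reads fall can help decide which regions to inspect. The sketch below assumes htseq-count was run with its `-o` (`--samout`) option, which writes a SAM file with an `XF:Z:` tag on each record; the tag value is `no_feature` or `__no_feature` depending on the HTSeq version, so the code matches on the suffix. The function name `tally_no_feature` and the example records are made up for illustration.

```python
from collections import Counter


def tally_no_feature(sam_lines):
    """Count alignments tagged no_feature per reference sequence.

    Expects SAM text lines carrying the XF:Z: tag that htseq-count
    adds when run with -o/--samout.
    """
    per_chrom = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        tags = fields[11:]  # optional fields follow the 11 mandatory columns
        if any(t.startswith("XF:Z:") and t.endswith("no_feature") for t in tags):
            per_chrom[fields[2]] += 1  # column 3 is the reference name
    return per_chrom


# Hypothetical records, only to show the expected input shape.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:__no_feature",
    "r2\t0\tchr1\t500\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:Xkr4",
    "r3\t16\tchrM\t42\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:no_feature",
]

counts = tally_no_feature(example)
for chrom, n in counts.most_common():
    print(chrom, n)
```

On real data, pass an open file handle instead of the list. A heavy skew towards one reference (for example chrM or rRNA loci) would point to a different cause than an even spread across introns and intergenic regions.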


I would definitely like to look into these reads in a genome browser. Thanks for that direction! :)


There was no unusual purification method used.


Maybe your question should be "Can I improve ..." instead of "How can I improve ...". Maybe single-end sequencing doesn't fit your study or sample very well.


What's the experiment? Is it RNA-seq? What RNA sizes did you select for?


It is RNA-seq, and the selected size is 300 bp.

