Question: TopHat / HTSeq-count: How Can I Improve the Mapping Percentage for Single-End Reads?
k.nirmalraman (Germany) wrote, 5.6 years ago:

Dear All,

This is probably a novice question.

I have 100 bp single-end reads from Illumina TruSeq sequencing. I am using the mouse mm9 genome build from the TopHat index and annotation downloads.

The library has 16 million reads.

My TopHat command is as follows:

tophat -p 4 -N 3 --read-gap-length 3 --read-edit-dist 3 --output-dir <path> <genome_path> path/to/input.fasta

My HTSeq-count command:

python -m HTSeq.scripts.count -m intersection-nonempty -s no -i gene_id -t exon accepted_hits.sam /mm9/genes.gtf > counts.txt

HTSeq-count summary:

 no_feature              4973501
 ambiguous                125622
 too_low_aQual                 0
 not_aligned                   0
 alignment_not_unique    5620063

HTSeq-count reports the statistics above. Are numbers like these for no_feature and alignment_not_unique normal for single-end sequencing? Is there anything I can do to improve them?
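To judge whether these numbers are "normal", it helps to express each counter as a fraction of everything HTSeq-count tallied, including the per-gene counts that precede the summary lines in counts.txt. A minimal sketch, assuming the usual two-column, tab-separated htseq-count output; the function name is hypothetical. Depending on the HTSeq version, alignment_not_unique may count a multi-mapped read once per reported alignment, so treat the fractions as approximate.

```python
"""Summarise an htseq-count output file as fractions of all counted records.

Sketch only. Assumes the usual two-column, tab-separated htseq-count
output: one "gene_id<TAB>count" line per gene followed by the special
counters. Newer HTSeq versions prefix the counters with "__"
(e.g. "__no_feature"); the output quoted above has no prefix, so both
spellings are accepted.
"""
import sys

SPECIAL_COUNTERS = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def summarise_counts(lines):
    """Return (assigned_total, {counter: n}, {category: fraction of total})."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.rsplit(None, 1)
        key = name.lstrip("_")  # accept "__no_feature" and "no_feature"
        if key in SPECIAL_COUNTERS:
            special[key] = special.get(key, 0) + int(value)
        else:
            assigned += int(value)  # an ordinary per-gene count
    total = assigned + sum(special.values())
    fractions = {"assigned": assigned / total if total else 0.0}
    for key, n in special.items():
        fractions[key] = n / total if total else 0.0
    return assigned, special, fractions


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # e.g. counts.txt
        _, _, fractions = summarise_counts(handle)
    for category, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{category:22s} {frac:7.1%}")
```

Saved under a placeholder name such as summarise_counts.py, it could be run as `python summarise_counts.py counts.txt`.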

As an extension of the above question: in differential expression analysis, what happens if reads that are not unique are counted for both genes? Does this cause any bias?
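The second question can be made concrete with a toy comparison of three ways to handle a read that overlaps more than one gene. The reads are made up and the policy names are hypothetical labels, not htseq-count options. Counting a shared read for every gene means one molecule is counted more than once, so the total exceeds the number of reads and the counts of genes that share reads (for example, paralogs) move together; count-based DE tools generally treat each count as an independent read. Discarding such reads, as htseq-count does by default, avoids double counting but undercounts genes with many shared reads.

```python
"""Toy comparison of ways to count reads that hit more than one gene.

Sketch with made-up reads; the policy names are hypothetical labels,
not options of htseq-count.
"""
from collections import Counter


def count_reads(read_hits, policy="unique"):
    """Count reads per gene.

    read_hits maps read_id -> set of gene_ids the read's alignments overlap.
    policy:
      "unique"     - drop reads hitting more than one gene (htseq-count's default behaviour)
      "all"        - add 1 to every gene the read hits (double counting)
      "fractional" - split one read evenly across the genes it hits
    """
    if policy not in {"unique", "all", "fractional"}:
        raise ValueError(f"unknown policy: {policy}")
    counts = Counter()
    for genes in read_hits.values():
        if not genes:
            continue  # read overlaps no gene (no_feature)
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
        elif policy == "all":
            for gene in genes:
                counts[gene] += 1
        elif policy == "fractional":
            for gene in genes:
                counts[gene] += 1 / len(genes)
        # policy "unique": shared reads are simply not counted
    return counts


if __name__ == "__main__":
    reads = {
        "r1": {"GeneA"},
        "r2": {"GeneA", "GeneB"},
        "r3": {"GeneA", "GeneB"},
        "r4": {"GeneB"},
        "r5": {"GeneC"},
    }
    for policy in ("unique", "all", "fractional"):
        counts = count_reads(reads, policy)
        print(policy, dict(sorted(counts.items())), "total:", sum(counts.values()))
```

With these five reads, "all" reports 7 counts in total, "fractional" reports 5, and "unique" reports 3, because the two shared reads are dropped.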

Thanks in advance!

That's a pretty high level of no_feature for the mouse genome. Did you purify for anything unusual at some point? Having around 10% no_feature isn't unheard of, but over 25% is kind of over the top. You might want to look at where some of those no_feature reads are aligning. Perhaps you have a lot of DNA contamination, or just a very high proportion of pre-mRNA?

Written 5.6 years ago by Devon Ryan
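One way to follow up on the DNA-contamination versus pre-mRNA question is to check whether the no_feature reads fall inside annotated gene bodies or between genes. With the -s no and intersection-nonempty settings used above, a no_feature read inside a gene body overlaps no exon, so it is most likely intronic. htseq-count can write each alignment's assignment as an XF tag with its -o/--samout option. The sketch below assumes that output is plain SAM and that the annotation is the same genes.gtf; it uses only the leftmost position of each read and ignores strand. Roughly, a large intronic share points toward pre-mRNA, while many intergenic reads spread across the genome fit DNA contamination better; treat it as a first look, not a diagnosis.

```python
"""Classify htseq-count "no_feature" reads as genic (intronic) or intergenic.

Sketch only. Assumes:
  * the alignments were written by htseq-count with -o/--samout, so each
    SAM line carries an XF:Z:<assignment> tag;
  * the GTF is the annotation that was passed to htseq-count;
  * a read's leftmost position (SAM POS) is a good enough proxy for its
    location; strand and splicing are ignored.
"""
import bisect
from collections import Counter, defaultdict


def gene_body_index(gtf_lines, feature="exon", attr="gene_id"):
    """Return {chrom: (starts, ends)} of merged gene bodies (first to last exon)."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        gene = None
        for item in fields[8].split(";"):
            item = item.strip()
            if item.startswith(attr + " "):
                gene = item.split(" ", 1)[1].strip('"')
        if gene is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        key = (fields[0], gene)
        lo, hi = spans.get(key, (start, end))
        spans[key] = (min(lo, start), max(hi, end))

    per_chrom = defaultdict(list)
    for (chrom, _gene), span in spans.items():
        per_chrom[chrom].append(span)

    index = {}
    for chrom, intervals in per_chrom.items():
        merged = []  # merge overlapping gene bodies so a bisect lookup is valid
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def classify_no_feature(sam_lines, index):
    """Count no_feature reads inside ("genic") vs outside ("intergenic") gene bodies."""
    result = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        xf = next((f[5:] for f in fields[11:] if f.startswith("XF:Z:")), None)
        # Counter names differ between HTSeq versions ("__no_feature" vs "no_feature").
        if xf is None or xf.lstrip("_") != "no_feature":
            continue
        chrom, pos = fields[2], int(fields[3])
        starts, ends = index.get(chrom, ([], []))
        i = bisect.bisect_right(starts, pos) - 1
        inside = i >= 0 and pos <= ends[i]
        result["genic" if inside else "intergenic"] += 1
    return result
```

A call might look like `classify_no_feature(open("annotated.sam"), gene_body_index(open("/mm9/genes.gtf")))`, where annotated.sam is a placeholder for the htseq-count -o output.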

Just to agree with @dpryan, take a look at your data in a browser. You'll likely learn a lot, particularly if you are relatively new to these data.

Written 5.6 years ago by Sean Davis
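When browsing, it can help to load only the no_feature alignments as a separate track. A minimal sketch, assuming a SAM file written by htseq-count with -o/--samout (each alignment carries an XF tag); the script and file names are placeholders. It writes BED6 lines in BED's 0-based, half-open coordinates, spanning each alignment's first to last reference base, so a spliced read appears as a single block. The result can be loaded as a custom track in a browser such as IGV or the UCSC Genome Browser.

```python
"""Write htseq-count "no_feature" alignments as BED6 for a genome browser.

Sketch only. Assumes the SAM file was written by htseq-count -o/--samout,
so each alignment carries an XF:Z:<assignment> tag. BED is 0-based and
half-open, while SAM POS is 1-based.
"""
import re
import sys

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_length(cigar):
    """Number of reference bases spanned by an alignment (introns included)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def no_feature_bed(sam_lines):
    """Yield BED6 lines for alignments tagged as no_feature."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[5] == "*":
            continue  # malformed line or no CIGAR
        xf = next((f[5:] for f in fields[11:] if f.startswith("XF:Z:")), None)
        # Counter names differ between HTSeq versions ("__no_feature" vs "no_feature").
        if xf is None or xf.lstrip("_") != "no_feature":
            continue
        start = int(fields[3]) - 1
        end = start + reference_length(fields[5])
        strand = "-" if int(fields[1]) & 16 else "+"  # flag 0x10: reverse strand
        yield f"{fields[2]}\t{start}\t{end}\t{fields[0]}\t0\t{strand}"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as sam:
        for bed_line in no_feature_bed(sam):
            print(bed_line)
```

For example: `python no_feature_bed.py annotated.sam > no_feature.bed`.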

I would definitely like to look into these reads in a genome browser. Thanks for that direction! :)

Written 5.6 years ago by k.nirmalraman

There was no unusual purification method used.

Written 5.6 years ago by k.nirmalraman
Maybe your question should be "Can I improve ...?" instead of "How can I improve ...?". Single-end sequencing may simply not suit your study or samples very well.

Written 5.6 years ago by Irsan

What's the experiment? Is it RNA-seq? What RNA sizes did you select for?

Written 5.6 years ago by Jelena Aleksic

It is RNA-Seq, and the size selected is 300 bp.

Written 5.6 years ago by k.nirmalraman

Powered by Biostar version 2.3.0