Question: Counting Repeat And Unique Reads Of Tophat Output
Stevelor wrote (8.2 years ago):

Hey,

I used TopHat for paired-end RNA-Seq mapping and converted "accepted_hits.bam" to a *.bed file with 82859900 entries/lines, i.e. hits on the reference genome. I wanted to know how many unique and repeat reads I have, and at how many locations on the reference genome the repeat reads hit.

So I wrote a few lines of code that compare and count the distinct read IDs, with the following result:

hits: 82859900
unique hits: 75600252
repeat hits: 3217634 reads hitting 7259648 locations
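The poster's code was not shared in the thread. As a hypothetical sketch of this kind of read-ID count, assuming every BED line is one hit and the read name sits in BED column 4 (the standard name column), a Python version could look like the following; the script name `count_bed_names.py` is made up for illustration. Unique hits plus repeat locations always add up to the total, which matches the numbers above (75600252 + 7259648 = 82859900).

```python
import sys
from collections import Counter


def count_bed_names(bed_lines):
    """Tally BED name fields (column 4) and split reads into unique and repeat.

    Returns (hits, unique_reads, repeat_reads, repeat_locations), where a
    repeat read is any name that occurs on more than one line.
    """
    names = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and header lines
        names[line.rstrip("\n").split("\t")[3]] += 1
    hits = sum(names.values())
    unique_reads = sum(1 for n in names.values() if n == 1)
    repeat_reads = sum(1 for n in names.values() if n > 1)
    # every line not belonging to a unique read is one repeat location
    repeat_locations = hits - unique_reads
    return hits, unique_reads, repeat_reads, repeat_locations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_bed_names.py accepted_hits.bed
    with open(sys.argv[1]) as handle:
        hits, unique, repeats, locations = count_bed_names(handle)
    print(f"hits: {hits}")
    print(f"unique hits: {unique}")
    print(f"repeat hits: {repeats} reads hitting {locations} locations")
```

The `Counter` keeps every read name in memory, so with ~80 million lines this may need a lot of RAM. Also, some converters append /1 and /2 to the names of paired reads; in that case each mate is counted as a separate read.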

Looks good! But is there another way to get these counts from the TopHat log files? What are those logs for? They give me strange counts.
Or is this the only way to get this information?
How do you count these reads?
I am not happy with samtools flagstat and Picard tools :(

Cheers, Steve


It would be nice if you could share your code. I would really like to know how to do something like that.

(reply by Assa Yeroslaviz, 8.0 years ago)
Gww (Canada) wrote (8.2 years ago):

In the BAM file created by TopHat, each alignment carries an auxiliary tag (NH) that gives the number of hits for that read. For example, NH:i:2 means that the read has two hits.

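A minimal Python sketch of counting reads by the NH tag, assuming SAM text as printed by `samtools view accepted_hits.bam` (parsing text keeps the sketch free of any BAM library). The script name `count_nh.py` and the function names are hypothetical. Each mate of a pair is counted as its own read, keyed by QNAME plus the first/second-in-pair FLAG bits (0x40/0x80).

```python
import sys


def nh_tag(fields):
    """Return the NH:i:<n> value from a split SAM record, or None if absent."""
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields follow the 11 mandatory columns
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def count_hits(sam_lines):
    """Count uniquely and multiply mapped reads from SAM text lines."""
    hits = 0
    multi_hits = 0
    unique_reads = set()
    multi_reads = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped record
        nh = nh_tag(fields)
        if nh is None:
            continue  # no NH tag, so the record cannot be classified
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (fields[0], mate)
        hits += 1
        if nh == 1:
            unique_reads.add(key)
        else:
            multi_reads.add(key)
            multi_hits += 1  # one record per reported location
    return {
        "hits": hits,
        "unique_reads": len(unique_reads),
        "multi_reads": len(multi_reads),
        "multi_read_hits": multi_hits,
    }


def main(path):
    # "-" reads SAM text from stdin, anything else is treated as a SAM file path
    if path == "-":
        stats = count_hits(sys.stdin)
    else:
        with open(path) as handle:
            stats = count_hits(handle)
    for name, value in stats.items():
        print(f"{name}: {value}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view accepted_hits.bam | python count_nh.py -
    main(sys.argv[1])
```

Because every multi-mapped record is counted, `unique_reads + multi_read_hits` should equal `hits` when each unique read appears exactly once.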

Do you know what the NM and XS tags specify?

(reply by Holly, 8.1 years ago)

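NM is the edit distance between the read and the reference (mismatches plus inserted or deleted bases; for a read without indels this is simply the number of mismatches). XS is the eXpected Strand of the transcript, based on transcript annotations and/or splice-site motifs, i.e. GT-AG or AT-AC.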

(reply by Gww, 8.1 years ago)
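Both tags sit in the optional TAG:TYPE:VALUE columns that follow the 11 mandatory columns of each SAM record, e.g. NM:i:1 or XS:A:+. A small Python sketch for reading them from `samtools view` text output; the function name `parse_tags` is hypothetical:

```python
def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM record as a dict.

    Integer (i) and float (f) values are converted; other types (A, Z, H, B)
    are kept as strings.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # maxsplit=2 keeps any colons inside Z (string) values intact
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags
```

For example, `parse_tags(line).get("XS")` returns the strand character, or `None` when the record carries no XS tag.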