Question: Tophat-Fusion-Post On A Non-Well Annotated Genome
6.5 years ago by
Belgium, Brussels
Nicolas Rosewick7.4k wrote:

Hello,

I'm currently using TopHat 2.0.4 with --fusion-search to discover fusion transcripts in a poorly annotated genome. I created my own annotation from the gene sequences of a related, annotated species, so I have the genome FASTA file and a GFF file for the annotation. I ran TopHat 2.0.4 on all my samples and now want to run tophat-fusion-post. I read TopHat's manual (here), but I have difficulty understanding how to run tophat-fusion-post on non-human samples. What about refGene.txt and ensGene.txt? Do I have to create them from my annotation? The same question applies to the BLAST databases.

Thanks a lot for your help,

N.

tophat • 2.3k views
modified 6.5 years ago • written 6.5 years ago by Nicolas Rosewick7.4k
6.5 years ago by
University Park, USA
Istvan Albert ♦♦ 79k wrote:

Well, I will just say that the "manual" they have is woefully inadequate. A disclaimer: I have not used the tool, I have only read the paper, because your link got me interested in it. So what follows is my opinion, based mainly on the paper:

The refGene.txt and ensGene.txt files will need to be present, and you will need to generate them from your annotations. I think they can contain the same data. The tool was originally designed to combine annotations from RefSeq and Ensembl, so each file is supposed to contain the annotations from one of those resources.
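As a rough illustration of what "generating them from your annotations" could look like, here is a minimal sketch that turns the exons of one transcript into a row in the 16-column UCSC refGene.txt layout. The function name `refgene_row` and the example identifiers (`tx1`, `geneA`, `scaffold_1`) are made up for this example. It assumes the exons have already been collected from the GFF, writes a placeholder `0` for the `bin` column, and marks the CDS status and exon frames as unknown. I have not verified which of these columns tophat-fusion-post actually reads.

```python
def refgene_row(transcript_id, gene_name, chrom, strand, exons, cds=None):
    """Build one tab-separated row in the 16-column refGene.txt layout.

    exons: list of (start, end) pairs in GFF coordinates (1-based, inclusive).
    cds:   optional (start, end) of the coding region, also in GFF coordinates.
    """
    exons = sorted(exons)
    # UCSC tables use 0-based, half-open intervals: shift starts by one, keep ends.
    starts = [start - 1 for start, _ in exons]
    ends = [end for _, end in exons]
    tx_start, tx_end = min(starts), max(ends)

    if cds is None:
        # Assumption: with no CDS information, use an empty CDS placed at txEnd.
        cds_start = cds_end = tx_end
    else:
        cds_start, cds_end = cds[0] - 1, cds[1]

    def comma_list(values):
        # List columns are written with a trailing comma, as in the UCSC dumps.
        return "".join(f"{value}," for value in values)

    fields = [
        0,                               # bin: placeholder, not computed here
        transcript_id, chrom, strand,
        tx_start, tx_end, cds_start, cds_end,
        len(exons), comma_list(starts), comma_list(ends),
        0,                               # score
        gene_name,                       # name2
        "unk", "unk",                    # cdsStartStat, cdsEndStat
        comma_list([-1] * len(exons)),   # exonFrames: -1 means "no frame"
    ]
    return "\t".join(str(field) for field in fields)


# Hypothetical two-exon transcript on the plus strand.
row = refgene_row("tx1", "geneA", "scaffold_1", "+", [(301, 400), (101, 200)])
```

The same rows could be written to both refGene.txt and ensGene.txt if, as suggested above, the two files are allowed to carry the same data.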

BLAST is used to filter out false fusions based on sequence similarity. You will need to download the existing BLAST databases from the links in the manual, and also index your genome as a BLAST database if it is not already contained in the nt database.
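For the genome indexing step, a minimal sketch using `makeblastdb` from NCBI BLAST+ might look like the following. The file name `genome.fa` and the database prefix `blast/my_genome` are placeholders; which database names tophat-fusion-post actually looks for is an open question here, so treat the output location as an assumption.

```python
import shutil
import subprocess


def makeblastdb_command(genome_fasta, db_prefix):
    """Return the argument list for building a nucleotide BLAST database."""
    return [
        "makeblastdb",
        "-in", genome_fasta,   # input FASTA file
        "-dbtype", "nucl",     # nucleotide database
        "-out", db_prefix,     # prefix for the generated database files
    ]


def build_blast_db(genome_fasta, db_prefix):
    """Run makeblastdb, failing early if BLAST+ is not installed."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb (NCBI BLAST+) was not found on PATH")
    subprocess.run(makeblastdb_command(genome_fasta, db_prefix), check=True)


# Placeholder paths; nothing is executed until build_blast_db() is called.
command = makeblastdb_command("genome.fa", "blast/my_genome")
```

Calling `build_blast_db("genome.fa", "blast/my_genome")` would then create the database files next to the downloaded ones.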

modified 6.5 years ago by Obi Griffith17k • written 6.5 years ago by Istvan Albert ♦♦ 79k

OK, thanks! So, in summary:
- refGene.txt and ensGene.txt generated from my annotation
- a BLAST database of the genome

And that's it?

written 6.5 years ago by Nicolas Rosewick7.4k

Well, as always, the devil is in the details. See what happens.

written 6.5 years ago by Istvan Albert ♦♦ 79k
6.5 years ago by
Belgium, Brussels
Nicolas Rosewick7.4k wrote:

I have one more small question about ensGene.txt and refGene.txt: what is the format of these files?

Here is a line from refGene.txt (ensGene.txt is in the same format), with each column on its own line:

271
NM_001080475
chr2
-
208394256
208598529
208401287
208574608
8
208394256,208434073,208481482,208503894,208519335,208549619,208573998,208598357,
208401465,208434231,208481546,208504088,208519481,208550555,208574926,208598529,
0
PLEKHM3
cmpl
cmpl
2,0,2,0,1,1,0,-1,
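For reference, the 16 values above line up with the columns of the UCSC refGene table (the genePred layout with a leading `bin` column). The sketch below labels them and parses the example line; the function name `parse_refgene_line` is made up for illustration, and it assumes the real file is tab-separated with one transcript per line.

```python
# Column names of the UCSC refGene table, in file order.
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand",
    "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_refgene_line(line):
    """Split one tab-separated refGene.txt line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(REFGENE_COLUMNS):
        raise ValueError(f"expected {len(REFGENE_COLUMNS)} columns, got {len(values)}")
    record = dict(zip(REFGENE_COLUMNS, values))
    for key in ("bin", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "score"):
        record[key] = int(record[key])
    for key in ("exonStarts", "exonEnds", "exonFrames"):
        # These columns are comma-separated lists with a trailing comma.
        record[key] = [int(item) for item in record[key].split(",") if item]
    return record


# The example line from this post, re-joined with tabs.
example = "\t".join([
    "271", "NM_001080475", "chr2", "-",
    "208394256", "208598529", "208401287", "208574608", "8",
    "208394256,208434073,208481482,208503894,208519335,208549619,208573998,208598357,",
    "208401465,208434231,208481546,208504088,208519481,208550555,208574926,208598529,",
    "0", "PLEKHM3", "cmpl", "cmpl", "2,0,2,0,1,1,0,-1,",
])
record = parse_refgene_line(example)
```

Coordinates in this table are 0-based with exclusive ends, which differs from the 1-based inclusive coordinates of a GFF file, so start positions need shifting by one when converting.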

Thanks

modified 6.5 years ago • written 6.5 years ago by Nicolas Rosewick7.4k