Question: tophat2 cannot find transcript file
3.3 years ago, ag1194 wrote:

Hi, I am new to tophat2 and am following a paper as a reference. The paper lists the options they used [-r 25 --coverage-search -G --library-type fr-firststrand], but I get an error with them. This is the command line I use:

$ tophat -r 25 --coverage-search -G --library-type fr-firststrand /my_index directory/bowtie2/mm9 mysample.fastq &> tophat.log

In my log file it says:

Error: cannot find transcript file --library-type

I assume the error is due to using -G without providing an annotation; however, the paper doesn't mention anything beyond using the mm9 genome for mapping. I have read the tophat manual but couldn't figure out the cause of the error. Can anybody help me with this? Thanks!!

rna-seq • 1.6k views

The -G option takes a GTF/GFF file of known transcripts. Used together with --transcriptome-index, it creates a "transcriptome"-specific index from a whole-genome index, like this. This is a one-time run; the index can then be reused in subsequent runs for all samples, aligning to just that part of the genome.

tophat -G known_genes.gtf \
    --transcriptome-index=transcriptome_data/known \
    hg19

When you actually use this index, you need to provide its location with --transcriptome-index:

tophat -o out_sample2 -p4 \
    --transcriptome-index=transcriptome_data/known \
    hg19 sample2_1.fq.z sample2_2.fq.z
written 3.3 years ago by genomax
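
The two steps above can be sketched as a pair of shell functions: build the transcriptome index once, then reuse it for every sample. This is only an illustration under stated assumptions; the function names, the out_<sample> directory naming, and the .fastq suffix are hypothetical, and any extra options (such as -r or --library-type) would still need to be added to the alignment call.

```shell
# Build the transcriptome index once. No reads are given; tophat only
# prepares the index files under the given prefix.
build_transcriptome_index() {
    gtf=$1            # GTF/GFF file of known transcripts
    tx_index=$2       # prefix for the transcriptome index, e.g. transcriptome_data/known
    genome_index=$3   # bowtie2 genome index basename, e.g. mm9
    tophat -G "$gtf" --transcriptome-index="$tx_index" "$genome_index"
}

# Align each FASTQ file against the prebuilt transcriptome index.
# Output directory names (out_<sample>) are an assumption of this sketch.
align_with_transcriptome_index() {
    tx_index=$1
    genome_index=$2
    shift 2
    for reads in "$@"; do
        sample=$(basename "$reads" .fastq)
        tophat -o "out_$sample" --transcriptome-index="$tx_index" \
            "$genome_index" "$reads" || return 1
    done
}
```

For example, build_transcriptome_index known_genes.gtf transcriptome_data/known mm9 followed by align_with_transcriptome_index transcriptome_data/known mm9 sample1.fastq sample2.fastq.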

Thank you very much! So can I use any GTF file with known genes, regardless of the project? For example, can I use the mouse GTF gene sets from http://useast.ensembl.org/info/data/ftp/index.html after the -G option?

written 3.3 years ago by ag1194

If you are trying to replicate the analysis in a paper, make sure you get the annotation from the same source and for the same genome build. Otherwise your results will differ from those in the paper.

written 3.3 years ago by genomax
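
One practical check related to the point about matching the genome build: the sequence names in the first column of the GTF have to match the names in the bowtie2 index, and Ensembl files typically use names like "1" where UCSC-style builds such as mm9 use "chr1". A minimal sketch of such a check follows; the function name and file names are hypothetical, and the list of index names is assumed to come from bowtie2-inspect --names.

```shell
# Print sequence names that appear in column 1 of a GTF but not in a list of
# index sequence names (one per line). Empty output means every name matches.
gtf_names_missing_from_index() {
    gtf=$1           # annotation file
    index_names=$2   # text file with one index sequence name per line
    tmp_gtf=$(mktemp)
    tmp_idx=$(mktemp)
    # Skip comment lines, keep the first (tab-separated) column.
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp_gtf"
    # Index names may carry a description after a space; keep the first word.
    cut -d' ' -f1 "$index_names" | sort -u > "$tmp_idx"
    # comm -23 prints lines found only in the first (GTF) list.
    comm -23 "$tmp_gtf" "$tmp_idx"
    rm -f "$tmp_gtf" "$tmp_idx"
}

# Usage (assumes bowtie2-inspect is on PATH; paths are placeholders):
#   bowtie2-inspect --names /path/to/bowtie2/mm9 > mm9_names.txt
#   gtf_names_missing_from_index genes.gtf mm9_names.txt
```

If this prints most or all of the GTF's sequence names, the annotation and the index probably use different naming schemes or different builds.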

Thank you very much for the help. One last thing: in general, can I use RefSeq for annotation, or does it have to be a specific location on a chromosome?

written 3.3 years ago by ag1194

If I understand the question correctly: RefSeq annotations are standalone (though the accession numbers may be included in the GTF file you will use). So if you want to correlate gene names with RefSeq IDs, you should be able to do that.

written 3.3 years ago by genomax

Thank you very much!

written 3.3 years ago by ag1194
3.3 years ago, WouterDeCoster (Belgium) wrote:

As you can read when using

tophat --help

the -G flag requires a GTF file:

-G/--GTF                       <filename>  (GTF/GFF with known transcripts)
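
That also explains the wording of the error: with no file after -G, tophat takes the next argument, --library-type, as the transcript file. A minimal sketch of a corrected call is below, wrapped in a shell function so the GTF is checked before tophat starts. The function name and the file names passed to it are hypothetical; the remaining options are simply the ones quoted from the paper.

```shell
# Run tophat with -G immediately followed by its GTF file.
run_tophat_with_gtf() {
    gtf=$1     # annotation file (placeholder name, e.g. mm9_genes.gtf)
    index=$2   # bowtie2 index basename, e.g. /path/to/bowtie2/mm9
    reads=$3   # FASTQ file with the reads

    # Fail early with a clear message if the annotation is missing.
    if [ ! -f "$gtf" ]; then
        echo "GTF file not found: $gtf" >&2
        return 1
    fi

    # Options other than -G are taken unchanged from the question.
    tophat -r 25 --coverage-search -G "$gtf" \
        --library-type fr-firststrand "$index" "$reads"
}
```

It could be called as run_tophat_with_gtf mm9_genes.gtf /path/to/bowtie2/mm9 mysample.fastq, where mm9_genes.gtf stands for whichever annotation the paper actually used.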
  