Question: Transcriptome analysis using tophat and bowtie
gravatar for NAVID.BAVI
4 months ago by
NAVID.BAVI0 wrote:

Hi everyone,

I'm a new user and have been trying to reanalyze a zebrafish RNA seq data from another study which are available in fastq files (SRA tools were used). However, I've been stuck on the use of tophat for a while and I cannot move forward. I'm running tophat through:

module load bowtie/0.12.8
module load samtools/0.1.18
module load tophat/2.0.3

tophat --no-coverage-search -G /data/genes.gtf -o /data/tophat_output/test4 /data/Genome/BowtieIndex/genome /data/fastq/nSC1.fastq > /data/tophat_output/test4/test4-nSC1.log

I used those versions of software as I don't know which versions of samtools tophat and bowtie work well together. The problem is I don't get any accepted hits (ie, accepted.hits.bam is empty) and all the reads goes to unmapped.bam file.

In the tophat logfile i get:

[2018-09-14 12:52:16] Checking for Bowtie
  Bowtie 2 not found, checking for older version..
  Bowtie version:
[2018-09-14 12:52:16] Checking for Samtools
Samtools version:
[2018-09-14 12:52:16] Checking for Bowtie index files
[2018-09-14 12:52:16] Checking for reference FASTA file
[2018-09-14 12:52:16] Generating SAM header for /lustre/beagle2/navbav/hearing-transcriptomics/zebrafish_2018_Batra/Genome/BowtieIndex/genome
format: fastq
quality scale: phred33 (default)
[2018-09-14 12:52:21] Reading known junctions from GTF file
[2018-09-14 12:52:24] Preparing reads
left reads: min. length=35, max. length=101, 54560412 kept reads (284568 discarded)
[2018-09-14 13:24:16] Creating transcriptome data files..
[2018-09-14 13:25:49] Building Bowtie index from genes.fa
Error: Couldn't build bowtie index with err = -11

I searched about this error and seems to be a compatibility issue between the names of the genes in the GTF file and the Bowtie index files. But given I used iGENOME for my reference genome and GTF (annotation) file, am I wrong to think that they should be matched already ? I used the UCSD format btw.

Nevertheless, I checked the gene names in the GTF file and the reference genome file were matched to the best that I could check, which are as following:

For Ref file:

> bowtie-inspect --names genome
for GTF file

> head -5 genes.gtf 

chr1 unknown exon 6642 6760 . - . gene_id "rpl24"; gene_name "rpl24"; p_id "P5276"; transcript_id "NM_173235"; tss_id "TSS1927";
chr1 unknown stop_codon 6680 6682 . - . gene_id "rpl24"; gene_name "rpl24"; p_id "P5276"; transcript_id "NM_173235"; tss_id "TSS1927";
chr1 unknown CDS 6683 6760 . - 0 gene_id "rpl24"; gene_name "rpl24"; p_id "P5276"; transcript_id "NM_173235"; tss_id "TSS1927";
chr1 unknown CDS 6892 6955 . - 1 gene_id "rpl24"; gene_name "rpl24"; p_id "P5276"; transcript_id "NM_173235"; tss_id "TSS1927";
chr1 unknown exon 6892 6955 . - . gene_id "rpl24"; gene_name "rpl24"; p_id "P5276"; transcript_id "NM_173235"; tss_id "TSS1927";

I was wondering if someone have any advice for me?

Much appreciated in advance. Navid

ADD COMMENTlink modified 4 months ago by RamRS20k • written 4 months ago by NAVID.BAVI0

Even though you are trying to reanalyze old data there is no reason to use TopHat for this. Use STAR, HISAT2, BBMap instead.

That said you need to pay attention to this part from TopHat Manual when you want to make transcriptome specific index. This is a special one time run that you do before you align the actual data.

Please note that starting with version 2.0.10 TopHat can be invoked with just the -G/--GTF and --transcriptome-index options but without providing any input reads (the <genome_index_base> argument is still required). This is a special usage directing TopHat to only build the transcriptome index data files for the given annotation and then exit. Note: Only after the transcriptome files are built with one of the methods above, by a single TopHat process, it is safe to run multiple TopHat processes simultaneously making use of the same pre-built transcriptome index data.

ADD REPLYlink modified 4 months ago • written 4 months ago by genomax60k

Thanks a lot for the guidelines, I'll go through all the comments, as suggested.

ADD REPLYlink written 4 months ago by NAVID.BAVI0

Tophat has been retired in favour of HISAT, and the versions you are using for all programs are really old. Unless you want to try and replicate exactly what has been done in a paper, use HISAT or STAR (any one of them replaces Bowtie+Tophat, with greater accuracy and speed) for mapping, and SAMtools 1.X - current version is 1.9 - to manipulate SAM files.

edit: trying to replicate published analyses never completely succeeds, so don't be surprised when, down the line, your results differ from the published ones.

ADD REPLYlink modified 4 months ago • written 4 months ago by h.mon22k

Thank you, I'll consider using other packages as well provided I can find step by step tutorials for them.

Cheers, N

ADD REPLYlink written 4 months ago by NAVID.BAVI0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2052 users visited in the last hour