Question: Getting Error with bowtie2! "Error: Couldn't build bowtie index with err = 1"
17 months ago, baunruh wrote:

I downloaded the "cDNA" FASTA and the GFF3 for Mus musculus from the Ensembl website.

I built a Bowtie 2 index using:

bowtie2-build Mus_musculus.GRCm38.90.fa Mus_musculus.GRCm38.90

Then I tried to run tophat2:

tophat -p 2 --b2-L 15 -G Mus_musculus.GRCm38.90.gff3 -o testmap_gtf Mus_musculus.GRCm38.90.fa testmap.fastq

This runs perfectly fine without the GFF3 file but fails with it. From a bit of research, most people say this error results from the annotation not matching the reference. However, I downloaded both files from the exact same source, so I don't see why they would differ or how I could check. Could somebody walk me through this, please?

Here is the error I get:

[2017-11-17 14:01:37] Beginning TopHat run (v2.0.9)
[2017-11-17 14:01:37] Checking for Bowtie
          Bowtie version:
[2017-11-17 14:01:37] Checking for Samtools
        Samtools version:
[2017-11-17 14:01:37] Checking for Bowtie index files (genome)..
[2017-11-17 14:01:37] Checking for reference FASTA file
    Warning: Could not find FASTA file /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa.fa
[2017-11-17 14:01:37] Reconstituting reference FASTA file from Bowtie index
  Executing: /apps/packages/bio/bowtie2/2.1.0/bowtie2-inspect /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa > /home/baunruh/RibosomeTEData/testmap_gtf/tmp/Mus_musculus.GRCm38.90.fa.fa
[2017-11-17 14:01:41] Generating SAM header for /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa
    format:      fastq
    quality scale:   phred33 (default)
[2017-11-17 14:01:49] Reading known junctions from GTF file
[2017-11-17 14:02:03] Preparing reads
     left reads: min. length=25, max. length=34, 23328414 kept reads (32420 discarded)
[2017-11-17 14:04:26] Building transcriptome data files..
[2017-11-17 14:04:40] Building Bowtie index from Mus_musculus.GRCm38.90.fa
Error: Couldn't build bowtie index with err = 1
modified 17 months ago by genomax • written 17 months ago by baunruh

Please do not use TopHat for new projects unless you have an absolute need to. Use STAR, BBMap, or HISAT2, which are newer, recommended aligners.

If you are going to use an annotation file, you should not align to the cDNA sequences alone. Get the sequence of the full genome instead: the coordinates in your GTF/GFF3 file refer to positions on the genome (chromosomes), not on individual transcripts.
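One way to check whether a FASTA file and an annotation file agree is to compare the sequence names in the FASTA headers with the sequence IDs in column 1 of the GFF3. The following is only a minimal sketch, assuming standard FASTA headers and a tab-delimited GFF3; the function names are made up for illustration.

```shell
# Hypothetical helpers (names are illustrative, not part of any tool).

# Print the sequence names in a FASTA file:
# the text after '>' up to the first whitespace.
fasta_names() {
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$1" | sort -u
}

# Print the sequence IDs used in column 1 of a GFF3/GTF file,
# skipping comment/directive lines that start with '#'.
annotation_seqids() {
    awk -F'\t' '!/^#/ && NF > 0 { print $1 }' "$1" | sort -u
}

# Count how many annotation sequence IDs also occur as FASTA names.
# $1 = FASTA file, $2 = GFF3/GTF file
shared_name_count() {
    awk -F'\t' '
        NR == FNR {
            if (/^>/) { sub(/^>/, ""); split($0, a, /[ \t]/); names[a[1]] = 1 }
            next
        }
        /^#/ || NF == 0 { next }
        ($1 in names) && !seen[$1]++ { n++ }
        END { print n + 0 }
    ' "$1" "$2"
}

# Example usage with the files from the question:
#   fasta_names Mus_musculus.GRCm38.90.fa | head
#   annotation_seqids Mus_musculus.GRCm38.90.gff3 | head
#   shared_name_count Mus_musculus.GRCm38.90.fa Mus_musculus.GRCm38.90.gff3
```

If the count is 0, the annotation refers to sequences that are not in the FASTA at all. That is what you would expect when a cDNA FASTA (named by transcript ID) is paired with a genome annotation (named by chromosome).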

modified 17 months ago • written 17 months ago by genomax

Sorry, I am new to this! Apparently I was supposed to align the reads to the cDNA reference without the GTF, and then align the accepted_hits to the entire genome, because I only want reads that align to the cDNA. Does this sound right?

written 17 months ago by baunruh

You could do it in one of two ways. Either align to the cDNA alone, without a GTF, or use the whole genome plus the GTF with a special initial TopHat run that builds a transcriptome-specific sequence index. The second method is described in the TopHat manual (see the --transcriptome-index <dir/prefix> option in the "Using TopHat" section).
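The second method could look roughly like the sketch below. It assumes a Bowtie 2 index has already been built from the full genome FASTA; the function names, the `genome` basename, and the `transcriptome_data/known` prefix are placeholders chosen for illustration, so adjust them to your own files.

```shell
# Hypothetical wrapper functions; all file names and prefixes are placeholders.
# TOPHAT can be overridden to point at a specific tophat executable.
TOPHAT="${TOPHAT:-tophat}"

# Step 1: build the transcriptome index once (no reads are given).
# $1 = GTF/GFF3 annotation
# $2 = transcriptome index <dir/prefix>
# $3 = basename of the Bowtie 2 index built from the full genome
build_transcriptome_index() {
    "$TOPHAT" -G "$1" --transcriptome-index="$2" "$3"
}

# Step 2: align reads, reusing the prebuilt transcriptome index
# (-G is not needed again once the index exists).
# $1 = output directory
# $2 = transcriptome index <dir/prefix>
# $3 = genome index basename
# $4 = reads file
align_reads() {
    "$TOPHAT" -p 2 -o "$1" --transcriptome-index="$2" "$3" "$4"
}

# Example usage (placeholder names):
#   bowtie2-build genome.fa genome
#   build_transcriptome_index genes.gff3 transcriptome_data/known genome
#   align_reads testmap_gtf transcriptome_data/known genome testmap.fastq
```

Building the transcriptome index in a separate first run means later alignments against the same annotation can reuse it instead of rebuilding it each time.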

modified 17 months ago • written 17 months ago by genomax

In the tophat2 command, try leaving off the .fa extension: that argument should be the Bowtie index basename, not the FASTA file name.
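To see which basename your index was actually built under, you can check for the index files directly. This is a small sketch, assuming Bowtie 2's usual file naming (`<basename>.1.bt2`, or `.1.bt2l` for large indexes); the function name is made up for illustration.

```shell
# Hypothetical helper: succeeds if a Bowtie 2 index exists for the given basename.
# Small indexes end in .bt2, large ones in .bt2l.
has_bowtie2_index() {
    [ -e "$1.1.bt2" ] || [ -e "$1.1.bt2l" ]
}

# Example usage with the names from the question:
#   has_bowtie2_index Mus_musculus.GRCm38.90    && echo "index found without .fa"
#   has_bowtie2_index Mus_musculus.GRCm38.90.fa && echo "index found with .fa"
```

Whichever basename succeeds is the one to pass to tophat as the index argument.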

written 17 months ago by Mehmet