Question: Getting Error with bowtie2! "Error: Couldn't build bowtie index with err = 1"
2.8 years ago, baunruh wrote:

I downloaded the "cDNA" FASTA and the GFF3 for Mus musculus from the Ensembl website.

I built a Bowtie 2 index using:

bowtie2-build Mus_musculus.GRCm38.90.fa Mus_musculus.GRCm38.90

Then I tried to run tophat2:

tophat -p 2 --b2-L 15 -G Mus_musculus.GRCm38.90.gff3 -o testmap_gtf Mus_musculus.GRCm38.90.fa testmap.fastq

This runs perfectly fine without the GFF3 file, but not with it. I did a bit of research, and most people say the error comes from the annotation not matching the reference. However, I downloaded both files from exactly the same source, so I don't see why they would differ or how I could check. Could somebody walk me through this, please?

This is the error I get with the GFF3 file:

[2017-11-17 14:01:37] Beginning TopHat run (v2.0.9)
[2017-11-17 14:01:37] Checking for Bowtie
          Bowtie version:
[2017-11-17 14:01:37] Checking for Samtools
        Samtools version:
[2017-11-17 14:01:37] Checking for Bowtie index files (genome)..
[2017-11-17 14:01:37] Checking for reference FASTA file
    Warning: Could not find FASTA file /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa.fa
[2017-11-17 14:01:37] Reconstituting reference FASTA file from Bowtie index
  Executing: /apps/packages/bio/bowtie2/2.1.0/bowtie2-inspect /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa > /home/baunruh/RibosomeTEData/testmap_gtf/tmp/Mus_musculus.GRCm38.90.fa.fa
[2017-11-17 14:01:41] Generating SAM header for /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa
    format:      fastq
    quality scale:   phred33 (default)
[2017-11-17 14:01:49] Reading known junctions from GTF file
[2017-11-17 14:02:03] Preparing reads
     left reads: min. length=25, max. length=34, 23328414 kept reads (32420 discarded)
[2017-11-17 14:04:26] Building transcriptome data files..
[2017-11-17 14:04:40] Building Bowtie index from Mus_musculus.GRCm38.90.fa
Error: Couldn't build bowtie index with err = 1

Please do not use TopHat for new projects unless you absolutely need to. Use STAR, BBMap, or HISAT2 instead; these are newer, recommended aligners.

If you are going to use an annotation file, you should not align against the cDNA sequences alone. You need the sequence of the full genome, because the coordinates in your annotation file refer to positions on the genome, not on individual transcripts.
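One way to check whether a FASTA file and an annotation file share a coordinate system is to compare their sequence names. The sketch below is only an illustration and not part of TopHat: `seqname_overlap` is a hypothetical helper, and it assumes an uncompressed FASTA and a tab-delimited GFF3 or GTF. If the FASTA headers are transcript IDs and the first annotation column holds chromosome names, the overlap would be expected to come out empty.

```shell
# Hypothetical helper: print the sequence names that appear both in a FASTA
# file and in column 1 of an annotation file (GFF3 or GTF).
# Assumes both files are uncompressed.
seqname_overlap() {
    fasta="$1"
    annot="$2"
    tmpdir=$(mktemp -d) || return 1

    # FASTA side: header lines start with '>'; the name is the first word.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmpdir/fa"

    # Annotation side: column 1 of every line that is not a '#' comment.
    grep -v '^#' "$annot" | cut -f1 | sort -u > "$tmpdir/an"

    # comm -12 keeps only the lines common to both sorted lists.
    comm -12 "$tmpdir/fa" "$tmpdir/an"
    rm -rf "$tmpdir"
}

# Example call (file names taken from the question):
# seqname_overlap Mus_musculus.GRCm38.90.fa Mus_musculus.GRCm38.90.gff3
```

No output means the two files have no sequence names in common, which would point to the annotation and the reference describing different things.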

Written 2.8 years ago by genomax (modified 2.8 years ago)

Sorry! I am new to this. Apparently I was supposed to align the reads to the cDNA reference without the GTF, then align the accepted_hits to the entire genome, because I only want reads that align to the cDNA. Does this sound right?

Written 2.8 years ago by baunruh

You could do it in two ways. Either align to the cDNA alone, without a GTF, or use the whole genome plus the GTF with a special initial TopHat run that builds the transcriptome-specific sequence index. The second method is described on the TopHat manual page (in the "Using TopHat" section, under --transcriptome-index <dir/prefix>).
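As a rough sketch of the second route, the function below only prints the two TopHat invocations instead of running them. It follows the -G and --transcriptome-index usage from the TopHat manual, but every name passed in or set here is a placeholder assumed for illustration: `GRCm38_genome` stands for a Bowtie 2 index basename built from the full genome FASTA (not the cDNA file), and `transcriptome_data/known` is an arbitrary dir/prefix.

```shell
# Print (do not run) the two TopHat invocations for the transcriptome-index
# route. All arguments are placeholders chosen for illustration.
print_tophat_plan() {
    genome_index="$1"   # Bowtie 2 index basename built from the full genome
    annotation="$2"     # GTF/GFF3 whose coordinates refer to that genome
    reads="$3"          # FASTQ file to align
    tx_index="transcriptome_data/known"   # hypothetical dir/prefix

    # Step 1: no reads are given; per the manual, this run is used to build
    # the transcriptome index files once.
    echo "tophat -G $annotation --transcriptome-index=$tx_index $genome_index"

    # Step 2: align the reads, reusing the index built in step 1.
    echo "tophat -p 2 -o testmap_gtf -G $annotation --transcriptome-index=$tx_index $genome_index $reads"
}

print_tophat_plan GRCm38_genome Mus_musculus.GRCm38.90.gff3 testmap.fastq
```

Review the printed commands against your own paths before running them. If only transcriptome hits are wanted, the manual also documents a -T/--transcriptome-only option that may be worth reading about.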

Written 2.8 years ago by genomax (modified 2.8 years ago)

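In the tophat2 command, try leaving off the .fa extension. TopHat expects the Bowtie index basename in that position (Mus_musculus.GRCm38.90, as passed to bowtie2-build), not the name of the FASTA file.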
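A quick way to see which basename actually has an index on disk is to look for the first Bowtie 2 index file. This is a minimal sketch: `has_bt2_index` is a hypothetical helper, and it only checks for the `.1.bt2` file (or `.1.bt2l`, the suffix used for large indexes), not the full set.

```shell
# Hypothetical helper: succeed if a Bowtie 2 index appears to exist for the
# given basename. Only the first index file is checked.
has_bt2_index() {
    [ -e "$1.1.bt2" ] || [ -e "$1.1.bt2l" ]
}

# Compare the basename with and without the .fa suffix
# (run this in the directory that holds the index).
for base in Mus_musculus.GRCm38.90 Mus_musculus.GRCm38.90.fa; do
    if has_bt2_index "$base"; then
        echo "index found for basename: $base"
    else
        echo "no index for basename: $base"
    fi
done
```

Whichever basename reports an index is the one to pass to tophat.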

Written 2.8 years ago by Mehmet