Question

Getting Error with bowtie2! "Error: Couldn't build bowtie index with err = 1"

6.4 years ago

baunruh ▴ 10

I downloaded the cDNA FASTA and the GFF3 annotation for Mus musculus from the Ensembl FTP site: https://www.ensembl.org/info/data/ftp/index.html

I built a Bowtie 2 index with bowtie2-build:

bowtie2-build Mus_musculus.GRCm38.90.fa Mus_musculus.GRCm38.90

Then I tried to run TopHat2:

tophat -p 2 --b2-L 15 -G Mus_musculus.GRCm38.90.gff3 -o testmap_gtf Mus_musculus.GRCm38.90.fa testmap.fastq

This runs perfectly fine without the GFF3 file but fails with it. I did a bit of research, and most people say this error results from a mismatch between the annotation and the reference. However, I downloaded both files from the exact same source, so I don't see why they would differ or how I could check. Could somebody walk me through this, please?

This is the error I get:

[2017-11-17 14:01:37] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2017-11-17 14:01:37] Checking for Bowtie
          Bowtie version:    2.1.0.0
[2017-11-17 14:01:37] Checking for Samtools
        Samtools version:    0.1.19.0
[2017-11-17 14:01:37] Checking for Bowtie index files (genome)..
[2017-11-17 14:01:37] Checking for reference FASTA file
    Warning: Could not find FASTA file /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa.fa
[2017-11-17 14:01:37] Reconstituting reference FASTA file from Bowtie index
  Executing: /apps/packages/bio/bowtie2/2.1.0/bowtie2-inspect /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa > /home/baunruh/RibosomeTEData/testmap_gtf/tmp/Mus_musculus.GRCm38.90.fa.fa
[2017-11-17 14:01:41] Generating SAM header for /home/baunruh/RibosomeTEData/cDNAReference/Mus_musculus.GRCm38.90.fa
    format:      fastq
    quality scale:   phred33 (default)
[2017-11-17 14:01:49] Reading known junctions from GTF file
[2017-11-17 14:02:03] Preparing reads
     left reads: min. length=25, max. length=34, 23328414 kept reads (32420 discarded)
[2017-11-17 14:04:26] Building transcriptome data files..
[2017-11-17 14:04:40] Building Bowtie index from Mus_musculus.GRCm38.90.fa
    [FAILED]
Error: Couldn't build bowtie index with err = 1

RNA-Seq sequencing alignment bowtie tophat • 3.1k views

ADD COMMENT • link updated 6.4 years ago by GenoMax 141k • written 6.4 years ago by baunruh ▴ 10

Please do not use TopHat for new projects unless you absolutely need to. STAR, BBMap, and HISAT2 are newer, recommended alternatives.

If you are going to use an annotation file, then you should not align against the cDNA sequences alone. You should get the sequence of the full genome, because the coordinates in your annotation file refer to positions on the genome, not on individual transcripts.
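One way to check whether an annotation and a reference actually match is to compare the sequence names in the FASTA headers with the names in column 1 of the GFF3/GTF file. The following is only a minimal sketch: the function names are made up for illustration, and it assumes a plain-text (uncompressed) FASTA and a tab-separated annotation file.

```shell
# Hypothetical helper functions (not part of Bowtie 2 or TopHat).

# Print the unique sequence names in a FASTA file:
# the text after '>' up to the first whitespace on each header line.
fasta_ids() {
    awk '/^>/ {sub(/^>/, "", $1); print $1}' "$1" | LC_ALL=C sort -u
}

# Print the unique sequence names in column 1 of a GFF3/GTF file,
# skipping comment lines and lines without tab-separated fields.
annotation_ids() {
    awk -F'\t' '!/^#/ && NF > 1 {print $1}' "$1" | LC_ALL=C sort -u
}

# Print names that the annotation uses but the FASTA does not contain.
# Usage: missing_from_fasta reference.fa annotation.gff3
missing_from_fasta() {
    fa_ids=$(mktemp)
    an_ids=$(mktemp)
    fasta_ids "$1" > "$fa_ids"
    annotation_ids "$2" > "$an_ids"
    # comm -13 keeps only the lines unique to the second (annotation) list
    LC_ALL=C comm -13 "$fa_ids" "$an_ids"
    rm -f "$fa_ids" "$an_ids"
}
```

Running `missing_from_fasta Mus_musculus.GRCm38.90.fa Mus_musculus.GRCm38.90.gff3` should list every annotation sequence name that has no counterpart in the FASTA. With a cDNA FASTA the headers are transcript IDs while the annotation uses chromosome names, so one would expect essentially every name to be reported.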

ADD REPLY • link 6.4 years ago by GenoMax 141k

Sorry, I am new to this! Apparently I was supposed to align the reads to the cDNA reference without the GTF, and then align the accepted_hits to the entire genome, because I only want reads that align to the cDNA. Does this sound right?

ADD REPLY • link 6.4 years ago by baunruh ▴ 10

You could do it in one of two ways. Either just align to the cDNA without using a GTF, or use the whole genome plus GTF with a special initial TopHat run that builds the transcriptome-specific sequence index. Read about the second method on the TopHat manual page (the --transcriptome-index <dir/prefix> option in the "Using TopHat" section).
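A rough sketch of the second method follows. The index and directory names (`GRCm38_genome`, `transcriptome_data/known`) are placeholders, and the sketch assumes a Bowtie 2 index has already been built from the full genome FASTA rather than from the cDNA file. The first TopHat call is given no reads and only builds the transcriptome index. Later runs reuse that index.

```shell
# Placeholder names: adjust to your own files.
GENOME_INDEX="GRCm38_genome"                    # assumed: Bowtie 2 index base of the full genome
ANNOTATION="Mus_musculus.GRCm38.90.gff3"        # annotation from the question (-G takes GTF or GFF3)
TRANSCRIPTOME_INDEX="transcriptome_data/known"  # assumed: <dir/prefix> for the transcriptome index

# One-time run without reads: builds the transcriptome index from the annotation.
build_transcriptome_index() {
    tophat -G "$ANNOTATION" --transcriptome-index="$TRANSCRIPTOME_INDEX" "$GENOME_INDEX"
}

# Mapping run that reuses the prebuilt transcriptome index.
# $1 = output directory, $2 = FASTQ file with the reads
map_reads() {
    tophat -p 2 -o "$1" --transcriptome-index="$TRANSCRIPTOME_INDEX" "$GENOME_INDEX" "$2"
}
```

Usage would be `build_transcriptome_index` once, followed by `map_reads testmap_gtf testmap.fastq` for each sample. Options from the original command, such as `--b2-L 15`, can be added to the mapping call if needed.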