Question: Tophat2 fails to map to human genome
Asked 8 weeks ago by cvissers:

Hi everyone, I'm fairly new to RNA-seq analysis and am having a problem mapping reads to the human genome. I suspect the large .bt2l indexes created during the Bowtie 2 index build may be the cause, because when I mapped the same FASTA file to the chicken genome/transcriptome it worked and I got an accepted_hits.bam file. When I map to the human genome, either GRCh38 or hg19, I get the following error:

[2017-10-13 20:55:12] Beginning TopHat run (v2.0.14)
[2017-10-13 20:55:12] Checking for Bowtie
                  Bowtie version:
[2017-10-13 20:55:12] Checking for Bowtie index files (transcriptome)..
[2017-10-13 20:55:12] Checking for Bowtie index files (genome)..
[2017-10-13 20:55:12] Checking for reference FASTA file
[2017-10-13 20:55:12] Generating SAM header for /dcl01/song/data/caroline/genomes/GRCh38/new/GRCh38
[2017-10-13 20:55:17] Reading known junctions from GTF file
[2017-10-13 20:55:58] Preparing reads
         left reads: min. length=20, max. length=51, 7351157 kept reads (1288 discarded)
[2017-10-13 20:56:57] Using pre-built transcriptome data..
[2017-10-13 20:57:03] Mapping left_kept_reads to transcriptome with Bowtie2
[2017-10-13 21:33:46] Resuming TopHat pipeline with unmapped reads
[2017-10-13 21:33:47] Mapping left_kept_reads.m2g_um to genome GRCh38 with Bowtie2
Error running bowtie:
Error reading _plen[] array: 1032716945, 4294967292
Error: Encountered internal Bowtie 2 exception (#1)
Command: /jhpce/shared/community/core/bowtie2/2.2.5/bin/bowtie2-align-s --wrapper basic-0 -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 1 --sam-no-hd -x /dcl01/song/data/caroline/genomes/GRCh38/new/GRCh38 -
(ERR): bowtie2-align exited with value 1

It always fails at the step where left_kept_reads are mapped to the genome. Does anyone have any suggestions? Any help is very much appreciated!

Here is my tophat command:

tophat -o /users/cvissers/alignedreads/1NTCInput1_tophatnew/ --transcriptome-index=/dcl01/song/data/caroline/genomes/hg19/ /dcl01/song/data/caroline/genomes/hg19/hg19 /users/cvissers/1NTCInput1_collapsed.fasta

I run the same command for GRCh38, with the GRCh38 paths in place of hg19.
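To test the suspicion about large indexes, it may help to confirm which kind of index actually sits behind the prefix passed to TopHat. Bowtie 2 writes small indexes as .bt2 files and large ones as .bt2l files, and the failing command in the log calls bowtie2-align-s, the small-index aligner binary. Whether that mismatch causes this particular error is an assumption, not something confirmed in this thread. The helper name index_kind below is hypothetical; it only looks for the first index file of each kind.

```shell
# Hypothetical helper (not part of Bowtie 2 or TopHat): report which kind
# of Bowtie 2 index exists for a given index prefix.
index_kind() {
    prefix="$1"
    has_small=0
    has_large=0
    # Small indexes are stored as <prefix>.1.bt2, large ones as <prefix>.1.bt2l.
    if [ -e "${prefix}.1.bt2" ]; then has_small=1; fi
    if [ -e "${prefix}.1.bt2l" ]; then has_large=1; fi

    if [ "$has_small" -eq 1 ] && [ "$has_large" -eq 1 ]; then
        echo "both"
    elif [ "$has_large" -eq 1 ]; then
        echo "large"
    elif [ "$has_small" -eq 1 ]; then
        echo "small"
    else
        echo "none"
    fi
}
```

For example, index_kind /dcl01/song/data/caroline/genomes/GRCh38/new/GRCh38 would print "large" if only .bt2l files are present for that prefix.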


You are using a version of TopHat that was current in March 2015, so at a minimum upgrade to the latest release (v2.1.1) if you can.
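As a minimal sketch of that version comparison: the snippet below compares the version reported in the log (2.0.14) with the release named in this reply (2.1.1). The helper name version_lt is hypothetical, the installed version is copied from the log rather than queried from the tool, and the comparison assumes a sort that supports -V (GNU coreutils does).

```shell
# Hypothetical helper: succeed (exit 0) when version $1 is strictly older
# than version $2. Relies on version-aware sorting via `sort -V`.
version_lt() {
    if [ "$1" = "$2" ]; then
        return 1
    fi
    oldest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$oldest" = "$1" ]
}

installed="2.0.14"   # taken from the "Beginning TopHat run" line in the log
latest="2.1.1"       # release named in this reply

if version_lt "$installed" "$latest"; then
    echo "TopHat $installed is older than $latest"
fi
```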

That said, TopHat should no longer be used, since there are newer and better options for aligning RNA-seq data. Please consider using STAR, HISAT2, BBMap, or another splice-aware aligner.
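For orientation only, a rough sketch of what an equivalent HISAT2 run might look like for single-end FASTA reads. The file names genome.fa, genome_index, and aligned.sam are hypothetical placeholders, and the script only prints the two commands for review instead of running them. It uses hisat2-build to index the reference, and hisat2 with -f (reads are FASTA), -x (index prefix), -U (unpaired reads), and -S (SAM output).

```shell
# Hypothetical placeholder paths; replace with your own files.
genome_fa="genome.fa"
index_prefix="genome_index"
reads_fa="1NTCInput1_collapsed.fasta"
out_sam="aligned.sam"

# Step 1: build a HISAT2 index from the reference FASTA.
build_cmd="hisat2-build $genome_fa $index_prefix"

# Step 2: align unpaired FASTA reads and write SAM output.
align_cmd="hisat2 -f -x $index_prefix -U $reads_fa -S $out_sam"

# Print the commands for review; nothing is executed here.
echo "$build_cmd"
echo "$align_cmd"
```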

Reply written 8 weeks ago by genomax