Hi everyone, I'm fairly new to RNA-seq analysis and am having a problem mapping to the human genome. I suspect the large .bt2l indexes created during the Bowtie 2 index build may be the cause, because mapping the same FASTA file to the chicken genome/transcriptome worked and produced an accepted_hits.bam file. When I map to the human genome, either GRCh38 or hg19, I get the following error:
[2017-10-13 20:55:12] Beginning TopHat run (v2.0.14)
-----------------------------------------------
[2017-10-13 20:55:12] Checking for Bowtie
Bowtie version: 2.2.5.0
[2017-10-13 20:55:12] Checking for Bowtie index files (transcriptome)..
[2017-10-13 20:55:12] Checking for Bowtie index files (genome)..
[2017-10-13 20:55:12] Checking for reference FASTA file
[2017-10-13 20:55:12] Generating SAM header for /dcl01/song/data/caroline/genomes/GRCh38/new/GRCh38
[2017-10-13 20:55:17] Reading known junctions from GTF file
[2017-10-13 20:55:58] Preparing reads
left reads: min. length=20, max. length=51, 7351157 kept reads (1288 discarded)
[2017-10-13 20:56:57] Using pre-built transcriptome data..
[2017-10-13 20:57:03] Mapping left_kept_reads to transcriptome GRCh38.tr with Bowtie2
[2017-10-13 21:33:46] Resuming TopHat pipeline with unmapped reads
[2017-10-13 21:33:47] Mapping left_kept_reads.m2g_um to genome GRCh38 with Bowtie2
[FAILED]
Error running bowtie:
Error reading _plen[] array: 1032716945, 4294967292
Error: Encountered internal Bowtie 2 exception (#1)
Command: /jhpce/shared/community/core/bowtie2/2.2.5/bin/bowtie2-align-s --wrapper basic-0 -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 1 --sam-no-hd -x /dcl01/song/data/caroline/genomes/GRCh38/new/GRCh38 -
(ERR): bowtie2-align exited with value 1
It always fails at the step that maps left_kept_reads to the genome. Does anyone have any suggestions? Any help is very much appreciated!
Here is my tophat command:
tophat -o /users/cvissers/alignedreads/1NTCInput1_tophatnew/ --transcriptome-index=/dcl01/song/data/caroline/genomes/hg19/hg19.tr/hg19.tr /dcl01/song/data/caroline/genomes/hg19/hg19 /users/cvissers/1NTCInput1_collapsed.fasta
I get the same error when I run the equivalent command with the GRCh38 paths.
You are using a version of TopHat that was current in March 2015, so at a minimum upgrade to the latest release (v2.1.1) if you can.
That said, TopHat should no longer be used, since there are newer and better options for aligning RNA-seq data. Please consider STAR, HISAT2, BBMap, or another splice-aware aligner.
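On the .bt2l suspicion raised in the question: Bowtie 2 writes small indexes with a .bt2 extension and large indexes with a .bt2l extension, and the log above shows that the small-index aligner (bowtie2-align-s) was the one invoked. A mismatch between the aligner and the index files on disk may be what triggers the _plen[] error, though that is an assumption and not something the log proves. As a rough sketch for checking which kind of index sits behind a prefix, something like the following could be used. The function name index_flavor is made up for illustration, and it only looks at the first index file (PREFIX.1.bt2 or PREFIX.1.bt2l).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Bowtie 2 index flavor exists for a prefix.
# The prefix is the same string you would pass to tophat or to bowtie2 -x.
index_flavor() {
    local prefix="$1"
    local small=0
    local large=0

    # Small indexes end in .bt2, large indexes end in .bt2l.
    if [ -e "${prefix}.1.bt2" ]; then small=1; fi
    if [ -e "${prefix}.1.bt2l" ]; then large=1; fi

    if [ "$small" -eq 1 ] && [ "$large" -eq 1 ]; then
        echo "both"
    elif [ "$large" -eq 1 ]; then
        echo "large"
    elif [ "$small" -eq 1 ]; then
        echo "small"
    else
        echo "none"
    fi
}
```

For example, running index_flavor /dcl01/song/data/caroline/genomes/GRCh38/new/GRCh38 would print small, large, both, or none. If it prints large or both while the log still shows bowtie2-align-s, rebuilding the index cleanly under a fresh prefix may be worth trying before switching aligners.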