.gtf file error in Tophat2 - Error at parsing .tlst line (invalid strand):
8.6 years ago
4galaxy77 2.8k

I'm trying to align my reads to a genome plus a transcriptome, the latter supplied as a .gtf file. TopHat gets halfway through and then fails with this error:

user@it053392:~$ export PATH=/home/user/Downloads/bowtie2:$PATH
user@it053392:~$ export PATH=/home/user/Downloads/tophat2:$PATH
user@it053392:~$ echo $PATH
user@it053392:~$ /home/user/Downloads/tophat2/tophat -G /home/user/Desktop/_sam/RNAseq_beta_data/Transcriptome/merged.remDup.gtf -o /home/user/Desktop/Tophat_out /home/user/Downloads/bowtie2/example/index/DinoAnt /home/user/Desktop/_sam/RNAseq_beta_data/trimmed/23Y70_trimmed.fastq

[2015-11-04 11:52:45] Beginning TopHat run (v2.1.0)
[2015-11-04 11:52:45] Checking for Bowtie
          Bowtie version:
[2015-11-04 11:52:45] Checking for Bowtie index files (genome)..
[2015-11-04 11:52:45] Checking for reference FASTA file
    Warning: Could not find FASTA file /home/user/Downloads/bowtie2/example/index/DinoAnt.fa
[2015-11-04 11:52:45] Reconstituting reference FASTA file from Bowtie index
  Executing: /home/user/Downloads/bowtie2/bowtie2-inspect /home/user/Downloads/bowtie2/example/index/DinoAnt > /home/user/Desktop/Tophat_out/tmp/DinoAnt.fa
[2015-11-04 11:52:59] Generating SAM header for /home/user/Downloads/bowtie2/example/index/DinoAnt
[2015-11-04 11:53:01] Reading known junctions from GTF file
[2015-11-04 11:53:04] Preparing reads
     left reads: min. length=85, max. length=85, 23624572 kept reads (13384 discarded)
[2015-11-04 12:00:30] Building transcriptome data files /home/user/Desktop/Tophat_out/tmp/merged.remDup
[2015-11-04 12:00:51] Building Bowtie index from merged.remDup.fa
[2015-11-04 13:22:45] Mapping left_kept_reads to transcriptome merged.remDup with Bowtie2
Error running:
/home/user/Downloads/tophat2/bam2fastx --all /home/user/Desktop/Tophat_out/tmp/left_kept_reads.bam|/home/user/Downloads/bowtie2/bowtie2 -k 60 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 1 --sam-no-hd -x /home/user/Desktop/Tophat_out/tmp/merged.remDup -|/home/user/Downloads/tophat2/fix_map_ordering --bowtie2-min-score 15 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --sam-header /home/user/Desktop/Tophat_out/tmp/merged.remDup.bwt.samheader.sam - - /home/user/Desktop/Tophat_out/tmp/left_kept_reads.m2g_um.bam | /home/user/Downloads/tophat2/map2gtf --sam-header /home/user/Desktop/Tophat_out/tmp/DinoAnt_genome.bwt.samheader.sam /home/user/Desktop/Tophat_out/tmp/merged.remDup.fa.tlst - /home/user/Desktop/Tophat_out/tmp/left_kept_reads.m2g.bam > /home/user/Desktop/Tophat_out/logs/m2g_left_kept_reads.out

Running the failing command by hand gives this:

user@it053392:~$ /home/user/Downloads/tophat2/bam2fastx --all /home/user/Desktop/Tophat_out/tmp/left_kept_reads.bam|/home/user/Downloads/bowtie2/bowtie2 -k 60 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 1 --sam-no-hd -x /home/user/Desktop/Tophat_out/tmp/merged.remDup -|/home/user/Downloads/tophat2/fix_map_ordering --bowtie2-min-score 15 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --sam-header /home/user/Desktop/Tophat_out/tmp/merged.remDup.bwt.samheader.sam - - /home/user/Desktop/Tophat_out/tmp/left_kept_reads.m2g_um.bam | /home/user/Downloads/tophat2/map2gtf --sam-header /home/user/Desktop/Tophat_out/tmp/DinoAnt_genome.bwt.samheader.sam /home/user/Desktop/Tophat_out/tmp/merged.remDup.fa.tlst - /home/user/Desktop/Tophat_out/tmp/left_kept_reads.m2g.bam > /home/user/Desktop/Tophat_out/logs/m2g_left_kept_reads.out

Error at parsing .tlst line (invalid strand):
    31958 TCONS_00032473 scaffold40. 5-1634
(ERR): bowtie2-align exited with value 141

This suggests that something is wrong with the .gtf file at transcript TCONS_00032473. Some googling revealed it's something to do with missing strand info, but nobody seemed to provide any solutions. Can anyone help me modify the file, or otherwise make it work? Thanks

RNA-Seq tophat bowtie • 3.3k views

Show us what is written on that line.
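One way to pull out the relevant records is sketched below in Python. It assumes the GTF stores the ID as a `transcript_id "..."` attribute, which is the usual convention but is not confirmed for this particular file; the function name `find_transcript_lines` is just an illustrative helper.

```python
def find_transcript_lines(lines, transcript_id):
    """Return (line_number, text) pairs for GTF lines that mention transcript_id.

    `lines` is any iterable of strings, for example an open file handle.
    Assumes the ID appears as: transcript_id "TCONS_..."
    """
    needle = 'transcript_id "%s"' % transcript_id
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\r\n")))
    return hits

# Possible usage (the path is the one from the question):
# with open("/home/user/Desktop/_sam/RNAseq_beta_data/Transcriptome/merged.remDup.gtf") as handle:
#     for number, text in find_transcript_lines(handle, "TCONS_00032473"):
#         print(number, text, sep="\t")
```

Printing the line numbers as well makes it easy to look at the neighbouring lines in the file, in case the damage started on the record before.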


The GTF is not well formatted. Somehow, some data is missing from the previous line: the parent info for the exon starting at base 20703. The same line also contains the scaffold40 information that should be on the next line.

I don't know how you got this gtf file, but it is wrong
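To see whether other records are damaged in the same way, a quick scan of the whole file can help. The following is a minimal sketch, not a full GTF validator. It assumes a record should have nine tab-separated fields with the strand in the seventh, and it treats anything other than `+` or `-` as suspicious; that second rule is an assumption based on the `scaffold40.` in the error message, not documented TopHat behaviour. `check_gtf_lines` is a hypothetical helper name.

```python
VALID_STRANDS = {"+", "-"}  # assumption: "." (unknown strand) is what trips the .tlst parser

def check_gtf_lines(lines, valid_strands=VALID_STRANDS):
    """Return (line_number, reason, text) for lines that look malformed.

    `lines` is any iterable of strings, for example an open file handle.
    Blank lines and '#' comment lines are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue
        fields = text.split("\t")
        if len(fields) != 9:
            reason = "expected 9 tab-separated fields, found %d" % len(fields)
            problems.append((number, reason, text))
        elif fields[6] not in valid_strands:
            problems.append((number, "strand is %r" % fields[6], text))
    return problems

# Possible usage:
# with open("merged.remDup.gtf") as handle:
#     for number, reason, text in check_gtf_lines(handle):
#         print(number, reason, text, sep="\t")
```

If only a handful of lines are reported, they can be inspected and repaired by hand; if many are, it is probably safer to regenerate the GTF from its source.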

