Question: tophat2 GTF invalid strand error
3.8 years ago, natsterbug wrote:

I am running TopHat 2.1.0 (Bowtie 2.2.6.0) on single-end RNA-seq data and encountering the following error:

[2016-02-29 09:38:34] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2016-02-29 09:38:34] Checking for Bowtie
Bowtie version: 2.2.6.0
[2016-02-29 09:38:34] Checking for Bowtie index files (transcriptome)..
[2016-02-29 09:38:34] Checking for Bowtie index files (genome)..
[2016-02-29 09:38:34] Checking for reference FASTA file
[2016-02-29 09:38:34] Generating SAM header for PGSC_DM_v4.03_index
[2016-02-29 09:38:40] Reading known junctions from GTF file
[2016-02-29 09:38:44] Preparing reads
left reads: min. length=30, max. length=40, 47957373 kept reads (864 discarded)
[2016-02-29 09:48:09] Using pre-built transcriptome data..
[2016-02-29 09:48:11] Mapping left_kept_reads to transcriptome known with Bowtie2
[FAILED]
Error running:
/opt/software/TopHat2/2.1.0--GCC-4.4.5/bin/bam2fastx --all tophat_Kalkaska/tmp/left_kept_reads.bam|/opt/software/bowtie2/2.2.6--GCC-4.4.5/bin/bowtie2 -k 60 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-62,0 -p 1 --sam-no-hd -x transcriptome_data/known -|/opt/software/TopHat2/2.1.0--GCC-4.4.5/bin/fix_map_ordering --bowtie2-min-score 55 --read-mismatches 3 --read-gap-length 10 --read-edit-dist 10 --read-realign-edit-dist 11 --sam-header tophat_Kalkaska/tmp/known.bwt.samheader.sam - - tophat_Kalkaska/tmp/left_kept_reads.m2g_um.bam | /opt/software/TopHat2/2.1.0--GCC-4.4.5/bin/map2gtf --sam-header tophat_Kalkaska/tmp/PGSC_DM_v4.03_index_genome.bwt.samheader.sam transcriptome_data/known.fa.tlst - tophat_Kalkaska/tmp/left_kept_reads.m2g.bam > tophat_Kalkaska/logs/m2g_left_kept_reads.out

When I run the failing command from the error output directly, I get:

Error at parsing .tlst line (invalid strand):
53551 PGSC0003DMT400030180 ST4.03ch12. 56298573-56298656
(ERR): bowtie2-align died with signal 13 (PIPE)

Looking at the GTF file (PGSC_DM_V403_genes.gff from SpudDB), there are a fair number of entries where the strand is recorded as ".".
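To see how many entries are affected, the strand column can be tallied. The following is a minimal sketch, assuming a standard tab-separated GFF/GTF layout with the strand in column 7; the function name `count_strands` and the example lines are made up for illustration and are not taken from the real file.

```python
from collections import Counter


def count_strands(lines):
    """Tally the values found in column 7 (strand) of GFF/GTF feature lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A feature line is expected to have 9 tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[6]] += 1
    return counts


# Made-up example lines (not from the real annotation).
example = [
    "##gff-version 3\n",
    "chr01\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "chr01\tsrc\tgene\t300\t400\t.\t-\t.\tID=g2\n",
    "chr01\tsrc\tgene\t500\t600\t.\t.\t.\tID=g3\n",
    "chr02\tsrc\tgene\t700\t800\t.\t.\t.\tID=g4\n",
]
print(count_strands(example))

# On the real file this would be, for example:
# with open("PGSC_DM_V403_genes.gff") as fh:
#     print(count_strands(fh))
```

Dividing the count for "." by the total gives the fraction of entries that would be lost by filtering.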

Online forums report the same issue but offer no solutions other than deleting these entries, which are very numerous. Are there any other solutions I could try?


Are those entries redundant, i.e., are there lines with those IDs that have valid strand information?
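One way to check this is to group the lines by ID and look at which strand values each ID carries. This is only a sketch: it assumes GFF3-style `key=value;` attributes in column 9, and the function names and the default `key="ID"` are hypothetical choices that may need to be changed (for example to `Parent`) for this particular file.

```python
import re
from collections import defaultdict


def strands_per_id(lines, key="ID"):
    """Map each attribute value of `key` to the set of strand values seen for it."""
    # Assumes GFF3-style attributes such as "ID=abc;Name=xyz" in column 9.
    pattern = re.compile(r"(?:^|;)\s*" + re.escape(key) + r"=([^;]+)")
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = pattern.search(fields[8])
        if match:
            seen[match.group(1)].add(fields[6])
    return seen


def only_unstranded(seen):
    """Return the IDs that never appear with a '+' or '-' strand."""
    return sorted(i for i, strands in seen.items() if not (strands & {"+", "-"}))


# Made-up example lines: t1 also has a stranded line, t2 does not.
example = [
    "chr01\tsrc\texon\t100\t200\t.\t.\t.\tID=t1;Name=a\n",
    "chr01\tsrc\texon\t100\t200\t.\t+\t.\tID=t1;Name=a\n",
    "chr01\tsrc\texon\t300\t400\t.\t.\t.\tID=t2;Name=b\n",
]
print(only_unstranded(strands_per_id(example)))
```

IDs returned by `only_unstranded` have no line with valid strand information, so removing their "." lines removes them entirely.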

written 3.8 years ago by genomax

The entries lacking valid strand information are unique.

written 3.8 years ago by natsterbug

I had a look at the GFF file. It appears that most of the annotation entries are from Cufflinks and a few other gene prediction algorithms (GLEAN, BESTORF, etc.). Even though the coordinates differ, most of the entries appear to be covered by Cufflinks records in the neighborhood.

Since you don't have a better alternative (I assume), you may want to remove entries that don't have a + or - in the strand field (TopHat rejects any other value there, as the "invalid strand" error shows) and proceed with the analysis.
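A minimal sketch of such a filter, assuming a tab-separated GFF/GTF file with the strand in column 7. The function name `filter_stranded` and the output file name in the usage comment are hypothetical.

```python
def filter_stranded(lines):
    """Yield comment/blank lines unchanged and feature lines with strand '+' or '-'."""
    for line in lines:
        # Keep headers, comments and blank lines as they are.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the strand; drop anything other than + or -.
        if len(fields) >= 7 and fields[6] in ("+", "-"):
            yield line


# Made-up example lines (not from the real annotation).
example = [
    "##gff-version 3\n",
    "chr01\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "chr01\tsrc\tgene\t300\t400\t.\t.\t.\tID=g2\n",
    "chr01\tsrc\tgene\t500\t600\t.\t-\t.\tID=g3\n",
]
filtered = list(filter_stranded(example))
print("".join(filtered), end="")

# On the real file (output name is hypothetical):
# with open("PGSC_DM_V403_genes.gff") as src, open("genes.stranded.gff", "w") as dst:
#     dst.writelines(filter_stranded(src))
```

The filter works line by line, so it is worth checking afterwards that no kept feature still refers to a parent entry that was removed.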

modified 3.8 years ago • written 3.8 years ago by genomax

As I do not have a better alternative, I filtered the GFF and ran tophat2 with the filtered file; it completed without errors.

written 3.8 years ago by natsterbug
3.8 years ago, Istvan Albert wrote:

Not really. It is an invalid input file that needs to be filtered.


The entries with "." make up 7% of the total, and I am concerned about removing all of them. Is this a valid concern?

written 3.8 years ago by natsterbug

If you don't have strand information then the aligner can't splice over these.

TopHat will find new splice sites (that is, those not listed in the file), so you don't have to be overly concerned about missing out.

written 3.8 years ago by Istvan Albert