tRNAscan segfaults and other issues

Posted 6.6 years ago by dmathog

Downloaded, unpacked, and built tRNAscan-SE-1.3.1.

I had some problems running the underlying program, trnascan-1.4.

First issue: it segfaults on a header like ">scf7180000558994" because its FASTA-reading code looks only for a space to terminate the name, and there isn't one. I changed that code to look for a space OR a '\n', and then it runs.
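If you would rather not patch the C source, one possible workaround is to preprocess the FASTA so that every header has a space after the name, which is what the name-parsing code described above seems to expect. This is an assumption on my part; I have not checked it against the trnascan-1.4 source. The helper `pad_fasta_headers` (a hypothetical name) appends a single space to any header line that contains no space or tab and passes every other line through unchanged.

```shell
# Hypothetical helper: make sure each FASTA header has a space after the name,
# assuming the parser only needs some ' ' to stop reading the name.
# Headers that already contain a space or tab are left as they are.
pad_fasta_headers() {
    awk '/^>/ && !/[ \t]/ { print $0 " "; next }   # header with no space: append one
         { print }                                  # everything else unchanged
        ' "$@"
}
```

Usage: `pad_fasta_headers seqfile > seqfile.padded`, then run trnascan-1.4 on `seqfile.padded`.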

Second issue: most of the hits in the output have "nnnnnnnn" at one end or the other. I ran it like this for a while, forced an exit with ^C, and then examined what it had found as follows:

trnascan-1.4 -o Lv_tRNAs.fa seqfile
grep "potential tRNA sequence" Lv_tRNAs.fa  | wc
   8782   35128 1334837
grep "potential tRNA sequence" Lv_tRNAs.fa  | grep nnn$ | wc
   4818   19272  737638
grep "potential tRNA sequence" Lv_tRNAs.fa  | grep '\ nnn' | wc
   3604   14416  546644
grep "potential tRNA sequence" Lv_tRNAs.fa  | grep -v 'nnn' | wc
    146     584   18243
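The same three counts can be gathered in a single pass. This sketch, `summarize_hits` (a hypothetical helper), applies the same patterns as the grep commands above to the "potential tRNA sequence" lines and also prints the ratio of N-flanked hits to hits with no `nnn` at all. Like the greps, it assumes the hit sequence appears on that line in lowercase, and a hit with `nnn` at both ends is counted in both categories.

```shell
# Hypothetical helper: one-pass version of the grep counts above.
summarize_hits() {
    awk '/potential tRNA sequence/ {
             total++
             if (/nnn$/) tail++     # same test as: grep nnn$
             if (/ nnn/) head++     # same test as: grep "\ nnn"
             if (!/nnn/) clean++    # same test as: grep -v nnn
         }
         END {
             printf "total=%d nnn_at_end=%d nnn_at_start=%d clean=%d\n", total, tail, head, clean
             if (clean) printf "N-flanked per clean hit: %.1f\n", (tail + head) / clean
         }' "$@"
}
```

Usage: `summarize_hits Lv_tRNAs.fa`.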

The input sequence has runs of N because it consists of genomic scaffolds. As near as I can tell, the program does something wrong when it hits a run of N, producing roughly 57 times as many false hits as potentially real ones ((4818 + 3604) / 146 ≈ 57.7).
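One way to keep the scanner from ever seeing the N runs is to split each scaffold into contigs at runs of N before scanning. This is my own suggestion, not something trnascan-1.4 or tRNAscan-SE is documented to do. The sketch `split_at_n_runs` (hypothetical) reads FASTA on stdin, splits each record at runs of at least `minrun` N/n characters (first argument, default 1), and names each piece `<scaffold>_<start>_<end>` in 1-based scaffold coordinates, so hit positions can be mapped back by adding `start - 1`. The original scaffold name follows a space, so the new headers also avoid the missing-space problem described earlier. Any real tRNA overlapping an N run is lost, though its sequence is unknown there anyway.

```shell
# Hypothetical helper: split FASTA records at runs of N so no piece contains them.
# Reads FASTA on stdin; optional $1 = minimum run length to split on (default 1).
# Each piece is written as ">name_start_end name" (1-based scaffold coordinates).
split_at_n_runs() {
    awk -v minrun="${1:-1}" '
    function emit(piece, offset) {
        if (length(piece) == 0) return
        # Keep a space after the new name, followed by the original scaffold name.
        printf ">%s_%d_%d %s\n%s\n", name, offset + 1, offset + length(piece), name, piece
    }
    function flush(   rest, offset) {
        if (name == "") return
        rest = seq; offset = 0
        while (length(rest) > 0) {
            if (match(rest, nrun)) {
                emit(substr(rest, 1, RSTART - 1), offset)   # piece before the run
                offset += RSTART + RLENGTH - 1             # skip that piece and the run
                rest = substr(rest, RSTART + RLENGTH)
            } else {
                emit(rest, offset)                         # final piece
                rest = ""
            }
        }
    }
    BEGIN {
        if (minrun < 1) minrun = 1   # an empty-match regex would loop forever
        # Regex for at least minrun N/n characters (for minrun = 2: [Nn][Nn][Nn]*).
        nrun = ""
        for (i = 0; i < minrun; i++) nrun = nrun "[Nn]"
        nrun = nrun "[Nn]*"
    }
    /^>/ { flush(); name = substr($1, 2); seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }   # join wrapped sequence lines
    END { flush() }
    '
}
```

Usage: `split_at_n_runs 10 < seqfile > pieces.fa` splits only at runs of 10 or more N. Each record is held in memory as one string, so this is meant for moderate inputs rather than as a tuned tool.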

Final issue: after filtering out all the matches with a poly-N end, the remaining hits consist of many (6 or more) slightly different "takes" on the same sequence region. In these minor variants, various positions shift by a base or so in one direction or the other.
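To see how many distinct regions those variants actually represent, overlapping hits can be collapsed into clusters. This sketch assumes the hit coordinates have already been extracted into a hypothetical tab-separated file with one `name<TAB>start<TAB>end` line per hit; it assumes nothing about trnascan-1.4's own output layout. Reversed coordinates, as for minus-strand hits, are swapped first. `merge_overlapping_hits` (hypothetical) then sorts by name and start and reports each cluster's span and hit count. Hits that merely abut without overlapping stay in separate clusters, and the sketch does not try to pick the best take within a cluster.

```shell
# Hypothetical helper: collapse overlapping hits into one line per cluster.
# Input (assumed format): name<TAB>start<TAB>end, one hit per line.
# Output: name<TAB>cluster_start<TAB>cluster_end<TAB>hits_in_cluster
merge_overlapping_hits() {
    # Put each interval in start <= end order (minus-strand hits may be reversed).
    awk 'BEGIN { FS = OFS = "\t" }
         { s = $2 + 0; e = $3 + 0; if (s > e) { t = s; s = e; e = t }; print $1, s, e }' "$@" |
    # Sort by sequence name, then numerically by start and end.
    sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n |
    # Start a new cluster when the name changes or a hit begins after the
    # current cluster end; otherwise extend the current cluster.
    awk 'BEGIN { FS = OFS = "\t" }
         $1 != name || $2 + 0 > end {
             if (n) print name, start, end, n
             name = $1; start = $2 + 0; end = $3 + 0; n = 0
         }
         { if ($3 + 0 > end) end = $3 + 0; n++ }
         END { if (n) print name, start, end, n }'
}
```

Usage: `merge_overlapping_hits hits.tsv`, where `hits.tsv` is the hypothetical coordinate file described above.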

Is all of this as it should be?

I take it that one would normally run tRNAscan-SE to avoid these issues? Does that wrapper script modify the names so that the underlying program doesn't segfault?

Thanks

Tags: sequence, software error
