Question

start OMA run - file.log

6.1 years ago

dtejadamartinez ▴ 20

Hi,

I have a question about the messages written to the .log file when OMA starts converting the input files.

I downloaded the coding sequences of 30 different species from Ensembl or NCBI. To remove redundant transcripts and isoforms, I first ran cd-hit-est and then passed the files through TransDecoder.

When I start running OMA, the .log file shows many messages like these:

WARNING: IUPAC ambiguity characters for DNA/RNA not supported. Will replace them with 'X'

Pre-processing input (DNA)
19099 sequences within 19099 entries considered
Creating file Cache/DB/Balaena_mysticetus.db.map for mapping
Building new Pat index in file Cache/DB/Balaena_mysticetus.db.tree with 27254391 entries
Pat index with 27254391 entries
 sorted, from "A</SEQ></E>\n" to "XXXXXXXXXXXXXXXXXXX"
Reading 44567976 characters from file Cache/DB/Balaenoptera_acutorostrata.db
Pre-processing input (DNA)
20993 sequences within 20993 entries considered
Creating file Cache/DB/Balaenoptera_acutorostrata.db.map for mapping
Building new Pat index in file Cache/DB/Balaenoptera_acutorostrata.db.tree with 37893972 entries
Pat index with 37893972 entries
 sorted, from "A</SEQ></E>\n" to "XXXXXXXXXXXXXXXXXXX"

I would like to know whether these errors can cause problems for a normal OMA run.

Thanks,

omabrowser OMA • 1.6k views


Tagging: adrian.altenhoff

6.1 years ago by GenoMax 141k

score 0 · Answer 1 · 2018-03-22

Hi,

It seems to me that there is an inconsistency between your input data and your parameters. From this output, it looks as though you specified InputDataType := 'DNA'; in the parameters.drw file but provided protein sequences (which would be the sensible input to use). In that case, OMA would convert every amino acid character other than A, T, C, or G to an unknown nucleotide and treat the remaining amino acids as nucleotides. As far as I can tell, the proper setting is InputDataType := 'AA';
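
One quick way to check which case applies is to look at the letters actually present in each input FASTA file. The following is a minimal Python sketch and not part of OMA; the function names `read_fasta` and `guess_alphabet` and the 0.9 threshold are assumptions made for illustration:

```python
from collections import Counter

# Letters accepted as nucleotides in this sketch (N = unknown base).
DNA_LETTERS = set("ACGTUN")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_alphabet(lines, threshold=0.9):
    """Return 'DNA', 'AA', or 'empty' based on the share of nucleotide letters.

    The threshold is an arbitrary illustrative cutoff, not an OMA rule.
    """
    counts = Counter()
    for _, seq in read_fasta(lines):
        # Ignore stop codons (*) and gap characters (-) when counting.
        counts.update(seq.upper().replace("*", "").replace("-", ""))
    total = sum(counts.values())
    if total == 0:
        return "empty"
    dna = sum(n for ch, n in counts.items() if ch in DNA_LETTERS)
    return "DNA" if dna / total >= threshold else "AA"
```

Passing an open file handle for each genome file to `guess_alphabet` gives a rough answer: a result of 'AA' would suggest the file holds protein sequences, in which case InputDataType := 'AA'; is the matching setting.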

Cheers, Adrian

score 0 · Answer 2 · 2018-03-22

6.1 years ago

dtejadamartinez ▴ 20

Hi, thanks for the answer.

All of my sequences are nucleotide sequences, and I have set InputDataType := 'DNA'; in the input. That is why I find it strange.

It only happens with the final files produced by TransDecoder.

Cheers, Daniela
