building snpEff database
3 months ago
aabhordia ▴ 10

Hello everyone,

I am trying to build a new database in snpEff. I have followed all the steps given in the manual but, unfortunately, could not build it.

Every time I try, it shows these warnings:

WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533975.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533983.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533991.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535019.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535460.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Too many 'WARNING_RARE_AA_POSSITION_NOT_FOUND' warnings, no further warnings will be shown.
.
.
.

00:04:10 Checking database using CDS sequences
00:04:10 Reading CDSs from file '/mnt/d/snpEff/data/Genome/cds.fa'...
00:04:12 done (137930 CDSs).
00:04:12 Comparing CDS...
        Labels:
                '+' : OK
                '.' : Missing
                '*' : Error

        ....................................................................................................
        ....................................................................................................

CDS check:      Genome       OK: 0   Warnings: 0     Not found: 98752        Errors: 0       Error percentage: NaN%

FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.

Please suggest how I can resolve this issue.

Thank you in advance

snpEff building database • 561 views

Hello!

Please consider adding some format to your question to make it more readable.

Having said that, I guess your error is related to the GFF3/GTF you provide when building the database. Do you have CDS annotations for each transcript in the GTF? Do you have all the features that a GTF usually has (such as a transcript line for each transcript, plus CDS, exon, gene, and UTR lines)?
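
A quick way to answer those two questions is to tally the feature types in the annotation and list the transcripts that have no CDS line. The following is only a rough sketch, not part of snpEff: the function names `feature_counts` and `transcripts_without_cds` are made up for illustration, and the second one assumes GTF-style attributes of the form `transcript_id "..."`.

```python
import re
from collections import Counter

# GTF-style attribute, e.g. transcript_id "some_id";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def _columns(lines):
    """Yield the tab-separated columns of every 9-column annotation line."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            yield cols


def feature_counts(lines):
    """Count how often each feature type (column 3) occurs."""
    return Counter(cols[2] for cols in _columns(lines))


def transcripts_without_cds(lines):
    """Return transcript IDs that never appear on a CDS line (GTF only)."""
    seen, with_cds = set(), set()
    for cols in _columns(lines):
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue  # e.g. gene lines carry no transcript_id
        seen.add(match.group(1))
        if cols[2] == "CDS":
            with_cds.add(match.group(1))
    return seen - with_cds
```

To use it, open the annotation file and pass the file handle to either function (open it again for the second call). Non-coding transcripts legitimately have no CDS, so a non-empty result is only a hint about where to look, not proof of a broken file.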


I have tried both formats (GFF3 and GTF) together with the CDS file, but got the same error :(


Same problem here: the names of the transcripts/entries in the GFF match, but it won't take it. Wondering if there is a secret sauce to getting this to work :-)


I had a similar problem in the past, detailed here:

https://github.com/pcingola/SnpEff/issues/388

As it turns out, the "officially" accepted GenBank file had some inconsistencies in it, and in return SnpEff raised a number of quite confusing errors.


Did you check what is written in the last line of the error/warning output?

That would be my first guess as well: for some reason the IDs in your FASTA file do not match the ones in the DB.


Yes, I checked, and it is the same, because I am using the same reference for variant calling and for building the snpEff database, along with the GTF, CDS, and protein files (the latter two in FASTA format).

And you are right, this is exactly the reason.

But I am unable to sort it out.


Time to backtrack everything then:

  • take, for instance, the first ID from the "Cannot find transcript" warnings
  • grep for it in the CDS FASTA file
  • look it up in the DB

(In general, check whether you can find that ID in each step/input of this process.)
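
The lookup above can also be done for all IDs at once. Below is a minimal Python sketch, not part of snpEff: the function names (`fasta_ids`, `annotation_values`, `overlap_report`, `report`) are made up for illustration. It assumes the relevant ID is the first whitespace-delimited token of each FASTA header, that the annotation is a 9-column GTF or GFF3 file, and it parses the attribute column naively (semicolons inside quoted values are not handled).

```python
from collections import defaultdict


def fasta_ids(lines):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_values(lines):
    """Map each attribute key in column 9 to the set of values it takes."""
    values = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        for item in cols[8].split(";"):
            item = item.strip()
            if not item:
                continue
            if "=" in item:   # GFF3 style: key=value
                key, value = item.split("=", 1)
            else:             # GTF style: key "value"
                key, _, value = item.partition(" ")
            # GFF3 allows comma-separated lists, e.g. Parent=a,b
            for single in value.strip().strip('"').split(","):
                if single:
                    values[key.strip()].add(single)
    return values


def overlap_report(ids, values):
    """For each attribute key, count how many FASTA IDs occur among its values."""
    return {key: len(ids & vals) for key, vals in values.items()}


def report(fasta_path, annotation_path):
    """Convenience wrapper: read both files and return the overlap counts."""
    with open(fasta_path) as handle:
        ids = fasta_ids(handle)
    with open(annotation_path) as handle:
        values = annotation_values(handle)
    return overlap_report(ids, values)
```

Calling `report` with the paths of the CDS (or protein) FASTA and the GTF/GFF3 returns one count per attribute key. If the count for the transcript ID attribute is zero while another attribute (a protein ID, say) matches most of the FASTA headers, the headers are probably keyed on a different identifier than the one the database uses for transcripts.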
