building snpEff database
8 weeks ago
aabhordia ▴ 10

Hello everyone,

I am trying to build a new database in snpEff. I have followed all the steps given in the manual, but unfortunately I could not build it.

Every time I try, it shows these warnings:

WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533975.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533983.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533991.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535019.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535460.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Too many 'WARNING_RARE_AA_POSSITION_NOT_FOUND' warnings, no further warnings will be shown.
.
.
.

00:04:10 Checking database using CDS sequences
00:04:10 Reading CDSs from file '/mnt/d/snpEff/data/Genome/cds.fa'...
00:04:12 done (137930 CDSs).
00:04:12 Comparing CDS...
Labels:
'+' : OK
'.' : Missing
'*' : Error

....................................................................................................
....................................................................................................

CDS check:      Genome       OK: 0   Warnings: 0     Not found: 98752        Errors: 0       Error percentage: NaN%

FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.


Please suggest how I can resolve this issue.

snpEff building database • 450 views

Hello!

Please consider adding some formatting to your question to make it more readable.

Having said that, I guess your error is related to the GFF3/GTF you provide when building the database. Do you have CDS annotations for each transcript in the GTF? Do you have all the features that a GTF usually has (such as a transcript line for each transcript, CDS, exon, gene, UTRs...)?
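One way to answer those questions is to tally which feature types each transcript has in the GTF. The following is only a minimal sketch: it assumes a standard nine-column, tab-separated GTF whose attribute column contains `transcript_id "..."`, and the example lines and IDs (`t1`, `t2`, `g1`, `g2`) are made up for illustration. It does not reproduce how snpEff itself reads the file.

```python
import re
from collections import defaultdict

# Assumes GTF-style attributes such as: gene_id "g1"; transcript_id "t1";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def features_per_transcript(gtf_lines):
    """Map each transcript_id to the set of feature types (column 3) seen for it."""
    features = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = _TRANSCRIPT_ID.search(cols[8])
        if match:
            features[match.group(1)].add(cols[2])
    return features


def transcripts_without(features, feature_type="CDS"):
    """Return the sorted transcript IDs that have no record of the given type."""
    return sorted(t for t, kinds in features.items() if feature_type not in kinds)


# Hypothetical records, for illustration only.
example_gtf = [
    'chr1\tsrc\ttranscript\t1\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t1\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t101\t700\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\ttranscript\t2000\t2500\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
    'chr1\tsrc\texon\t2000\t2500\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]

print(transcripts_without(features_per_transcript(example_gtf)))  # prints ['t2']
```

On a real annotation, the same functions can be fed an open file handle instead of the example list, e.g. `features_per_transcript(open(path_to_gtf))`, where `path_to_gtf` stands for wherever your GTF lives.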


I have tried using both formats (GFF3 and GTF) together with the CDS file, but got the same error :(


Same problem here: the names of the transcripts / entries in the GFF match, but it won't take it. Wondering if there is a secret sauce to getting this to work :-)


I had a similar problem in the past, detailed here:

https://github.com/pcingola/SnpEff/issues/388

As it turns out, the "officially" accepted GenBank file had some inconsistencies in it, and in turn SnpEff raised a number of quite confusing errors.


Did you check what is written in the last line of the error/warning output?

That would be my first guess as well: for some reason, the IDs in your FASTA file do not match the ones in the DB.


Yes, I checked and they are the same, because I am using the same reference for variant calling and for building the snpEff database, along with the GTF, CDS and protein files (in FASTA format).

And you are right, this is the reason.

But I am unable to sort it out.


Time to backtrack everything then:

• take, for instance, the first "Cannot find transcript" ID
• grep it from the CDS/FASTA file
• look it up in the DB

(In general, check whether you can find that ID in each step/input of this process.)
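The same backtracking can be done for all IDs at once by comparing the FASTA header IDs against the `transcript_id` values in the GTF. This is a rough sketch under stated assumptions: it takes the FASTA ID to be the first whitespace-delimited token after `>`, which may not be how snpEff parses headers, and the `TX_EXAMPLE_*` IDs and all records are hypothetical. Only `XP_018533975.1` comes from the warnings above. Note that `XP_` is a RefSeq protein accession prefix, so it is worth checking whether the GTF's transcript IDs use the same accessions at all.

```python
import re

# Assumes GTF-style attributes such as: transcript_id "TX_EXAMPLE_1";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(fasta_lines):
    """Collect the first whitespace-delimited token of every '>' header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_transcript_ids(gtf_lines):
    """Collect every transcript_id value found in non-comment GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        match = _TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Report which IDs are shared and which occur on one side only."""
    in_fasta = fasta_ids(fasta_lines)
    in_gtf = gtf_transcript_ids(gtf_lines)
    return {
        "shared": sorted(in_fasta & in_gtf),
        "only_in_fasta": sorted(in_fasta - in_gtf),
        "only_in_gtf": sorted(in_gtf - in_fasta),
    }


# Hypothetical inputs; only XP_018533975.1 is taken from the log above.
example_fasta = [
    ">XP_018533975.1 some description",
    "ATGAAATAA",
    ">TX_EXAMPLE_2",
    "ATGCCCTAA",
]
example_gtf = [
    'chr1\tsrc\tCDS\t1\t9\t.\t+\t0\tgene_id "G1"; transcript_id "TX_EXAMPLE_1";',
    'chr1\tsrc\tCDS\t50\t58\t.\t+\t0\tgene_id "G2"; transcript_id "TX_EXAMPLE_2";',
]

report = compare_ids(example_fasta, example_gtf)
for key, ids in report.items():
    print(key, len(ids), ids[:5])  # show counts and the first few IDs per group
```

If "shared" comes back empty on the real files, that would be consistent with the "Not found: 98752" line in the CDS check, and the first few entries of each one-sided list usually show how the two naming schemes differ.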