Dear all,
I have recently started working in bioinformatics. I am trying to build a snpEff database for my strain of interest using the GTF and GFF modes, but I keep getting the same error for both of my strains. Judging from the error message, I believe there is a header issue, but I am not sure how or where exactly I need to make the change, or what exactly the IDs listed in the error refer to.
Error:
FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.
Transcript IDs from database (sample):
'TRANSCRIPT_gene-CAJCM15448_47400'
'TRANSCRIPT_gene-CAJCM15448_22100'
'TRANSCRIPT_gene-CAJCM15448_09130'
Transcript IDs from database (fasta file):
'lcl|BGOX01000002.1_cds_GBL49991.1_2265'
'lcl|BGOX01000001.1_cds_GBL47767.1_41'
'lcl|BGOX01000001.1_cds_GBL48203.1_477'
'lcl|BGOX01000001.1_cds_GBL48135.1_409'
'lcl|BGOX01000004.1_cds_GBL51971.1_4245'
My files are: cds.fa, genes.gff, genes.gtf, protein.fa, and sequences.fa.
So I would like to know which file's headers I need to change, or whether there is anything else I can do to make the database build properly. Any suggestions, please?
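For illustration, here is a minimal sketch of what renaming the cds.fa headers to the transcript-ID style shown in the error could look like. It assumes (this is not confirmed) that every cds.fa header carries an NCBI-style `[locus_tag=...]` attribute and that snpEff's transcript IDs follow the `TRANSCRIPT_gene-<locus_tag>` pattern seen in the sample above. The function name `rename_cds_headers` and the example IDs are hypothetical.

```python
import re

# Assumption: headers look like ">lcl|contig_cds_PROT.1_N [locus_tag=TAG] ...".
LOCUS_TAG = re.compile(r"\[locus_tag=([^\]]+)\]")


def rename_cds_headers(fasta_text: str) -> tuple[str, list[str]]:
    """Replace each header that has a locus_tag with ">TRANSCRIPT_gene-<locus_tag>".

    The whole header line is replaced, so the other bracketed attributes are dropped.
    Headers without a locus_tag are left unchanged; their IDs are returned so they
    can be inspected by hand.
    """
    out_lines: list[str] = []
    unmapped: list[str] = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = LOCUS_TAG.search(line)
            if match:
                # Hypothetical naming rule, inferred from the snpEff error sample.
                line = f">TRANSCRIPT_gene-{match.group(1)}"
            else:
                fields = line[1:].split()
                unmapped.append(fields[0] if fields else "")
        out_lines.append(line)
    return "\n".join(out_lines) + "\n", unmapped


if __name__ == "__main__":
    # Hypothetical two-record input; the IDs are invented for illustration.
    demo = (
        ">lcl|contig1_cds_PROT1.1_1 [locus_tag=TAG_00010] [protein_id=PROT1.1]\nATGAAA\n"
        ">lcl|contig1_cds_PROT2.1_2 [protein_id=PROT2.1]\nATGCCC\n"
    )
    renamed, missing = rename_cds_headers(demo)
    print(renamed, end="")
    print("no locus_tag:", missing)
```

If the headers carry no locus_tag at all, this sketch renames nothing, and the mapping would have to come from the GFF instead.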
If I take all of my files from the same source and still get the same error, what would the solution be? Also, if snpEff lists 22 IDs in my error, do I need to look only at those headers in my cds file?
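The IDs printed in the error are labeled "(sample)", so they may not be the only mismatches. A minimal sketch for checking every cds.fa header against the IDs expected from the GFF could look like this. It assumes (inferred only from the sample in the error, not from snpEff documentation) that each transcript ID is `TRANSCRIPT_` plus the `ID=` attribute of a `gene` feature; the function names are hypothetical.

```python
import re

# Matches the GFF3 "ID=" attribute at the start of column 9 or after a ";".
GENE_ID = re.compile(r"(?:^|;)ID=([^;]+)")


def expected_transcript_ids(gff_text: str) -> set[str]:
    """Collect 'TRANSCRIPT_' + gene ID for every 'gene' feature (assumed naming rule)."""
    ids: set[str] = set()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            match = GENE_ID.search(cols[8])
            if match:
                ids.add("TRANSCRIPT_" + match.group(1))
    return ids


def unmatched_fasta_ids(fasta_text: str, known_ids: set[str]) -> list[str]:
    """Return the first word of every FASTA header that is not in known_ids."""
    missing: list[str] = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            header_id = fields[0] if fields else ""
            if header_id not in known_ids:
                missing.append(header_id)
    return missing


if __name__ == "__main__":
    # Hypothetical paths; point these at the real files.
    with open("genes.gff") as gff, open("cds.fa") as cds:
        known = expected_transcript_ids(gff.read())
        bad = unmatched_fasta_ids(cds.read(), known)
    print(f"{len(bad)} cds.fa headers do not match a GFF-derived transcript ID")
```

Counting all mismatches this way shows whether the problem is limited to a few records or affects every header in the file.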