I am a PhD student with very little experience in bioinformatics (or very little experience at all: I started two months ago). I'm having some problems getting SnpEff to work with GFF coordinates obtained from TransDecoder. A group I am collaborating with gave me a genome assembly and a GTF file with transcript information derived from RNA-seq. I ran TransDecoder following the instructions, with the --single_best_orf option, and got the CDS file and a GFF3 file. I used the GFF3 to build a SnpEff database, because I have to evaluate the effect of some SNPs on the genome. However, when I launched SnpEff eff, I received a great number of warnings:
INFO_REALIGN_3_PRIME 1
WARNING_TRANSCRIPT_NO_START_CODON 202855
WARNING_TRANSCRIPT_NO_START_CODON&INFO_REALIGN_3_PRIME 2
WARNING_TRANSCRIPT_NO_STOP_CODON 17281

Protein coding transcripts : 2426
# Length errors : 0 ( 0,00% )
# STOP codons in CDS errors : 0 ( 0,00% )
# START codon errors : 686 ( 28,28% )
# STOP codon warnings : 183 ( 7,54% )
# UTR sequences : 2409 ( 99,30% )
# Total Errors : 686 ( 28,28% )
Given the low number of transcripts, this number of warnings seems extremely high. Is that normal? Also, I checked the CDSs obtained from TransDecoder and, even though not all of them start with ATG, all of them have a start codon near the beginning of the sequence, so I really cannot explain this number of warnings. Do you have any suggestions?

May the life of whoever comes to my aid be filled with cakes and pizzas.
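For anyone who wants to repeat the CDS check described above, here is a minimal sketch in plain Python. It is only an illustration and not necessarily how the check was originally done. It assumes the CDS sequences are in FASTA format; the helper names `read_fasta` and `summarize_cds` and the tiny demo records are made up for the example. It counts how many sequences begin exactly with ATG, end with a stop codon, and have a length that is a multiple of three.

```python
from io import StringIO

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def summarize_cds(handle):
    """Count CDS records that start with ATG, end with a stop codon,
    and have a length divisible by three."""
    counts = {
        "total": 0,
        "starts_with_atg": 0,
        "ends_with_stop": 0,
        "length_multiple_of_3": 0,
    }
    for _, seq in read_fasta(handle):
        counts["total"] += 1
        counts["starts_with_atg"] += seq.startswith("ATG")
        counts["ends_with_stop"] += seq[-3:] in STOP_CODONS
        counts["length_multiple_of_3"] += len(seq) % 3 == 0
    return counts


# Hypothetical demo records standing in for a real CDS FASTA file.
demo = StringIO(">t1\nATGAAATAA\n>t2\nCCATGAAATGA\n>t3\nATGAAACC\n")
demo_counts = summarize_cds(demo)
print(demo_counts)
```

To run it on a real file, open the CDS FASTA with `open(path)` and pass the handle to `summarize_cds`; the path is whatever your TransDecoder run produced.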