Hi all,
I have a genome-wide list of germline SNPs and short indels for Arabidopsis thaliana, generated with VarScan. Regardless of the tool used, I would like to annotate them, i.e. find out which ones cause an amino acid change or a premature stop codon, against the standard Arabidopsis thaliana Columbia-0 (Col-0) reference accession, for which I have both the genome sequence (FASTA from TAIR10) and the updated annotation (GFF from Araport). Here is an excerpt:
Chrom Position Ref Var
Chr1 626503 G T
Chr1 926694 C T
Chr1 5280350 C A
Chr1 5699993 C A
Chr1 7004559 G A
Chr1 8325810 C T
Chr1 9371723 T G
What I want is similar to what ANNOVAR does, but unfortunately ANNOVAR does not support Arabidopsis. I was hoping for an existing R pipeline that takes a genome, an annotation and a SNP/indel list and, boom, returns the annotation. I couldn't find one, except maybe snpEff. Any tips? Thanks in advance!
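To show the core of what such a tool reports for a coding SNP, here is a minimal Python sketch that compares the reference and the altered codon under the standard genetic code. It is not snpEff's or ANNOVAR's implementation. `classify_snv` is a hypothetical helper, and it assumes you already have the spliced CDS of the transcript and the variant's 0-based offset within it. A real annotator handles the remaining parts, which are left out here: mapping a genomic position to that offset using the exons and strand in the Araport GFF, and handling indels (frameshifts).

```python
# Standard genetic code, codons enumerated in T, C, A, G order per position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snv(cds, cds_pos, alt, strand="+"):
    """Classify a single-base substitution inside a coding sequence.

    cds:     spliced CDS, 5'->3' on the coding strand, starting at the start codon
    cds_pos: 0-based offset of the variant within `cds` (coding-strand coordinates)
    alt:     alternative base as reported on the genomic + strand (as in VarScan output)
    strand:  '-' if the gene lies on the reverse strand, so `alt` must be complemented
    """
    cds = cds.upper()
    alt = alt.upper()
    if strand == "-":
        alt = COMPLEMENT[alt]
    frame = cds_pos % 3                    # position of the variant inside its codon
    start = cds_pos - frame                # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"             # premature stop codon
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "missense"                # amino acid change
    # 1-based codon number, as in the usual p.W3* style notation
    return effect, f"{ref_codon}>{alt_codon}", f"{ref_aa}{start // 3 + 1}{alt_aa}"
```

For example, `classify_snv("ATGGCTTGGTAA", 7, "A")` turns the TGG (Trp) codon into TAG and returns `("stop_gained", "TGG>TAG", "W3*")`.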
Yes, snpEff is a good tool; I have used it and got satisfactory results. It builds a database from an annotation file (.gtf) plus the genome FASTA, and then takes a .vcf file of variants as input.
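The VCF requirement matters here because the list in the question is a plain VarScan-style table. Below is a minimal Python sketch that converts such a table into a bare-bones VCF. `table_to_vcf` and `convert_file` are hypothetical helper names. The sketch assumes the first four columns are Chrom, Position, Ref and Var, as in the excerpt. It also assumes that indels are written as `+INS` / `-DEL` and anchored on the reference base at Position, so check both assumptions against your own file. VarScan's mpileup2* commands also have an `--output-vcf` option, which may be simpler if re-running it is possible.

```python
VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
)


def to_vcf_alleles(ref, var):
    """Return (REF, ALT) in VCF notation.

    Assumption: indels appear as '+INS' or '-DEL', anchored on the
    reference base at the reported position (VCF also uses an anchor base).
    """
    if var.startswith("+"):        # insertion after the anchor base
        return ref, ref + var[1:]
    if var.startswith("-"):        # deletion of the bases after the anchor
        return ref + var[1:], ref
    return ref, var                # plain SNP


def table_to_vcf(lines):
    """Convert an iterable of 'Chrom Position Ref Var ...' lines to VCF text."""
    out = [VCF_HEADER]
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chrom":   # skip blank lines and the header row
            continue
        chrom, pos, ref, var = fields[:4]        # any extra columns are ignored
        vref, valt = to_vcf_alleles(ref, var)
        out.append("\t".join([chrom, pos, ".", vref, valt, ".", "PASS", "."]))
    return "\n".join(out)


def convert_file(in_path, out_path):
    """Hypothetical convenience wrapper: table file in, VCF file out."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(table_to_vcf(src) + "\n")
```

The chromosome names in the VCF (`Chr1` here) must match those in the snpEff database. After conversion, the annotation step looks like `java -jar snpEff.jar <genome_db> variants.vcf > annotated.vcf`, where `<genome_db>` is whatever database you built from the TAIR10 FASTA and the Araport annotation or downloaded.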