I was comparing the results from ANNOVAR and VEP and found the following differences.
gene-based vs transcript-based
ANNOVAR (with Ensembl IDs) seems to report annotation only at the gene level, not at the transcript level. As a result, for 81 SNPs I get roughly 81 genes: SNPs that fall in intronic/exonic/ncRNA regions get a single gene annotation, and the remaining SNPs get their neighboring genes.
VEP outputs the Ensembl IDs of all transcripts that are affected by the SNP.
1. Is there any way to force ANNOVAR to take all transcripts affected by a SNP into consideration?
ANNOVAR command:
annotate_variation.pl -build hg19 SNPs_annovar_input.avinput humandb/ -dbtype ensGene
VEP command:
variant_effect_predictor.pl -i SNPs_vep_input.txt --cache --force_overwrite --symbol
intergenic variant handling
If a variant is intergenic, ANNOVAR outputs the neighboring genes. VEP reports nothing about the neighboring genes.
2. Is there any way to force VEP to output the neighboring genes for intergenic variants?
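To make clear what I mean by "neighboring genes", here is a minimal Python sketch of the lookup I would otherwise have to do by hand. It is not how ANNOVAR or VEP work internally; the gene table, the coordinates and the function name `flanking_genes` are all made up for illustration, and it assumes a single chromosome.

```python
# Hypothetical gene table for one chromosome: (start, end, gene_id).
# Coordinates and IDs are invented for illustration only.
GENES = [
    (1000, 2000, "geneA"),
    (5000, 6000, "geneB"),
    (9000, 9500, "geneC"),
]

def flanking_genes(pos, genes):
    """Return the closest gene on each side of an intergenic position.

    Returns None if the position lies inside any gene. Otherwise returns
    a dict with "left" and "right" entries, each (gene_id, distance) or
    None when there is no gene on that side.
    """
    left = right = None
    for start, end, gene_id in genes:
        if start <= pos <= end:
            return None  # position is genic, not intergenic
        # Closest gene ending before the position
        if end < pos and (left is None or end > left[0]):
            left = (end, gene_id)
        # Closest gene starting after the position
        if start > pos and (right is None or start < right[0]):
            right = (start, gene_id)
    return {
        "left": (left[1], pos - left[0]) if left else None,
        "right": (right[1], right[0] - pos) if right else None,
    }

print(flanking_genes(3000, GENES))
```

With a real annotation the table would come from a gene model file and would have to be split by chromosome first.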
strand information in VEP
Strand information should be specified in the VEP input and is crucial for predicting how the variant affects AS. ANNOVAR does not need this specification. I do not have strand information for my SNPs (they are taken from a paper, and the authors provided no information about the strand, only the MAF, normal allele and risk allele).
3. Is there a way to assign a strand to each SNP using R and some database (a command-line approach would be even better)?
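One approach I can think of is to compare the reported alleles with the reference base at each position: if the reference base is one of the two alleles, the alleles are probably given on the forward strand; if it only matches after complementing them, they are probably on the reverse strand. Below is a minimal Python sketch of that idea. It assumes the reference base has already been looked up (for example from the hg19 sequence), which is not shown, and the function name `infer_strand` is my own. Note that A/T and C/G SNPs cannot be resolved this way.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def infer_strand(ref_base, allele1, allele2):
    """Guess the strand on which two SNP alleles were reported.

    ref_base is the forward-strand reference base at the SNP position
    (assumed to be looked up elsewhere). Returns "+", "-", "ambiguous"
    for palindromic SNPs (A/T, C/G), or "unknown" otherwise.
    """
    ref = ref_base.upper()
    alleles = {allele1.upper(), allele2.upper()}
    # Anything other than A/C/G/T cannot be handled by this sketch
    if ref not in COMPLEMENT or not alleles <= set(COMPLEMENT):
        return "unknown"
    flipped = {COMPLEMENT[a] for a in alleles}
    if alleles == flipped:
        return "ambiguous"  # same allele pair on both strands
    if ref in alleles:
        return "+"
    if ref in flipped:
        return "-"
    return "unknown"

print(infer_strand("A", "A", "G"))
print(infer_strand("A", "T", "C"))
```

This only works if one of the two reported alleles really is the reference allele, which I cannot guarantee for alleles taken from a paper.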
Thanks