Hi everyone, I have a vcf file for whole exome sequence. The file looks like that:
chr6 3118947 rs2231355 A G 1071.77 PASS AC=2;AF=1.00;AN=2;DB;DP=35;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=30.62;SOR=2.115;set=variant GT:AD:DP:GQ:PL 1|1:0,35:35:90:1100,90,0
I wanted to add gene names to the file and read the annovar can be helpful for that (clinvar data is also a good thing for me). I used the following code from the manual:
./table_annovar.pl patient.final.vcf humandb/ -buildver hg19 -out patient_anno -remove -protocol refGene,cytoBand,exac03,avsnp147,dbnsfp30a -operation gx,r,f,f,f -nastring . -csvout -polish -xref example/gene_xref.txt
But there are no gene names and no clinvar data, the output file patient_anno.hg19_multianno.csv looks like that:
chr1,860799,.,G,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.
What am I doing wrong? Thanks!
Are you sure there is a gene at
chr1:860799
?There is nothing but '.' in Gene.refGene column
Do you mean there is no value across all sites? Can you check the refGene annotation file under
humandb/
?It should have gene symbols in the 13th column, I think.
Yes, no values. hg19_refGene.txt looks pretty normal, there are gene symbols in the 13th column. I think that the issue is not with names only, but something goes wrong in general. There are no values in any columns, but the first 5 (chr, position, snp, alt, ref). But I cannot understand what's wrong.
Can you try replacing the
gx
with justg
and removing the-xref example?gene_xref.txt
part?Didn't help :( I also noticed that patient.refGene.invalid_input file is twice larger then the patient.hg19_multianno.csv file
Can you look into the log files? There should be something that points to why you're seeing invalid input.
The first step looks ok to me: NOTICE: Processing operation=g protocol=refGene
NOTICE: Running with system command <annotate_variation.pl -geneanno="" -buildver="" hg19="" -dbtype="" refgene="" -outfile="" patient.refgene="" -exonsort="" patient.final.vcf="" humandb=""/> NOTICE: Output files were written to patient.refGene.variant_function, patient.refGene.exonic_variant_function NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 72212 transcripts (including 17527 without coding sequence annotation) for 28250 unique genes NOTICE: Variants with invalid input format were written to patient.refGene.invalid_input
But the second one is strange, it didn't find anything:
NOTICE: Processing operation=r protocol=cytoBand
NOTICE: Running with system command <annotate_variation.pl -regionanno="" -dbtype="" cytoband="" -buildver="" hg19="" -outfile="" patient_anno_v2="" patient.final.vcf="" humandb=""/> NOTICE: Output file is written to patient.hg19_cytoBand NOTICE: Reading annotation database humandb/hg19_cytoBand.txt ... Done with 862 regions NOTICE: Finished region-based annotation on 0 genetic variants NOTICE: Variants with invalid input format were written to patient.invalid_input
Please copy paste the command again, this time use the code formatting button after selecting the pasted content. Right now, the website has parsed out the content you pasted as though it were HTML, which it is not.
Thank you very much for your help! This is the command:
This is the output:
Can you also paste the output of
Head:
Tail:
Can you check the annotation file and the VCF file to see if the contig names match? That could be a problem.
Most of the snps are located on assembled chromosomes, and they are named like 'chr1' in both files. But it looks like there are no contiguous in the annotation file. Do you think I should try to remove them from vcf?
Didn't help :( Now tail for the patient_anno.invalid_input file looks like this:
Are you sure the reference sequence used in the VCF was hg19 and not hg38 or something else?
Yes. Moreover, I tried using another example from the manual:
And I see that in the temporary file myanno.refGene.exonic_variant_function there is gene annotation! It is in awful format, but it exists. And 'myanno.invalid_input' is an empty file. This is crazy :)
That's really odd. I'd recommend picking the first 1000 lines of your VCF and trying the annovar command you used earlier, tweaking it until it works. The
-protocol
and-operation
formats are pretty stupid, if you ask me.In the meantime, you can always use VEP to get the annotations you need.