Question: Convert INDEL format for use in ANNOVAR
User 7754 (United Kingdom) wrote, 3.2 years ago:

Hi,

I have a tab-delimited file with GWAS results and I am trying to annotate the variants with ANNOVAR, but my INDEL format is different. My indels end up split between two output files. In the invalid_input file I find variants like these:

1       63735   63735   CCTA    C
1       251627  251627  AC      A      
1       760811  760811  CTCTT   C

While they should be like this:

1   63735   CCTA    4C  0.339   rs201888535
1   251627  AC  2A  0.172   rs72502741
1   760811  CTCTT   5C  0.0417  rs200712425

and some end up in the "filtered" file, like this one:

1   36549207    36549207    A   ACT

which appear in ANNOVAR like this:

1   36549207    A   0CTC    0.9076  rs143406521
1   36549207    A   1ACTC   0.9076  rs143406521

I am wondering what the best way to convert between these formats is, and whether there is a standard way or a script to do it, as I am afraid of getting it wrong and converting only insertions and not deletions, or the opposite. Thanks so much for your help.

Written 3.2 years ago by User 7754; modified 3.2 years ago by igor.
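For reference, ANNOVAR ships a convert2annovar.pl script whose `-format vcf4` mode converts VCF records to its avinput format, which is one standard route. The core of that conversion can be sketched in Python, assuming the first five columns of the GWAS file are chromosome, start, end, REF and ALT written VCF-style (a shared leading "padding" base, with start equal to end). The helper names `vcf_style_to_avinput` and `convert_line` are hypothetical. Deletions and insertions go through the same prefix-stripping logic, so neither kind is skipped:

```python
def vcf_style_to_avinput(chrom, pos, ref, alt):
    """Convert one VCF-style variant to ANNOVAR avinput (chrom, start, end, ref, alt)."""
    # Count leading bases shared by REF and ALT (the VCF padding base).
    n = 0
    while n < min(len(ref), len(alt)) and ref[n] == alt[n]:
        n += 1
    ref_t, alt_t = ref[n:], alt[n:]

    if not ref_t:
        # Insertion: REF becomes "-" and start == end is the base just
        # before the inserted sequence.
        start = pos + n - 1
        return chrom, start, start, "-", alt_t
    start = pos + n
    end = start + len(ref_t) - 1
    if not alt_t:
        # Deletion: the span covers exactly the deleted bases; ALT becomes "-".
        return chrom, start, end, ref_t, "-"
    # SNV or block substitution: the span covers the remaining REF bases.
    return chrom, start, end, ref_t, alt_t


def convert_line(line):
    """Convert one whitespace-delimited row; columns after ALT are kept as-is."""
    fields = line.split()
    chrom, pos, _end, ref, alt = fields[:5]
    converted = vcf_style_to_avinput(chrom, int(pos), ref, alt)
    return "\t".join([str(x) for x in converted] + fields[5:])


if __name__ == "__main__":
    for row in ["1 63735 63735 CCTA C", "1 36549207 36549207 A ACT"]:
        print(convert_line(row))
```

For the question's examples this gives `1 63736 63738 CTA -` and `1 36549207 36549207 - CT` (tab-separated): deletions spell out the removed bases with ALT `-`, and insertions use REF `-`. The sketch only trims shared leading bases and does not left-normalize, so it is worth spot-checking its output against convert2annovar.pl on a few rows.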
Answer: igor (United States) wrote, 3.2 years ago:

The ANNOVAR website provides instructions on preparing VCFs:

So as a user, this is what you should do:
(1) split VCF lines so that each line contains one and only one variant;
(2) left-normalize all VCF lines;
(3) annotate with ANNOVAR.

For example, suppose the input is ex1.vcf.gz (make sure that it has been compressed with bgzip and then indexed with tabix). This is what you would do:

bcftools norm -m-both -o ex1.step1.vcf ex1.vcf.gz

bcftools norm -f human_g1k_v37.fasta -o ex1.step2.vcf ex1.step1.vcf

The first command splits multi-allelic variant calls into separate lines, and the second command performs the actual left-normalization. The FASTA file is needed in the second command.

Source: http://annovar.openbioinformatics.org/en/latest/articles/VCF/

Written and modified 3.2 years ago by igor.
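What the two bcftools norm commands do can be illustrated with a small Python sketch (not bcftools' actual implementation). The hypothetical `split_multiallelic` mirrors step 1 by emitting one record per comma-separated ALT allele; the hypothetical `left_normalize` mirrors step 2 using the trim-and-extend procedure of Tan, Abecasis and Kang (2015), as implemented in vt normalize. Here `seq` stands in for one chromosome of the FASTA: extending an allele to the left needs the reference base at the new position, which is why step 2 requires the FASTA file.

```python
def split_multiallelic(chrom, pos, ref, alts):
    """Step 1 (like bcftools norm -m-both): one biallelic record per ALT allele."""
    return [(chrom, pos, ref, alt) for alt in alts.split(",")]


def left_normalize(seq, pos, ref, alt):
    """Step 2: left-align and trim one biallelic variant.

    seq is the chromosome sequence as a plain string; pos is 1-based as in VCF.
    """
    # Refuse to proceed if REF disagrees with the reference
    # (bcftools norm also checks REF against the FASTA).
    if seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError(f"REF {ref} does not match reference at {pos}")
    changed = True
    while changed:
        changed = False
        # Trim a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the start of the sequence")
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least two bases.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, with `seq = "TCACAG"` the deletion of one `CA` repeat written as `POS=3 REF=ACA ALT=A` is shifted to `POS=1 REF=TCA ALT=T`. Unlike bcftools, the sketch ignores INFO and FORMAT fields when splitting.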

Hi, thank you Igor for your reply, but I don't have a VCF file, only a simple text file, so I can't really use the VCF utilities, right? Or maybe I can anyway? Thanks again.

Reply written 3.2 years ago by User 7754.

I didn't realize you don't have a VCF at all. I'm not sure how to do this with a custom format. My first suggestion would be to create a VCF file, but I am not sure how easy that would be, especially with indels. There are some options here: bed to vcf format conversion

Reply written 3.2 years ago by igor.
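If the text file already carries chromosome, position, REF and ALT in VCF-style notation (as the rows in the question do), a minimal sites-only VCF can be written from it directly and then run through the bcftools norm steps quoted earlier. A Python sketch, assuming whitespace-delimited rows whose first five columns are chromosome, start, end, REF and ALT, and that the rows are already coordinate-sorted; `gwas_text_to_vcf` is a hypothetical name, and any extra columns (p-values, rsIDs) are dropped:

```python
def gwas_text_to_vcf(text):
    """Build a minimal sites-only VCF (v4.2) from whitespace-delimited rows.

    Assumed columns: chrom, start, end, ref, alt, then optional extras (ignored).
    start is taken as the VCF POS of the padding base.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    contigs = []
    for fields in rows:
        if fields[0] not in contigs:
            contigs.append(fields[0])  # keep first-seen order

    out = ["##fileformat=VCFv4.2"]
    out += [f"##contig=<ID={c}>" for c in contigs]
    out.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")
    for chrom, start, _end, ref, alt, *_rest in rows:
        # ID, QUAL, FILTER and INFO are left missing (".").
        out.append("\t".join([chrom, start, ".", ref, alt, ".", ".", "."]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "1 63735 63735 CCTA C\n1 36549207 36549207 A ACT\n"
    print(gwas_text_to_vcf(example), end="")
```

The chromosome names are copied unchanged, so they must match the names in the FASTA given to `bcftools norm -f` (for example `1` rather than `chr1` for a GRCh37 FASTA such as hs37d5).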

Great, this would work. Thank you!

Reply written 3.2 years ago by User 7754.

Sorry @igor, I tried your tip for using ANNOVAR with INDELs, but I am getting this error:

[fi1d18@cyan01 annovar]$ bcftools norm -f hs37d5.fa -o ex1.step2.vcf ex1.step1.vcf
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq failed at chr1:1499775
[fi1d18@cyan01 annovar]$

But I don't know what that means.

Reply written 4 days ago by A.
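The message means that bcftools norm looked up the contig named "chr1" from the VCF in hs37d5.fa and found no sequence with that name: hs37d5 names its chromosomes 1, 2, ..., X, Y, MT without the "chr" prefix. A minimal Python sketch for spotting such mismatches compares the VCF's CHROM values with the first column of the FASTA index (the .fai file written by samtools faidx) and proposes an old-to-new name map. The helper names and the renaming rule (drop "chr", map chrM to MT) are assumptions; the printed two-column "old new" lines follow the format that `bcftools annotate --rename-chrs` reads.

```python
import sys


def fai_names(fai_text):
    """Sequence names: the first tab-separated column of each .fai line."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line}


def vcf_contigs(vcf_text):
    """CHROM values from the data lines of a plain-text VCF (headers skipped)."""
    return {
        line.split("\t")[0]
        for line in vcf_text.splitlines()
        if line and not line.startswith("#")
    }


def rename_map(vcf_names, fasta_names):
    """Propose new names for VCF contigs that are missing from the FASTA.

    Assumed rule: drop a leading "chr", and map chrM to MT (hs37d5 naming).
    Contigs without a plausible match map to None.
    """
    mapping = {}
    for name in sorted(vcf_names - fasta_names):
        if name == "chrM":
            candidate = "MT"
        elif name.startswith("chr"):
            candidate = name[3:]
        else:
            candidate = name
        mapping[name] = candidate if candidate in fasta_names else None
    return mapping


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_contigs.py ex1.step1.vcf hs37d5.fa.fai
    with open(sys.argv[1]) as vcf, open(sys.argv[2]) as fai:
        mapping = rename_map(vcf_contigs(vcf.read()), fai_names(fai.read()))
    for old, new in mapping.items():
        if new is None:
            print(f"no FASTA match for {old}", file=sys.stderr)
        else:
            print(old, new)  # one "old new" pair per line
```

Saved to a file, the map could then be applied with something like `bcftools annotate --rename-chrs chr_map.txt -o ex1.step1.renamed.vcf ex1.step1.vcf` before rerunning the normalization step; `chr_map.txt` and the output name are placeholders.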