Question: Split VCF File Into SNPs And Indels
6.9 years ago, pablo.riesgo wrote:

Hi there,

As recommended in the GATK best practices, Variant Quality Score Recalibration has to be done separately for SNPs and indels. However, I couldn't find a clean way to do this split with the usual tools (for instance vcftools). Does anybody know a tool that does this?

I already found a script that does the trick but I am surprised that this functionality is not included in the usual tools for processing VCF files.

The script, in case it helps: http://ngsda.blogspot.com.es/2011/06/awk-script-to-seperate-snp-and-indel.html

Thanks! Pablo.

vcf indel split snp gatk • 12k views
6.9 years ago, Giovanni M Dall'Olio (London, UK) wrote:

The most recent versions of vcftools have options to keep or remove indels.

From http://vcftools.sourceforge.net/options.html#site_filter :

--keep-only-indels
--remove-indels

Include or exclude sites that contain an indel. For this option 'indel' means any variant that alters the length of the REF allele.

This functionality is relatively new, so if you can't use these options on your computer, you are probably running an old version of vcftools.
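To make the quoted definition concrete, here is a minimal Python sketch of the same idea: a site counts as an indel if any ALT allele has a different length from REF. This is not the vcftools implementation; the function names `is_indel_site` and `split_vcf_lines` are made up for illustration, and the sketch assumes plain tab-separated VCF lines with no symbolic alleles such as `<DEL>`. Note that the "SNP" bucket here also catches same-length substitutions longer than one base.

```python
def is_indel_site(ref, alt):
    """True if any ALT allele differs in length from REF (hypothetical helper)."""
    return any(len(allele) != len(ref) for allele in alt.split(","))


def split_vcf_lines(lines):
    """Return (snp_lines, indel_lines); header lines are copied to both lists."""
    snps, indels = [], []
    for line in lines:
        if line.startswith("#"):
            # Both output files need the full header to remain valid VCF.
            snps.append(line)
            indels.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]  # VCF columns 4 and 5
        (indels if is_indel_site(ref, alt) else snps).append(line)
    return snps, indels


# Tiny made-up example: one SNP, one deletion, one multi-allelic SNP site.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tC\t50\tPASS\t.\n",
    "1\t200\t.\tAT\tA\t50\tPASS\t.\n",
    "1\t300\t.\tA\tC,G\t50\tPASS\t.\n",
]
snp_lines, indel_lines = split_vcf_lines(example)
```

For real data vcftools is the better choice; the sketch is only meant to show what the length-based definition implies.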


Hey, I'm facing problems with --remove-indels. Though I have the latest version of vcftools (vcftools_0.1.12a.tar.gz) installed, I get an error.

Command:

vcftools --gzvcf LR1_sorted_snp.vcf.gz --remove-indels --recode --recode-INFO-all --out LR1_SNP_ONLY

Error: Unknown option: --remove-indels

Can you help me out?

Thank you!

Reply written 5.0 years ago by Parimala Devi
6.9 years ago, pablo.riesgo wrote:

Thanks!!

It suits my needs perfectly. My vcftools was up to date, but I had missed this bit of the documentation.

This command creates a new VCF file keeping only indels and leaving the INFO field untouched:

vcftools --vcf X.vcf --keep-only-indels --out X.indel --recode --recode-INFO-all

Regarding the script I posted: it does not work well when there are several SNPs at the same position. For example, REF=A ALT=C,G is recognised as an indel when it is actually two SNPs.
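One plausible way a script could end up with this misclassification is by comparing the length of REF against the whole ALT string, commas included. That is only a guess about the cause, not something checked against the linked script. The short Python sketch below (both function names are hypothetical) contrasts that naive check with a per-allele comparison:

```python
def naive_is_indel(ref, alt):
    # Compares REF with the raw ALT field, so "C,G" (3 characters)
    # looks longer than "A" and is wrongly reported as an indel.
    return len(ref) != len(alt)


def per_allele_is_indel(ref, alt):
    # Splits ALT on commas first and compares each allele separately.
    return any(len(allele) != len(ref) for allele in alt.split(","))


naive_result = naive_is_indel("A", "C,G")            # True: the misclassification
per_allele_result = per_allele_is_indel("A", "C,G")  # False: two SNPs
deletion_result = per_allele_is_indel("AT", "A")     # True: a real deletion
```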

Pablo.
