Split Vcf File Into Snps And Indels
3
9
10.4 years ago
pablo.riesgo ▴ 140

Hi there,

As recommended in the GATK best practices, Variant Quality Score Recalibration has to be done separately for SNPs and indels. However, I haven't found a clean way to do this split (with vcftools, for instance). Does anybody know a tool that does this?

I already found a script that does the trick, but I am surprised that this functionality is not included in the usual tools for processing VCF files.

The script in case it helps: http://ngsda.blogspot.com.es/2011/06/awk-script-to-seperate-snp-and-indel.html

Thanks! Pablo.

split vcf snp indel gatk • 18k views
0

As an update: in my use case, vcftools is not able to process my VCF files and reports errors. Specifically, the error occurs because polyploidy was found, which vcftools does not currently support.

0

Please do not add an answer unless it answers the top level post. This post is better suited as a comment, and I am moving it to one.

15
10.4 years ago

The most recent versions of vcftools have options to keep or remove indels.

--keep-only-indels
--remove-indels

Include or exclude sites that contain an indel. For this option 'indel' means any variant that alters the length of the REF allele.


This functionality is relatively new, so if you can't use these options on your computer, you are probably running an old version of vcftools.

0

Hey, I'm facing problems with --remove-indels. Though I have the latest version of vcftools (vcftools_0.1.12a.tar.gz) installed, I get an error.

Command:

vcftools --gzvcf LR1_sorted_snp.vcf.gz --remove-indels --recode --recode-INFO-all --out LR1_SNP_ONLY
Error: Unknown option: --remove-indels


Can you help me out?

Thank you!

1
10.4 years ago
pablo.riesgo ▴ 140

Thanks!!

It suits my needs perfectly. My vcftools was up to date, but I had missed this bit of the documentation.

This command creates a new VCF file keeping only indels and leaving the INFO field untouched:

vcftools --vcf X.vcf --keep-only-indels --out X.indel --recode --recode-INFO-all
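
For anyone who wants both halves of the split in one go, here is a minimal Python sketch that builds the indel command above together with its SNP counterpart (the same command with --remove-indels instead of --keep-only-indels). It only uses the vcftools flags already mentioned in this thread. The function names `build_split_commands` and `split_vcf`, and the `.snp` / `.indel` output prefixes, are my own choices for illustration, not part of vcftools.

```python
import shutil
import subprocess


def build_split_commands(vcf_path, out_prefix):
    """Return (snp_cmd, indel_cmd) as argument lists for vcftools.

    Only flags quoted in this thread are used. The ".snp" and ".indel"
    suffixes on the output prefix are an arbitrary naming choice.
    """
    common = ["--recode", "--recode-INFO-all"]
    snp_cmd = ["vcftools", "--vcf", vcf_path, "--remove-indels",
               "--out", out_prefix + ".snp"] + common
    indel_cmd = ["vcftools", "--vcf", vcf_path, "--keep-only-indels",
                 "--out", out_prefix + ".indel"] + common
    return snp_cmd, indel_cmd


def split_vcf(vcf_path, out_prefix):
    """Run both commands in turn.

    Raises RuntimeError if vcftools is not on PATH, and
    subprocess.CalledProcessError if vcftools exits with an error
    (for example when the installed version rejects an option).
    """
    if shutil.which("vcftools") is None:
        raise RuntimeError("vcftools not found on PATH")
    for cmd in build_split_commands(vcf_path, out_prefix):
        subprocess.run(cmd, check=True)
```

Calling `split_vcf("X.vcf", "X")` would run the two commands one after the other. Building the argument lists separately from running them makes it easy to print the commands first and check them by eye.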


Regarding the script I posted: it does not work well when there are several SNPs at the same position. For example, REF=A ALT=C,G is recognised as an indel, while it is actually two SNPs.
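
To illustrate the multi-allelic case, here is a rough Python sketch of a classifier that checks every comma-separated ALT allele, following the definition quoted in the answer above (an indel is any variant that alters the length of the REF allele). This is not how vcftools is implemented, and it is not the linked script. `is_indel_site` and `split_records` are hypothetical names. It assumes plain-sequence alleles in tab-separated VCF lines, so symbolic alleles such as `<DEL>` are not handled, and same-length substitutions longer than one base land in the SNP bucket.

```python
def is_indel_site(ref, alt):
    """Return True if any ALT allele differs in length from REF.

    ALT may hold several comma-separated alleles, e.g. "C,G".
    Assumes plain-sequence alleles (no symbolic alleles like <DEL>).
    """
    return any(len(allele) != len(ref) for allele in alt.split(","))


def split_records(lines):
    """Split VCF lines into (snp_lines, indel_lines).

    Header lines (starting with '#') are copied to both outputs so that
    each result is still a complete VCF.
    """
    snps, indels = [], []
    for line in lines:
        if line.startswith("#"):
            snps.append(line)
            indels.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]  # VCF columns 4 (REF) and 5 (ALT)
        if is_indel_site(ref, alt):
            indels.append(line)
        else:
            snps.append(line)
    return snps, indels
```

With this check, REF=A ALT=C,G is treated as a SNP site, while a mixed site such as REF=A ALT=C,AT goes to the indel output because one of its alleles changes the length.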

Pablo.