Question: Filter out biallelic SNPs from multiple VCF files.
nilakshafreezon (Sri Lanka) wrote, 3.0 years ago:

Hi, I have a list of VCF files, one per individual, with the variants called for HLA genes. HLA has a lot of multiallelic SNPs, but in this case I need to extract only the biallelic SNPs, looking across all of the VCF files together. Is there a specific tool for this?

Example:

    sample 1  rs-xx  G  C
    sample 1  rs-yy  T  C
    sample 2  rs-xx  A  C
    sample 2  rs-yy  T  C

In this case I want only rs-yy as the result: across the two samples rs-xx shows three alleles (G, A, C), whereas rs-yy shows only two (T, C).

Thanks a lot in advance.

Garan (United Kingdom) wrote, 3.0 years ago:

GATK SelectVariants can restrict its output to biallelic sites with the -restrictAllelesTo BIALLELIC argument (GATK 3 syntax):

java -Xmx2g -jar GenomeAnalysisTK.jar -T SelectVariants \
-R human_g1k_v37.fasta \
-o out_biallelic.vcf \
--variant in.vcf \
-restrictAllelesTo BIALLELIC
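Where GATK is not installed, a rough awk stand-in for the same per-file filter might look like the sketch below. It assumes an uncompressed, tab-delimited VCF and treats a record as biallelic when its ALT column (column 5) holds exactly one allele, i.e. it is not "." and contains no comma. The function name keep_biallelic is hypothetical, and this is not GATK's own implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep VCF header lines plus records with exactly one ALT allele.
# Assumes an uncompressed, tab-delimited VCF given as $1 (or on stdin).
keep_biallelic() {
  awk -F'\t' '
    /^#/ { print; next }               # pass header lines through unchanged
    $5 != "." && $5 !~ /,/ { print }   # one ALT allele means REF + ALT = 2 alleles
  ' "${1:-/dev/stdin}"
}

# Usage: keep_biallelic in.vcf > out_biallelic.vcf
```

Like SelectVariants run on each file separately, this only looks at one sample at a time.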

You could write a bash script that loops over an array of sample VCF file names and writes a separate biallelic VCF for each sample.

Something like:

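batch=("sample_1" "sample_2" "sample_3")
for sample in "${batch[@]}"
do
     # the trailing '&' runs every sample in the background, i.e. all at once;
     # with hundreds of samples, drop it (or limit concurrency) to avoid running out of memory
     java -Xmx2g -jar GenomeAnalysisTK.jar -T SelectVariants \
     -R human_g1k_v37.fasta \
     -o "${sample}_out_biallelic.vcf" \
     --variant "${sample}_in.vcf" \
     -restrictAllelesTo BIALLELIC &
done
# block until every background job has finished
wait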
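Typing hundreds of sample names into the array by hand is impractical. The sketch below builds the array from the files on disk instead; it assumes every input follows the <sample>_in.vcf naming used in the loop above, and list_samples is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one sample name per <sample>_in.vcf file in a directory.
list_samples() {
  local dir="${1:-.}" f
  for f in "$dir"/*_in.vcf; do
    [ -e "$f" ] || continue          # skip the literal pattern when nothing matches
    f="${f##*/}"                     # drop the directory part
    printf '%s\n' "${f%_in.vcf}"     # drop the suffix to recover the sample name
  done
}

# Fill the array used by the loop from the current directory.
batch=()
while IFS= read -r s; do
  batch+=("$s")
done < <(list_samples .)
```

The process substitution (< <(...)) is bash-specific, as are the arrays in the original loop.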
Adam (United States) wrote, 3.0 years ago:

I suggest first merging your VCF files into a single file. Once this is done, several tools can do the filtering you need (e.g. vcftools --min-alleles 2 --max-alleles 2, which keeps only sites with exactly two alleles; --max-alleles 2 on its own would also keep monomorphic sites).
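The criterion from the question (keep a site only if it shows exactly two distinct alleles across all samples) can also be sketched without a full merge. Unlike a per-file filter, this looks at all samples at once, which is what the rs-xx/rs-yy example requires. The awk below is an illustration, not vcftools' or bcftools' own logic: it assumes uncompressed, tab-delimited per-sample VCFs, keys sites by the ID column (falling back to CHROM:POS when ID is "."), pools the REF and ALT alleles seen for each site, and prints sites whose pooled alleles are exactly two single-base alleles. The function name biallelic_sites is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sites whose alleles, pooled over all input VCFs,
# are exactly two distinct single-base alleles (biallelic SNPs across samples).
# Assumes uncompressed, tab-delimited VCFs; sites are keyed by ID, else CHROM:POS.
biallelic_sites() {
  awk -F'\t' '
    # record allele a for site s, counting each distinct allele once
    function add(s, a) {
      if (length(a) != 1) nonsnp[s] = 1        # indels or symbolic alleles: not a SNP
      if (!((s, a) in seen)) { seen[s, a] = 1; count[s]++ }
    }
    /^#/ { next }                              # skip header lines
    {
      site = ($3 != ".") ? $3 : $1 ":" $2
      if (!(site in count)) order[++n] = site  # remember first-seen order for output
      add(site, $4)                            # REF allele
      k = split($5, alt, ",")                  # ALT may list several alleles
      for (i = 1; i <= k; i++) if (alt[i] != ".") add(site, alt[i])
    }
    END {
      for (i = 1; i <= n; i++) {
        s = order[i]
        if (count[s] == 2 && !(s in nonsnp)) print s
      }
    }
  ' "$@"
}

# Usage: biallelic_sites sample_*.vcf > biallelic_ids.txt
```

For the example in the question, rs-xx pools three alleles (G, C, A) and is dropped, while rs-yy pools only T and C and is printed.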

nilakshafreezon replied: Thanks a lot. Is merging more than 500 samples feasible?

Adam replied: Yes, although htslib-based tools such as bcftools merge are likely to be much faster for such a task.
Prakki Rama (Singapore) wrote, 3.0 years ago:

Is it something like this you want?

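# VCF columns are tab-separated: keep header lines plus records with REF T and ALT C
awk -F'\t' '/^#/ || ($4 == "T" && $5 == "C")' file.vcf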

Powered by Biostar version 2.3.0