How to combine variants?
7.5 years ago
SOHAIL ▴ 400

Hi,

I have two bi-allelic variant files. My objective is to combine all the genotypes/samples for the sites that are common to both files.

Can someone please suggest a tool and the steps to do that?

Thanks

ngs variant manipulation • 2.4k views

It's unclear to me what is common between both files. Are the same variants in both files or the same samples?


Hi WouterDeCoster,

Given: two different files

    1. 1000G bi-allelic SNPs
    2. My sample bi-allelic SNPs

Problem:

    1. Keep only the variants that intersect between the two files (i.e. output the sites that are common to both).
        Result: two files: (1) the shared variants with the 1000G genotype information, and (2) the same variants with the genotypes of my own samples.

    2. Combine those shared variants into a single VCF file, with the same sites and the union of all samples.

In short: first the common variants, then the union of all samples. Thanks!

7.5 years ago

I would solve problem one by generating an identifier for each variant by concatenating chromosome, position and alternative allele. Build the identifier list from the smaller file, then use it to filter the other file.

e.g.:

#Get the identifiers present in yourfile.vcf (grep drops the header lines so only record IDs are written)
bcftools annotate --set-id '%CHROM\_%POS\_%ALT' yourfile.vcf | grep -v '^#' | cut -f3 > MyIdentifiers.txt

#Give the same type of identifiers to the 1000G data vcf
bcftools annotate --set-id '%CHROM\_%POS\_%ALT' 1000Gdata.vcf > 1000Gdata_withidentifiers.vcf

#Filter the 1000G data to only contain the variants you have in your vcf
java -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant 1000Gdata_withidentifiers.vcf -o 1000G_myvariants.vcf -IDs MyIdentifiers.txt
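
To make the identifier idea concrete, here is a minimal sketch of the same two steps using only awk on two made-up toy VCFs. The file names (mine_demo.vcf, ref_demo.vcf, demo_keys.txt, ref_demo_shared.vcf) and the records are hypothetical and only serve to illustrate the logic; this is not a replacement for the bcftools/GATK commands above.

```shell
# Toy inputs: made-up records without sample columns, for illustration only.
hdr='##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf "${hdr}"'1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tC\tT\t.\t.\t.\n' > mine_demo.vcf
printf "${hdr}"'1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tC\tA\t.\t.\t.\n1\t300\t.\tG\tC\t.\t.\t.\n' > ref_demo.vcf

# Step 1: write one CHROM_POS_ALT key per record of the smaller file.
awk -F'\t' '!/^#/ { print $1 "_" $2 "_" $5 }' mine_demo.vcf > demo_keys.txt

# Step 2: keep the header lines plus every record whose key is in the list.
# (Assumes the key list is non-empty, otherwise NR == FNR misfires.)
awk -F'\t' 'NR == FNR { keep[$0]; next }
            /^#/ || (($1 "_" $2 "_" $5) in keep)' demo_keys.txt ref_demo.vcf > ref_demo_shared.vcf
```

Only the 1:100 A>G record survives in ref_demo_shared.vcf: position 200 has a different alternative allele in the two toy files, so its key does not match.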

Problem two can probably be solved with something like vcf-merge from vcftools, which expects bgzip-compressed, tabix-indexed input files.
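
As an alternative to the tools named above, both problems could probably also be handled with bcftools alone. The function below is a rough sketch, not a tested pipeline: the function name and argument names are made up for illustration, and it assumes both inputs are bgzip-compressed and indexed, use the same chromosome naming, and do not share sample names (bcftools merge refuses duplicate sample names unless told otherwise).

```shell
# Hypothetical helper: intersect two VCFs, then merge the shared sites.
# Usage: combine_shared_sites <1000G.vcf.gz> <mine.vcf.gz> <output_dir>
combine_shared_sites() {
    ref_vcf=$1
    my_vcf=$2
    out_dir=$3

    # Keep sites present in both inputs; "-c none" requires matching REF and ALT.
    # Writes out_dir/0000.vcf.gz (records from the first input)
    # and out_dir/0001.vcf.gz (records from the second input).
    bcftools isec -n=2 -c none -O z -p "$out_dir" "$ref_vcf" "$my_vcf" || return 1

    # Make sure both outputs are indexed before merging.
    bcftools index -f "$out_dir/0000.vcf.gz" || return 1
    bcftools index -f "$out_dir/0001.vcf.gz" || return 1

    # Same sites, union of samples; "-m none" avoids creating multiallelic records.
    bcftools merge -m none -O z -o "$out_dir/merged.vcf.gz" \
        "$out_dir/0000.vcf.gz" "$out_dir/0001.vcf.gz"
}
```

Called as, for example, combine_shared_sites 1000Gdata.vcf.gz yourfile.vcf.gz shared_sites, this would leave the two per-file intersections and merged.vcf.gz in the shared_sites directory.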
