extracting genotypes from a multi-sample VCF that have certain variants
8.1 years ago
Floydian_slip ▴ 170

Hi, I have a set of variants and a multi-sample merged VCF that gives the genotype for each sample. Is there a way to extract the names of the samples that have those variants? Ideally, I would like this per variant: each variant followed by the names of the samples that carry it.

Thanks a lot in advance! ~N

vcf genotypes • 2.8k views

It's not clear to me where you're looking for genotypes (sample, A1, A2) and where for variants (chrom/pos/ref/alts). What are your inputs?


I have 2 inputs: 1. A VCF file with a set of variants. 2. Another, merged VCF file from multiple individuals that gives, for each variant, the genotype of each individual (present, absent, etc.).

Now, not every individual will have the variants from the first file. What I would like to know is which samples have each of the variants from the first file. E.g., variant1 from file1 is present in these samples from file2.
I hope that is clear.


So, I figured out a way. First, I can use bedtools intersect on the two files to keep only those lines of the multi-sample merged VCF that contain the variants I want information for. Next, from the resulting file, I can parse the columns holding the genotypes of each sample and, using awk, cut, etc., extract only those column headings (and hence the sample names) whose genotype includes the variant (e.g. 0/1, 1/1 or 1/2, meaning the sample has that variant in some form).

Thanks!
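For anyone who would rather do both steps in one script, here is a rough Python sketch of the same idea. It is not the bedtools/awk pipeline described above, just an illustration under some assumptions: both files are plain-text, tab-separated VCFs; a variant "matches" only when CHROM, POS, REF and ALT are identical strings in both files (so multi-allelic records with a different ALT column would be missed); and a sample counts as a carrier if any allele in its GT field is neither 0 nor missing. The function names `variant_keys` and `carriers_per_variant` and the toy records are made up for the example.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from the variants-of-interest VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, pos, ref, alt))
    return keys


def carriers_per_variant(merged_lines, keys):
    """Yield (key, [sample names]) for merged-VCF records whose key is in `keys`."""
    samples = []
    for line in merged_lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at the 10th column
            continue
        key = (fields[0], fields[1], fields[3], fields[4])
        if key not in keys:
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype information on this record
        gt_index = fmt.index("GT")
        carriers = []
        for name, cell in zip(samples, fields[9:]):
            gt = cell.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # carrier = at least one allele that is not reference (0) or missing (.)
            if any(a not in ("0", ".") for a in alleles):
                carriers.append(name)
        yield key, carriers


# Toy data (hypothetical): file1 holds the variants of interest,
# file2 is the merged multi-sample VCF.
file1 = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chr1", "100", ".", "A", "G", ".", ".", "."]),
    "\t".join(["chr2", "200", ".", "C", "T", ".", ".", "."]),
]
file2 = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["chr1", "100", ".", "A", "G", ".", ".", ".", "GT:DP",
               "0/1:10", "0/0:8", "1/1:12"]),
    "\t".join(["chr1", "150", ".", "G", "C", ".", ".", ".", "GT",
               "0/1", "0/1", "0/1"]),
    "\t".join(["chr2", "200", ".", "C", "T", ".", ".", ".", "GT",
               "0|0", "./.", "1|0"]),
]

result = list(carriers_per_variant(file2, variant_keys(file1)))
for (chrom, pos, ref, alt), names in result:
    print(f"{chrom}:{pos} {ref}>{alt}\t{','.join(names)}")
```

With real files you would pass open file handles instead of the lists, e.g. `variant_keys(open("file1.vcf"))`; the file names are placeholders, and gzipped VCFs would need `gzip.open(..., "rt")` instead.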
