How to subset a VCF by sample and write each sample to its own file?
12 months ago

Hi. I have a VCF file containing variant information for multiple samples. How can I subset it by sample ID and save each sample to a separate file?

bcftools vcf • 615 views
12 months ago
samuelandjw ▴ 230

If you only want to extract one sample (or a few, given as a comma-separated list to -s), try:

bcftools view -s sample_ID -Oz -o sample_ID.vcf.gz combined.vcf.gz

If you want to split the file into one VCF per sample, try the split plugin (here -o names the output directory, not a file):

bcftools +split input.vcf.gz -Oz -o vcf_per_sample
12 months ago
sbstevenlee ▴ 470

Here's a Python API solution using the pyvcf submodule of the fuc package, which I wrote:

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr1'],
...     'POS': [100, 101],
...     'ID': ['.', '.'],
...     'REF': ['G', 'T'],
...     'ALT': ['A', 'C'],
...     'QUAL': ['.', '.'],
...     'FILTER': ['.', '.'],
...     'INFO': ['.', '.'],
...     'FORMAT': ['GT:DP', 'GT:DP'],
...     'A': ['0/1:30', '0/1:29'],
...     'B': ['0/1:24', '0/1:30'],
...     'C': ['0/1:18', '0/1:24'],
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('in.vcf')
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT       A       B       C
0  chr1  100  .   G   A    .      .    .  GT:DP  0/1:30  0/1:24  0/1:18
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  0/1:30  0/1:24
>>> a_vf = vf.subset('A')
>>> a_vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT       A
0  chr1  100  .   G   A    .      .    .  GT:DP  0/1:30
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29
>>> a_vf.to_file('A.vcf')
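
For readers who cannot install extra packages, the same per-sample split can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not part of fuc or bcftools: the function names `split_vcf_by_sample` and `write_per_sample` are hypothetical, the input is assumed to be uncompressed, tab-delimited VCF text, and INFO fields such as AC/AN are copied unchanged rather than recalculated.

```python
from pathlib import Path


def split_vcf_by_sample(vcf_text):
    """Return {sample_name: single-sample VCF text} from multi-sample VCF text.

    Hypothetical helper for illustration; INFO values are copied as-is.
    """
    meta = []           # '##' meta lines, copied into every output
    fixed_header = []   # the nine fixed columns, '#CHROM' through 'FORMAT'
    samples = []        # sample names taken from the '#CHROM' header line
    rows = {}           # sample name -> list of data lines for that sample

    for line in vcf_text.splitlines():
        if line.startswith('##'):
            meta.append(line)
        elif line.startswith('#CHROM'):
            columns = line.split('\t')
            fixed_header = columns[:9]
            samples = columns[9:]
            rows = {name: [] for name in samples}
        elif line:
            fields = line.split('\t')
            for i, name in enumerate(samples):
                # Keep the nine fixed fields plus this sample's genotype column.
                rows[name].append('\t'.join(fields[:9] + [fields[9 + i]]))

    result = {}
    for name in samples:
        header = '\t'.join(fixed_header + [name])
        result[name] = '\n'.join(meta + [header] + rows[name]) + '\n'
    return result


def write_per_sample(per_sample, out_dir):
    """Write each single-sample VCF to '<out_dir>/<sample>.vcf'."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for name, text in per_sample.items():
        (out_path / f'{name}.vcf').write_text(text)


# Demo input mirroring the three-sample table in the answer above.
demo = '\n'.join([
    '##fileformat=VCFv4.3',
    '\t'.join(['#CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER',
               'INFO', 'FORMAT', 'A', 'B', 'C']),
    '\t'.join(['chr1', '100', '.', 'G', 'A', '.', '.', '.', 'GT:DP',
               '0/1:30', '0/1:24', '0/1:18']),
    '\t'.join(['chr1', '101', '.', 'T', 'C', '.', '.', '.', 'GT:DP',
               '0/1:29', '0/1:30', '0/1:24']),
]) + '\n'

per_sample = split_vcf_by_sample(demo)
```

Calling `write_per_sample(per_sample, 'vcf_per_sample')` would then write A.vcf, B.vcf and C.vcf into that directory. For large or compressed files, bcftools or `VcfFrame.subset` remain the better choice, since this sketch holds the whole file in memory.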