How to subset vcf by sample and write?
2.7 years ago

Hi. I have a VCF file containing variant information for multiple samples. How can I subset it by sample ID and save each sample to a separate file?

bcftools vcf • 1.5k views
2.7 years ago
samuelandjw ▴ 250

If you only want to extract one or a few samples, try the following (-s accepts a comma-separated list of sample IDs):

bcftools view -s sample_ID -Oz -o sample_ID.vcf.gz combined.vcf.gz

If you want to split the file into one VCF per sample, try the split plugin, where -o names the output directory:

bcftools +split input.vcf.gz -Oz -o vcf_per_sample
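
If the split plugin is not available, one bcftools view call per sample gives a similar result. Below is a minimal Python sketch of that loop. The helper names view_command and subset_each are hypothetical, as is the output naming scheme (sample ID plus .vcf.gz); the bcftools flags are the same -s, -Oz and -o used above. Nothing is executed unless run=True, and running it assumes bcftools is on your PATH.

```python
import subprocess


def view_command(vcf_path, samples, out_path):
    """Build a `bcftools view` argument list that keeps only `samples`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "bcftools", "view",
        "-s", ",".join(samples),  # comma-separated sample IDs to keep
        "-Oz",                    # write bgzip-compressed VCF
        "-o", out_path,           # output file
        vcf_path,                 # input multi-sample VCF
    ]


def subset_each(vcf_path, samples, run=False):
    """Return one command per sample; execute them only when run=True."""
    commands = [view_command(vcf_path, [s], f"{s}.vcf.gz") for s in samples]
    if run:
        for cmd in commands:
            # Requires bcftools on PATH; raises CalledProcessError on failure.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands for three example sample IDs without running them.
    for cmd in subset_each("combined.vcf.gz", ["A", "B", "C"]):
        print(" ".join(cmd))
```

Passing several IDs to view_command in one call writes them into a single file, which corresponds to the first command above.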
2.7 years ago
sbstevenlee ▴ 480

Here's a Python API solution using the pyvcf submodule of the fuc package, which I wrote:

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr1'],
...     'POS': [100, 101],
...     'ID': ['.', '.'],
...     'REF': ['G', 'T'],
...     'ALT': ['A', 'C'],
...     'QUAL': ['.', '.'],
...     'FILTER': ['.', '.'],
...     'INFO': ['.', '.'],
...     'FORMAT': ['GT:DP', 'GT:DP'],
...     'A': ['0/1:30', '0/1:29'],
...     'B': ['0/1:24', '0/1:30'],
...     'C': ['0/1:18', '0/1:24'],
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('in.vcf')
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT       A       B       C
0  chr1  100  .   G   A    .      .    .  GT:DP  0/1:30  0/1:24  0/1:18
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  0/1:30  0/1:24
>>> a_vf = vf.subset('A')
>>> a_vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT       A
0  chr1  100  .   G   A    .      .    .  GT:DP  0/1:30
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29
>>> a_vf.to_file('A.vcf')
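
For readers who just want to see what per-sample subsetting does to the file, here is a rough standard-library sketch. It is not how fuc or bcftools implement it, and the function name split_vcf_by_sample is hypothetical. It assumes an uncompressed, tab-delimited VCF small enough to hold in memory. It copies the nine fixed columns plus one sample column into each output file, and it does not recompute INFO fields or drop sites where the sample carries no alternate allele.

```python
from pathlib import Path

# The nine fixed columns that precede the sample columns in a VCF.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT",
                 "QUAL", "FILTER", "INFO", "FORMAT"]


def split_vcf_by_sample(vcf_path, out_dir):
    """Write one single-sample VCF per sample column; return {sample: path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    meta, samples, records = [], [], []
    with open(vcf_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##"):
                meta.append(line)                 # meta-information lines
            elif line.startswith("#CHROM"):
                samples = line.split("\t")[9:]    # sample IDs follow FORMAT
            elif line:
                records.append(line.split("\t"))  # one variant record

    paths = {}
    for i, sample in enumerate(samples):
        path = out_dir / f"{sample}.vcf"
        with open(path, "w") as out:
            for m in meta:
                out.write(m + "\n")
            out.write("\t".join(FIXED_COLUMNS + [sample]) + "\n")
            for fields in records:
                # Keep the fixed columns and this sample's genotype column.
                out.write("\t".join(fields[:9] + [fields[9 + i]]) + "\n")
        paths[sample] = path
    return paths
```

Calling split_vcf_by_sample('in.vcf', 'vcf_per_sample') on a file with samples A, B and C would write A.vcf, B.vcf and C.vcf into that directory.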