Adding fake genotype columns to a VCF file
2.8 years ago
xerigaj492 • 0

Hi, I would like to know if there are any tools I can use to add 4 fake samples to a VCF?

For example, at each position the first individual would have 0/0, the second 0/1, the third 1/0 and the last 1/1.

So far my plan was to do the following 4 times (once for each individual):

grep -v "^#" MyVCF.vcf | awk '{print $0 "\t" "0|0"}'

and I would then add the individuals' names to the header manually, but I was wondering if any of you know a better way to do it, or a tool that can do such a thing?

Thank you !!

bcftools vcftools vcf genotype • 598 views
2.7 years ago
sbstevenlee ▴ 480

Not 100% sure I understood your question, but here's a solution using the Python API of the fuc package, which I wrote:

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr2'],
...     'POS': [100, 101],
...     'ID': ['.', '.'],
...     'REF': ['G', 'T'],
...     'ALT': ['A', 'C'],
...     'QUAL': ['.', '.'],
...     'FILTER': ['.', '.'],
...     'INFO': ['.', '.'],
...     'FORMAT': ['GT', 'GT'],
...     'A': ['0/1', '1/1']
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('your_vcf.vcf')
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT    A
0  chr1  100  .   G   A    .      .    .     GT  0/1
1  chr2  101  .   T   C    .      .    .     GT  1/1
>>> vf.df['B'] = '0|0'
>>> vf.df['C'] = '0|1'
>>> vf.df['D'] = '1|0'
>>> vf.df['E'] = '1|1'
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT    A    B    C    D    E
0  chr1  100  .   G   A    .      .    .     GT  0/1  0|0  0|1  1|0  1|1
1  chr2  101  .   T   C    .      .    .     GT  1/1  0|0  0|1  1|0  1|1
>>> vf.to_file('updated_vcf.vcf')
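
If installing a package is not an option, the awk plan from the question can also be sketched in plain Python, with the sample names added to the `#CHROM` line in the same pass. This is only a rough sketch under some assumptions: the function name `add_fake_samples` and the sample names `FAKE1` to `FAKE4` are made up for illustration, the VCF is assumed to already have a FORMAT column whose first key is GT, and any other FORMAT keys are filled with `.` for the fake samples.

```python
# Hypothetical sample names mapped to the genotype each fake sample gets.
FAKE_SAMPLES = {"FAKE1": "0|0", "FAKE2": "0|1", "FAKE3": "1|0", "FAKE4": "1|1"}


def add_fake_samples(lines, samples=FAKE_SAMPLES):
    """Yield VCF lines with one extra genotype column per fake sample."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##") or not line:
            # Meta-information (and empty) lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Extend the column header with the new sample names.
            yield "\t".join([line, *samples])
        else:
            fields = line.split("\t")
            if len(fields) < 9:
                raise ValueError("record has no FORMAT column: " + line)
            # Assumption: GT is the first FORMAT key; pad the rest with '.'.
            n_extra = len(fields[8].split(":")) - 1
            for gt in samples.values():
                fields.append(":".join([gt] + ["."] * n_extra))
            yield "\t".join(fields)


# Small in-memory example; with a real file, pass the open file handle instead.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA",
    "chr1\t100\t.\tG\tA\t.\t.\t.\tGT\t0/1",
]
for out_line in add_fake_samples(example):
    print(out_line)
```

Unlike `grep -v "^#"`, this keeps the header lines, so the result stays a complete VCF. To process a file, something like `open('MyVCF.vcf')` can be passed as `lines` and each yielded line written out followed by a newline.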