Question: converting vcf to text file
3.0 years ago by
Apoorva260
United States
Apoorva260 wrote:

Hi,

I want to run Matrix eQTL on my data. I have the genotype information as an imputed VCF file, and Matrix eQTL requires a Snp.txt file. How can I convert the VCF to a text file?

Thanks, Apoorva

eqtl matrix eqtl vcf • 5.4k views
modified 20 months ago by Biostar ♦♦ 20 • written 3.0 years ago by Apoorva260

A VCF IS a text file. You have to show us the format of this 'snp.txt' file.

written 3.0 years ago by Pierre Lindenbaum131k

This is the format of the snp.txt file required by the Matrix eQTL R package. The description only says that the file should contain the genotype information for the markers across all the samples.

id  Sam_01  Sam_02  Sam_03  Sam_04  Sam_05  Sam_06  Sam_07  Sam_08  Sam_09  Sam_10  Sam_11  Sam_12  Sam_13  Sam_14  Sam_15  Sam_16
Snp_01  2   0   2   0   2   1   2   1   1   1   2   2   1   2   2   1
Snp_02  0   1   1   2   2   1   0   0   0   1   1   1   1   0   1   1
Snp_03  1   0   1   0   1   1   1   1   0   1   1   0   1   1   1   2
Snp_04  0   1   2   2   2   1   1   0   0   0   1   2   1   1   1   0
Snp_05  1   1   2   1   1   2   1   1   0   1   1   2   0   1   2   1
Snp_06  2   2   2   1   1   0   1   0   2   1   1   1   2   0   2   1
Snp_07  1   1   2   2   0   1   1   1   1   0   2   2   0   1   1   1
Snp_08  1   0   1   0   1   0   0   1   1   1   0   2   0   1   1   1
Snp_09  2   1   2   2   0   1   1   0   2   1   1   0   1   1   0   0
Snp_10  1   1   0   0   0   2   2   1   1   2   1   1   1   1   1   0
Snp_11  2   2   2   0   2   1   1   2   1   2   0   1   0   1   1   2
Snp_12  1   1   2   2   2   1   1   1   1   0   2   0   1   1   0   2
Snp_13  0   1   1   1   1   1   2   1   2   2   0   0   0   1   1   1
Snp_14  0   0   1   0   1   2   2   2   1   1   1   0   1   0   1   0
Snp_15  1   2   1   2   2   1   2   1   1   2   1   1   1   2   0   2

The vcf file I have is in this format.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  P01497  P01509
1   10177   1:10177 A   AC  .   PASS    AF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0 GT:DS:GP    0|0:0.827:0.344,0.485,0.171 ./.:.:.
1   10235   1:10235 T   TA  .   PASS    AF=0.00119;MAF=0.00119;R2=5e-05;AN=2;AC=0   GT:DS:GP    0|0:0.002:0.998,0.002,0 ./.:.:.
1   10352   1:10352 T   TA  .   PASS    AF=0.43825;MAF=0.43825;R2=0.00238;AN=2;AC=0 GT:DS:GP    0|0:0.863:0.323,0.491,0.186 ./.:.:.
1   10539   1:10539 C   A   .   PASS    AF=0.0007;MAF=0.0007;R2=0.00315;AN=2;AC=0   GT:DS:GP    0|0:0.001:0.999,0.001,0 ./.:.:.
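
Each sample column in the records above is laid out according to the FORMAT column (GT:DS:GP here). As a minimal sketch of how one record could be split into named per-sample fields, the Python below uses a hypothetical helper, parse_record, which is not part of any library. It splits on whitespace only because the pasted example lost its tabs; for a real file, splitting on "\t" would be the safer choice.

```python
def parse_record(header_line, record_line):
    """Split one VCF data line into fixed columns and per-sample fields.

    parse_record is a hypothetical helper written for illustration.
    """
    names = header_line.lstrip("#").split()
    cols = record_line.split()
    # The first nine columns (CHROM .. FORMAT) are fixed by the VCF layout.
    fixed = dict(zip(names[:9], cols[:9]))
    # FORMAT lists the keys used inside every sample column, e.g. GT:DS:GP.
    keys = fixed["FORMAT"].split(":")
    samples = {
        name: dict(zip(keys, value.split(":")))
        for name, value in zip(names[9:], cols[9:])
    }
    return fixed, samples


header = "#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  P01497  P01509"
record = ("1   10177   1:10177 A   AC  .   PASS    "
          "AF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0 GT:DS:GP    "
          "0|0:0.827:0.344,0.485,0.171 ./.:.:.")

fixed, samples = parse_record(header, record)
print(fixed["ID"], samples["P01497"]["GT"], samples["P01497"]["DS"])
# prints: 1:10177 0|0 0.827
print(samples["P01509"]["GT"])
# prints: ./.
```
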
modified 3.0 years ago by Pierre Lindenbaum131k • written 3.0 years ago by Apoorva260

There are certain problems with your VCF and requested output.

For example, for the first record:

1   10177   1:10177 A   AC  .   PASS    AF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0 GT:DS:GP    0|0:0.827:0.344,0.485,0.171 ./.:.:.
  1. The reference allele is A and the alternate allele is AC, so the two alleles differ. However, for sample P01497 your GT is 0|0, which means the reference allele is present on both chromosomes (maternal and paternal). I am not sure this information is correct, unless the site is A to A,C (A in some samples and C in others; for sample P01497, it is A).
  2. In desired matrix, you have only one Genotype information for a given SNP and given sample. But in your VCF, you have two genotypes. At this point, I assume that you need the ALT allele in each sample. Please confirm.
written 3.0 years ago by cpad011214k

In desired matrix, you have only one Genotype information for a given SNP

@cpad0112 I think it uses an integer notation for the genotypes: 0=HOM_REF, 1=HET, 2=HOM_VAR
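
A minimal sketch of that encoding in Python, assuming diploid calls: count the non-reference alleles in the GT string. The helper name gt_to_count is hypothetical, and treating every non-zero allele index as ALT at multi-allelic sites is an assumption rather than something Matrix eQTL prescribes.

```python
import re


def gt_to_count(gt):
    """Return 0 (HOM_REF), 1 (HET) or 2 (HOM_VAR) for a diploid GT string.

    Returns None when the call is missing or not diploid.
    """
    # GT alleles are separated by "|" (phased) or "/" (unphased).
    alleles = re.split(r"[|/]", gt)
    if len(alleles) != 2 or "." in alleles:
        return None
    # Assumption: any non-zero allele index counts as ALT.
    return sum(allele != "0" for allele in alleles)


print(gt_to_count("0|0"))  # prints: 0
print(gt_to_count("0/1"))  # prints: 1
print(gt_to_count("1|1"))  # prints: 2
print(gt_to_count("./."))  # prints: None
```
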

written 3.0 years ago by Pierre Lindenbaum131k

Thanks @Pierre. Does that mean the GT information in the VCF needs to be recoded to the 0=HOM_REF, 1=HET, 2=HOM_VAR format (for each SNP/sample) via a third-party tool or code?

modified 3.0 years ago • written 3.0 years ago by cpad011214k

Can you post your VCF headers here as well?

written 3.0 years ago by cpad011214k

To convert VCF to TSV, use vcf2tsv from vcflib.

written 3.0 years ago by cpad011214k
3.0 years ago by
Apoorva260
United States
Apoorva260 wrote:

This post has a very similar question:

How to interpret and extract from a Vcf file Genotype informations as values

However, I just read my VCF file into R using the vcfR package, converted it into a data frame, and manually recoded the genotype information.

Also, as mentioned in one of the comments in that post, vcf-to-tab can be used to convert the VCF to a table, and a script can then be written to convert the genotypes.
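
For anyone who prefers a single script over the manual route, here is a rough Python sketch of the whole conversion, under a few stated assumptions: the input is tab-separated, calls are diploid, the ID column is used as the SNP name, and missing calls are written as "NA" (this token has to match whatever missing-value string is configured when the file is loaded in Matrix eQTL). The function names, the "NA" choice, and the ##fileformat line in the demo are illustrative and not taken from the original VCF.

```python
import re

MISSING = "NA"  # assumption: token written for missing genotype calls


def alt_count(gt):
    """Recode a diploid GT string as 0/1/2 ALT alleles, or MISSING."""
    alleles = re.split(r"[|/]", gt)
    if len(alleles) != 2 or "." in alleles:
        return MISSING
    return str(sum(allele != "0" for allele in alleles))


def vcf_to_matrix(lines):
    """Yield tab-separated rows: a header row, then one row per variant."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the tenth column.
            yield "\t".join(["id"] + cols[9:])
            continue
        # Locate GT inside FORMAT instead of assuming it comes first.
        gt_index = cols[8].split(":").index("GT")
        calls = [alt_count(s.split(":")[gt_index]) for s in cols[9:]]
        yield "\t".join([cols[2]] + calls)


# Demo on the first record from the question; the ## line is illustrative.
demo = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP01497\tP01509",
    "1\t10177\t1:10177\tA\tAC\t.\tPASS\t"
    "AF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0\tGT:DS:GP\t"
    "0|0:0.827:0.344,0.485,0.171\t./.:.:.",
]
rows = list(vcf_to_matrix(demo))
for row in rows:
    print(row)
```

On a real file, the same generator could be fed an open file handle, with each yielded row written to snp.txt.
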

modified 3.0 years ago • written 3.0 years ago by Apoorva260