Converting a VCF to a text file
6.5 years ago
AB ▴ 360

Hi,

I want to run Matrix eQTL on my data. I have the genotype information as an imputed VCF file, and Matrix eQTL requires a Snp.txt file. How can I convert the VCF to that text format?

Thanks, Apoorva

vcf eqtl Matrix EQTL • 11k views

A VCF IS a text file. You have to show us the format of this 'snp.txt' file.


This is the format of the snp.txt file required by the Matrix eQTL R package. The description only says that the file should contain the genotype information for the markers across all the samples.

id  Sam_01  Sam_02  Sam_03  Sam_04  Sam_05  Sam_06  Sam_07  Sam_08  Sam_09  Sam_10  Sam_11  Sam_12  Sam_13  Sam_14  Sam_15  Sam_16
Snp_01  2   0   2   0   2   1   2   1   1   1   2   2   1   2   2   1
Snp_02  0   1   1   2   2   1   0   0   0   1   1   1   1   0   1   1
Snp_03  1   0   1   0   1   1   1   1   0   1   1   0   1   1   1   2
Snp_04  0   1   2   2   2   1   1   0   0   0   1   2   1   1   1   0
Snp_05  1   1   2   1   1   2   1   1   0   1   1   2   0   1   2   1
Snp_06  2   2   2   1   1   0   1   0   2   1   1   1   2   0   2   1
Snp_07  1   1   2   2   0   1   1   1   1   0   2   2   0   1   1   1
Snp_08  1   0   1   0   1   0   0   1   1   1   0   2   0   1   1   1
Snp_09  2   1   2   2   0   1   1   0   2   1   1   0   1   1   0   0
Snp_10  1   1   0   0   0   2   2   1   1   2   1   1   1   1   1   0
Snp_11  2   2   2   0   2   1   1   2   1   2   0   1   0   1   1   2
Snp_12  1   1   2   2   2   1   1   1   1   0   2   0   1   1   0   2
Snp_13  0   1   1   1   1   1   2   1   2   2   0   0   0   1   1   1
Snp_14  0   0   1   0   1   2   2   2   1   1   1   0   1   0   1   0
Snp_15  1   2   1   2   2   1   2   1   1   2   1   1   1   2   0   2

The vcf file I have is in this format.

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  P01497  P01509
1  10177  1:10177  A  AC  .  PASS  AF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0  GT:DS:GP  0|0:0.827:0.344,0.485,0.171  ./.:.:.
1  10235  1:10235  T  TA  .  PASS  AF=0.00119;MAF=0.00119;R2=5e-05;AN=2;AC=0  GT:DS:GP  0|0:0.002:0.998,0.002,0  ./.:.:.
1  10352  1:10352  T  TA  .  PASS  AF=0.43825;MAF=0.43825;R2=0.00238;AN=2;AC=0  GT:DS:GP  0|0:0.863:0.323,0.491,0.186  ./.:.:.
1  10539  1:10539  C  A  .  PASS  AF=0.0007;MAF=0.0007;R2=0.00315;AN=2;AC=0  GT:DS:GP  0|0:0.001:0.999,0.001,0  ./.:.:.

There are certain problems with your VCF and requested output.

For example, for the first record:

1  10177  1:10177  A  AC  .  PASS  AF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0  GT:DS:GP  0|0:0.827:0.344,0.485,0.171  ./.:.:.
  1. The reference allele is A and the alternate allele is AC, so the two alleles differ. However, for sample P01497 the GT is 0|0, which means the sample carries the reference allele on both chromosomes (maternal and paternal). I am not sure this information is correct, unless the ALT is meant to be A,C (A in some samples and C in others; for sample P01497 it is A).
  2. In the desired matrix, you have only one genotype value for a given SNP and a given sample, but in your VCF each GT has two values (for example 0|0). At this point, I assume that you need the sample's ALT allele. Please confirm.

In desired matrix, you have only one Genotype information for a given SNP

@cpad0112 I think it uses an integer notation for the genotypes: 0=HOM_REF, 1=HET, 2=HOM_VAR
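
To make that coding concrete, here is a minimal sketch in Python of how a single VCF GT string could be mapped to 0/1/2. The function name `gt_to_dosage` is made up for illustration, and the sketch assumes diploid, biallelic calls: any non-reference allele is counted as ALT, and a missing call such as ./. gives None.

```python
import re

def gt_to_dosage(gt):
    """Count non-reference alleles in a VCF GT string such as '0|1' or '1/1'.

    Hypothetical helper: assumes diploid, biallelic sites.
    Returns None when any allele is missing ('.').
    """
    # GT alleles are separated by '|' (phased) or '/' (unphased).
    alleles = re.split(r"[|/]", gt)
    if any(not a.isdigit() for a in alleles):
        return None
    # Every allele index other than 0 is a non-reference allele.
    return sum(1 for a in alleles if a != "0")

for gt in ["0|0", "0/1", "1|1", "./."]:
    print(gt, gt_to_dosage(gt))
# 0|0 0
# 0/1 1
# 1|1 2
# ./. None
```

With this coding, the 0|0 call for P01497 in the record above becomes 0 (HOM_REF), and the ./. call for P01509 is treated as missing.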


Thanks @Pierre. Does that mean the GT information in the VCF needs to be recoded to the 0=HOM_REF, 1=HET, 2=HOM_VAR format (for each SNP/sample) via a third-party tool or code?


Can you post your VCF headers here as well?


To convert VCF to TSV, use vcf2tsv from vcflib.

6.5 years ago
AB ▴ 360

This post asks a very similar question:

How to interpret and extract from a Vcf file Genotype informations as values

However, I just read my VCF file into R using the vcfR package, converted it into a data frame, and manually recoded the genotype information.

Also, as mentioned in one of the comments in that post, vcf-to-tab can be used to convert the VCF to a table, which a script can then convert to the numeric format.
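
For anyone who prefers to script the whole conversion, below is a rough sketch in Python. It is not the vcfR code I actually ran. The function names (`gt_to_count`, `vcf_to_snp_table`) are hypothetical, the `##fileformat` line in the example is only there to show that meta lines are skipped, and writing missing calls as NA is an assumption about how the file will later be loaded into Matrix eQTL. It also assumes a tab-delimited, biallelic VCF with a GT key in every FORMAT column.

```python
import re

def gt_to_count(gt):
    """Recode a GT string: 0=HOM_REF, 1=HET, 2=HOM_VAR, 'NA' if missing."""
    alleles = re.split(r"[|/]", gt)
    if "." in alleles or "" in alleles:
        return "NA"  # assumption: NA is the missing-value string used downstream
    return str(sum(a != "0" for a in alleles))

def vcf_to_snp_table(vcf_lines):
    """Turn VCF text lines into tab-separated rows: SNP id, then one code per sample."""
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            rows.append("\t".join(["id"] + fields[9:]))
            continue
        # Locate GT inside the FORMAT column (e.g. GT:DS:GP).
        gt_index = fields[8].split(":").index("GT")
        codes = [gt_to_count(s.split(":")[gt_index]) for s in fields[9:]]
        rows.append("\t".join([fields[2]] + codes))
    return rows

example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP01497\tP01509",
    "1\t10177\t1:10177\tA\tAC\t.\tPASS\tAF=0.42372;MAF=0.42372;R2=0.00241;AN=2;AC=0"
    "\tGT:DS:GP\t0|0:0.827:0.344,0.485,0.171\t./.:.:.",
]

for row in vcf_to_snp_table(example):
    print(row)
```

For a real file, the same function could be fed an open file handle instead of the `example` list, with the returned rows written out to snp.txt.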
