Question: Difference between a VCF file and a "genotype matrix"?
stevenlang123 (United States) wrote, 4.0 years ago:

I'm using NGS data to run a program that asks for a "genotype matrix" of samples and SNPs. Is this the same thing as a VCF file?

sequencing snp ngs • 2.8k views
Jeremy Leipzig (Philadelphia, PA) wrote, 4.0 years ago:

Not quite. A genotype matrix is usually a numeric encoding of the per-sample GT calls in the VCF file: each entry is the number of alternate alleles a sample carries at that position (0: homozygous reference, 1: heterozygous, 2: homozygous alternate). For example:

snp                       sample1     sample2     sample3
1:2348932A>C                    0           1           2

VCF is the right starting point. In R:

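library("VariantAnnotation")
# Example VCF shipped with the package
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
# Convert the GT calls to a SnpMatrix; the result is a list with $genotypes and $map
mat <- genotypeToSnpMatrix(vcf)
# Transpose so that SNPs are rows and samples are columns
t(as(mat$genotypes, "character"))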
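The 0/1/2 coding described above can also be reproduced by hand, which may help when checking what a tool has produced. This is only a minimal base-R sketch on toy data: `gt_to_dosage` is a hypothetical helper (not part of VariantAnnotation), the SNP name `snp2` is made up, and it assumes diploid calls, counting every non-reference allele as "alt".

```r
# Hypothetical helper: count non-reference alleles in VCF GT strings.
# Assumes diploid calls; anything else (or a missing allele ".") becomes NA.
gt_to_dosage <- function(gt) {
  # GT alleles are separated by "/" (unphased) or "|" (phased)
  alleles <- strsplit(as.vector(gt), "[/|]")
  vapply(alleles, function(a) {
    if (length(a) != 2 || any(a == ".")) return(NA_integer_)
    sum(a != "0")
  }, integer(1), USE.NAMES = FALSE)
}

# Toy GT calls: SNPs in rows, samples in columns (illustrative values only)
gt <- matrix(c("0/0", "0/1", "1/1",
               "0|1", "1|1", "./."),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("1:2348932A>C", "snp2"),
                             c("sample1", "sample2", "sample3")))

# Same shape as gt, but coded 0/1/2 with NA for missing calls
dosage <- matrix(gt_to_dosage(gt), nrow = nrow(gt), dimnames = dimnames(gt))
print(dosage)
```

Missing calls come out as NA here, so a program that does not accept missing values would still need those SNPs filtered or imputed.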

I see. Do you know of any tools that can create such a file from a set of BAM files? The exact specification for the file I need to create is the following: 1st column: gene name; 2nd column: SNP name; 3rd to last columns: a matrix of genotypes for each subject (class: data.frame). The order of the genotype columns should match id. Genotypes are coded as 0, 1, 2, with no missing values.

Thanks in advance
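For reference, the layout described above could be assembled in base R roughly as follows. This is only a sketch under stated assumptions: genotypes have already been called and coded 0/1/2 in a matrix, the SNP-to-gene lookup `snp_gene` and all names (`snpA`, `GENE1`, ...) are made up for illustration, `id` is taken to be a vector of sample names, and SNPs with any missing call are simply dropped (imputation would be another option).

```r
# Toy genotype matrix (SNPs in rows, samples in columns), coded 0/1/2
geno <- matrix(c(0L, 1L, 2L,
                 1L, NA, 0L),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("snpA", "snpB"),
                               c("sample1", "sample2", "sample3")))

# Hypothetical SNP-to-gene lookup; in practice this comes from an annotation step
snp_gene <- c(snpA = "GENE1", snpB = "GENE2")

# Sample order expected by the downstream program (assumed to be sample names)
id <- c("sample3", "sample1", "sample2")

# The specification allows no missing values: keep complete SNPs only,
# and reorder the sample columns to match id
keep <- complete.cases(geno)
geno <- geno[keep, id, drop = FALSE]

# 1st column gene, 2nd column SNP, remaining columns one per subject
geno_df <- data.frame(gene = unname(snp_gene[rownames(geno)]),
                      snp = rownames(geno),
                      geno,
                      row.names = NULL,
                      check.names = FALSE,
                      stringsAsFactors = FALSE)
print(geno_df)
```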

Reply written 4.0 years ago by stevenlang123

Awesome! Thank you very much for your help

Reply written 4.0 years ago by stevenlang123