012 genotype matrix using vcf tools
5.0 years ago
Ana ▴ 200

Hello everyone,

I have a VCF file that contains nearly 11 million SNPs. I want to convert it into a 012 genotype matrix for LD pruning. I am using this command:

/data/programs/vcftools_0.1.13/bin/vcftools --vcf my.file.vcf \
    --012 --out output_geno.vcf


I get the output, but I am confused. According to the manual, the rows of the 012 genotype matrix are individuals and the columns are genotypes. I have 11 million SNPs, so shouldn't I get 11 million columns (one column per SNP)? When I count the columns, there are only about one million! Is something wrong, or am I making a silly mistake? Thanks for any help figuring this out.


Did you check the *.012.indv and *.012.pos files that are also output with the --012 parameter? The *.012.indv file should contain the expected number of samples that were in the input VCF, and the *.012.pos file should list one line per site that was kept.

Also, check the log file that's produced, particularly the line:

"After filtering, kept X out of a possible Y Sites"

Kevin
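As a rough sketch of the checks suggested above, the snippet below compares the number of sites in a VCF with the number of lines in the matching *.012.pos file and pulls the "Sites" line out of the log. The file names (toy.vcf, toy.012.pos, toy.log) and their contents are made up for illustration; with real data you would point the same commands at your own input and at the files vcftools wrote for your --out prefix.

```shell
# Hypothetical toy files standing in for the real VCF, .012.pos and .log
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\n1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1\n1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/0\n' > "$tmp/toy.vcf"
printf '1\t100\n1\t200\n1\t300\n' > "$tmp/toy.012.pos"
printf 'After filtering, kept 2 out of 2 Individuals\nAfter filtering, kept 3 out of a possible 3 Sites\n' > "$tmp/toy.log"

# Sites in the input VCF: every line that is not a header line
n_vcf_sites=$(grep -vc '^#' "$tmp/toy.vcf")

# Sites written to the 012 output: one line per site in the .pos file
n_pos_sites=$(wc -l < "$tmp/toy.012.pos" | tr -d ' ')

# The log line that reports how many sites survived filtering
grep 'Sites' "$tmp/toy.log"

echo "VCF sites: $n_vcf_sites, .pos sites: $n_pos_sites"
```

If the two counts agree with each other and with the log, the sites were all written out and the problem is more likely in how the columns of the .012 file were counted.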


How did you count the columns?

If you do something like this:

head -n 1 file.012  |  awk '{print NF}'


Do you have the right number of columns?
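A slightly more defensive version of the same count, as a sketch: it checks every row rather than only the first, so a truncated or malformed line would show up as a second distinct value. The toy.012 file here is invented for the example. It assumes the layout I believe vcftools uses, where the first column of each row is the individual's index, so the number of SNP columns would be the field count minus one; missing genotypes appear as -1.

```shell
# Hypothetical 2-individual, 3-SNP matrix in the assumed .012 layout
# (column 1 = individual index, remaining columns = genotypes, -1 = missing)
tmp=$(mktemp -d)
printf '0\t0\t1\t2\n1\t1\t-1\t0\n' > "$tmp/toy.012"

# Distinct field counts over all rows; a well-formed file gives one value
col_counts=$(awk -F'\t' '{print NF}' "$tmp/toy.012" | sort -u)

# Number of rows, which should match the number of lines in the .012.indv file
n_rows=$(wc -l < "$tmp/toy.012" | tr -d ' ')

echo "rows: $n_rows, distinct column counts: $col_counts"
```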

20 months ago
Jautis ▴ 510

The 012 output is n-by-features rather than features-by-n: each row is an individual and each column is a SNP.
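If a downstream tool expects the features-by-n orientation instead, one option is to transpose the matrix. The awk sketch below is only an illustration on an invented toy.012 file, and it assumes the first column is the individual index and should be dropped. It holds the whole matrix in memory, so it is not a realistic choice for millions of SNPs without splitting the file first.

```shell
# Hypothetical 2-individual, 3-SNP matrix (column 1 assumed to be the index)
tmp=$(mktemp -d)
printf '0\t0\t1\t2\n1\t1\t-1\t0\n' > "$tmp/toy.012"

# Drop the index column and transpose: output has one row per SNP,
# one column per individual
transposed=$(awk -F'\t' '
  { for (i = 2; i <= NF; i++) m[i - 1, NR] = $i; nc = NF - 1; nr = NR }
  END {
    for (j = 1; j <= nc; j++) {
      line = m[j, 1]
      for (k = 2; k <= nr; k++) line = line "\t" m[j, k]
      print line
    }
  }' "$tmp/toy.012")

printf '%s\n' "$transposed"
```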