8.6 years ago
zwang10
Hello! I have a vcf.gz file, and I want to convert it into a 012 matrix. I use the following command:
vcftools --gzvcf chr1.vcf.gz --out chr1 --012
It outputs chr1.012, chr1.012.pos, and chr1.012.indv. However, I found that the rows of chr1.012 do not all have the same length. In addition, the row length of chr12.012 is not the same as that of chr1.012.
Just to cover all bases, what commands did you use to find these lengths?
This is just so we are sure there was no error in the counting logic.
Hello! I use bash scripting. To print the length of each row, I use
To print the number of rows of chr1.012.pos, I use
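The commands themselves are not shown in the thread. As a minimal sketch of one way to make these counts (not the poster's actual commands), the example below builds a tiny stand-in matrix and counts tab-separated fields per row, characters per row, and lines in the positions file. The file names and data are hypothetical, and it assumes each matrix row starts with an index column followed by one genotype per site, which is how vcftools writes the file as far as I know.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)

# Hypothetical stand-in for a .012 file: 2 individuals, 3 sites.
# Column 1 is assumed to be the individual index; -1 marks a missing genotype.
printf '0\t0\t-1\t2\n1\t1\t0\t2\n' > "$tmp/demo.012"
# Hypothetical stand-in for the matching .012.pos file: one line per site.
printf '1\t100\n1\t200\n1\t300\n' > "$tmp/demo.012.pos"

# Distinct numbers of tab-separated fields across the rows of the matrix.
field_counts=$(awk -F'\t' '{print NF}' "$tmp/demo.012" | sort -un)

# Distinct numbers of characters per row; these differ whenever a row
# contains -1, even though the number of fields is the same.
char_counts=$(awk '{print length($0)}' "$tmp/demo.012" | sort -un)

# Number of sites listed in the positions file.
n_sites=$(wc -l < "$tmp/demo.012.pos" | tr -d ' ')

echo "fields per row:     $field_counts"
echo "characters per row: $char_counts"
echo "sites in .pos:      $n_sites"
```

If the row length is measured in characters rather than fields, rows with different numbers of -1 entries will look unequal even when the matrix is rectangular. Under the index-column assumption, the field count per row should be the number of sites plus one.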
Did you account for anomalies in separation? Maybe there are multiple separators in places where they are not supposed to be? Given that the 012 file has an empty value indicator (-1), maybe squeeze the separator using tr -s before the awk?
I found that the length of chr1.012.pos is always much larger than the row length of chr1.012, so the case you mentioned would not happen.
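For reference, the separator squeeze suggested above can be sketched as follows. This is only an illustration on a hypothetical one-line file with a doubled tab, not output from the real data.

```shell
# Scratch directory for the hypothetical example file.
tmp=$(mktemp -d)

# One row with an accidental doubled tab between the 2nd and 3rd values.
printf '0\t0\t\t-1\t2\n' > "$tmp/demo_sep.012"

# With a single tab as the field separator, the doubled tab yields an
# extra empty field.
raw_fields=$(awk -F'\t' '{print NF}' "$tmp/demo_sep.012")

# tr -s collapses each run of tabs into one tab before awk counts fields.
squeezed_fields=$(tr -s '\t' < "$tmp/demo_sep.012" | awk -F'\t' '{print NF}')

echo "fields without squeeze: $raw_fields"
echo "fields with squeeze:    $squeezed_fields"
```

Note that awk's default field separator already treats runs of whitespace as a single separator, so the squeeze only matters when the separator is set explicitly with -F'\t'.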
Maybe they use different separators?
EDIT: Scratch that - doesn't look like it; they're all tab separated. It has something to do with the actual variants then.
The real name of my vcf.gz file is GAZ00001016581_1.ALSPAC.beagle.anno.csq.shapeit.20131101.vcf.gz (this is for chromosome 1). There is also a file called _EGAZ00001016604_ALSPAC.beagle.anno.csq.shapeit.20131101.sites.vcf.gz. I do not know whether this file is useful or not. Do you know of alternative tools to convert a vcf.gz file into a 012 matrix?
Not really. Sorry, I cannot help you with this now - I would have asked for a bit of the file to examine, but I am busy with my day job this week.
Sure. But my files are very large. The smallest per-chromosome file is chr22 (about 4.1G).
Hello zwang10!
It appears that your post has been cross-posted to another site: http://stackoverflow.com/questions/36256693
This is typically not recommended as it runs the risk of annoying people in both communities.
Thanks for your suggestion. I have deleted the cross-posted question on Stack Overflow.