I would like to find the counts of homozygous major (0|0), homozygous minor (1|1) and heterozygous (1|0, 0|1) genotypes for each position of a chromosome. I would like to build a table like this:
POS    0|0   1|0/0|1   1|1
Pos1   n1    n2        n3
Pos2   n4    n5        n6
where n1 ... n6 are the counts.
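For reference, here is a minimal Python sketch of such a per-position table. It is not the awk answer discussed below. It assumes a plain or gzipped VCF, samples starting at column 10, GT as the first FORMAT subfield, and phased "|" calls as in the 1000 Genomes phase 3 files. Matching is exact, so multiallelic calls such as 0|2 are left out of all three columns.

```python
import gzip
import sys

# Exact genotype strings to count; anything else (0|2, 1|2, .|., ...)
# matches no group and is ignored.
GROUPS = {
    "0|0": "hom_ref",
    "0|1": "het",
    "1|0": "het",
    "1|1": "hom_alt",
}


def count_line(line):
    """Return (POS, hom_ref, het, hom_alt) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    counts = {"hom_ref": 0, "het": 0, "hom_alt": 0}
    # Assumption: GT is the first FORMAT subfield, so keep the text before ':'.
    for sample in fields[9:]:
        gt = sample.split(":", 1)[0]
        group = GROUPS.get(gt)
        if group is not None:
            counts[group] += 1
    return fields[1], counts["hom_ref"], counts["het"], counts["hom_alt"]


def count_vcf(path):
    """Yield one count tuple per data line of a (possibly gzipped) VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            yield count_line(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    print("POS\t0|0\t0|1,1|0\t1|1")
    for pos, hom_ref, het, hom_alt in count_vcf(sys.argv[1]):
        print(f"{pos}\t{hom_ref}\t{het}\t{hom_alt}")
```

Usage, with a hypothetical script name: python count_gt.py ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz > counts.tsv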
See the example file here: 1000Genomes
Any suggestions on how to get this information would be much appreciated.
Thanks in advance.
Thanks a lot, fin swimmer. It works.
I am sorry, this seems to give me a wrong result.
Why do you think so?
I get the count of 0|0 right, but when I calculate them manually, the counts for 0|1, 1|0 and 1|1 are wrong.
Could you please post a small example of your file, the output you get, and the output you would expect?
Sure. Here is the example:
Position, Count_0|0, Count_0|1, Count_1|0, Count_1|1
16050115, 404, 0, 0, 0
16050213, 403, 0, 1, 0
16050607, 400, 2, 2, 0
Hello aadhirareddy1323,
what I meant was an example of your input file.
fin swimmer
Hi fin swimmer, here is the file:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
OK, thanks. Mistake found. An else was missing here: if($i ~ /1\|1/) hom_alt++; It must be else if($i ~ /1\|1/) hom_alt++; I corrected it in my answer.
fin swimmer
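To see why a missing else can leave the 0|0 count correct while inflating the others, here is the same control flow mirrored in Python. This illustrates the branching only and is not the awk code from the answer; it assumes the chain tested 0|0, then 1|1, then fell through to het.

```python
def classify_buggy(gt):
    """Mimic an if / if-else chain where the first `else` is missing."""
    labels = []
    if gt == "0|0":
        labels.append("hom_ref")
    if gt == "1|1":  # missing `else` before this `if`
        labels.append("hom_alt")
    else:
        labels.append("het")  # reached by every non-1|1 genotype, including 0|0
    return labels


def classify_fixed(gt):
    """With `elif` the branches are mutually exclusive."""
    if gt == "0|0":
        return ["hom_ref"]
    elif gt == "1|1":
        return ["hom_alt"]
    else:
        return ["het"]  # still catches 0|2, 1|2, ... as "het"
```

classify_buggy("0|0") returns ["hom_ref", "het"], so every 0|0 call was also added to het. The final else in the fixed version still sends 0|2, 1|2 and similar calls to het.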
Thank you, fin swimmer. I will try this.
@fin swimmer: This gives me the wrong count again.
I get the counts of hom_ref and hom_alt right. The problem is with het: I only want to sum the 0|1 and 1|0 genotypes, but it gives me the sum of 0|1, 1|0, 0|2, etc. I am sorry I haven't mentioned this before. How do I proceed? I am new to Linux.
It wasn't clear to me that you want to distinguish between the heterozygous genotypes. You just have to modify my code so that you define a new variable for each genotype you want to count, check with an if-statement whether it is present, increment that variable, and output all the variables.
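A minimal Python sketch of the "one counter per genotype" idea: instead of one variable per genotype, it counts every exact GT string with collections.Counter and then reports only the genotypes listed in WANTED (a hypothetical name), in the column order of the example output above. It assumes a tab-separated VCF data line with samples starting at column 10 and GT as the first FORMAT subfield.

```python
from collections import Counter

# Genotypes to report, in output column order (one "variable" per genotype).
WANTED = ["0|0", "0|1", "1|0", "1|1"]


def genotype_counts(line):
    """Count every exact GT string in one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return Counter(sample.split(":", 1)[0] for sample in fields[9:])


def format_row(line):
    """Format POS plus the counts of the WANTED genotypes as CSV."""
    pos = line.split("\t", 2)[1]
    counts = genotype_counts(line)
    # Counter returns 0 for genotypes that never occur at this site.
    return ", ".join([pos] + [str(counts[gt]) for gt in WANTED])
```

For a site with two 0|0 calls, one 1|0 and one 0|2, format_row returns "16050213, 2, 0, 1, 0" (given that POS); the 0|2 call is counted by the Counter but never reported.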
So for checking the genotypes 0|1 and 1|0, it looks like this:
Why do you want to make a difference between 0|1 and 1|0?
fin swimmer
@fin swimmer, I don't want to distinguish between them, but I want to exclude 0|2, 0|3, etc. and count only 1|0 and 0|1 as heterozygous.
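One way to exclude 0|2, 0|3 and similar calls is to parse the two alleles and accept only 0 and 1, rather than treating "anything that is not 0|0 or 1|1" as heterozygous. A minimal Python sketch, assuming diploid GT strings separated by | (phased) or / (unphased); the function name classify is hypothetical:

```python
import re


def classify(gt):
    """Classify a diploid GT string; return None for anything else.

    Only alleles 0 and 1 are accepted, so 0|2, 1|3, missing calls such
    as .|. and haploid calls all return None and are not counted.
    """
    alleles = re.split(r"[|/]", gt)
    if len(alleles) != 2 or not all(a in ("0", "1") for a in alleles):
        return None
    if alleles[0] != alleles[1]:
        return "het"  # 0|1 or 1|0 (or unphased 0/1)
    return "hom_ref" if alleles[0] == "0" else "hom_alt"
```

Counting only the calls for which classify returns "het" gives the sum of 0|1 and 1|0 alone. In awk, when GT is the only FORMAT field (as in the 1000 Genomes phase 3 files), the same idea corresponds to an explicit test such as if ($i == "0|1" || $i == "1|0") het++ instead of a bare else.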