I have a Beagle phased output and I want to compare consecutive columns of a file and return the number of matched elements. I would prefer to use shell scripting or awk. Here is a sample bash/AWK script that I am trying to use.

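    #!/bin/bash
    for i in 3 4 5 6 7 8 9; do
        for j in 3 4 5 6 7 8 9; do
            # pass the column numbers with -v so awk compares fields $i and $j,
            # not the literal numbers the shell would substitute inside double quotes
            awk -v i="$i" -v j="$j" '$i == $j' phased.txt | wc -l
        done
    done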

My file has 147,189 rows and 828 columns, and I want to compare every pair of columns and return the number of matched elements as an 828 × 828 similarity matrix. This would be fairly easy in MATLAB, but loading a file this large takes a long time. I can compare two columns and count the matched elements with `awk '$3==$4' phased.txt | wc -l`, but I need some help doing it for the entire file.

A snippet of the data:

    # sampleID HGDP00511 HGDP00511 HGDP00512 HGDP00512 HGDP00513 HGDP00513
    M rs4124251 0 0 A G 0 A
    M rs6650104 0 A C T 0 0
    M rs12184279 0 0 G A T 0

...........................................................................................................................................................

Please always include a snippet of the data. I have no idea what a phased Beagle file is, but I can help you with the comparison.

Hi Sukhdeep,

Thanks for reaching out. I have posted a snippet of the sample data. Your help is much appreciated.
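
For what it's worth, running awk once per column pair rereads the file for every cell of the matrix, so one possible approach is to count all pairs in a single pass. The following is only a rough sketch under a few assumptions taken from the snippet above: the header line starts with `#`, the first two columns are labels so the data start at column 3, and a `0` matching a `0` counts as a match (as it does with `awk '$3==$4'`). The function name `similarity_matrix` is made up for illustration.

```shell
#!/bin/bash
# similarity_matrix FILE [FIRST_COL]
# Prints a square matrix in which entry (a, b) is the number of rows where
# column a equals column b. FIRST_COL defaults to 3 (assumption: the first
# two columns are the marker type and the marker ID).
similarity_matrix() {
    awk -v first="${2:-3}" '
        /^#/ { next }                  # skip the header (assumed to start with "#")
        {
            if (NF > ncol) ncol = NF   # remember the widest row seen
            # the matrix is symmetric, so count only the upper triangle
            for (a = first; a <= NF; a++)
                for (b = a; b <= NF; b++)
                    if ($a == $b) count[a, b]++
        }
        END {
            for (a = first; a <= ncol; a++) {
                for (b = first; b <= ncol; b++) {
                    # read the lower triangle from its mirrored upper entry
                    n = (a <= b) ? count[a, b] : count[b, a]
                    printf "%s%d", (b > first ? " " : ""), n
                }
                printf "\n"
            }
        }
    ' "$1"
}
```

It could be used as `similarity_matrix phased.txt > matrix.txt`. With about 828 columns this still makes several hundred thousand comparisons per row, so it may well run for a long time on the full file; trying it on the first few thousand rows (for example via `head`) first would give a feel for the timing.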