Question

Compare consecutive columns of a phased Beagle file to generate the number of elements that matches.

10.4 years ago

aritra90 ▴ 70

I have phased output from Beagle, and I want to compare the columns of the file and return the number of matching elements. I would prefer to use shell scripting or awk. Here is the bash/awk script I am trying to use:

#!/bin/bash
for i in 3 4 5 6 7 8 9
do
  for j in 3 4 5 6 7 8 9
  do
    awk "$i == $j" phased.txt | wc -l
  done
done

I have a file of size 147189828, and I want to compare every pair of columns and return the number of matching elements as an 828 x 828 matrix (a similarity matrix). This would be fairly easy in MATLAB, but it takes a long time to load huge files. I can compare two columns and count the matching elements with the following awk command: awk '$3==$4' phased.txt | wc -l, but I need some help doing it for the entire file.

A snippet of the data:

# sampleID     HGDP00511  HGDP00511  HGDP00512  HGDP00512  HGDP00513  HGDP00513
M rs4124251    0          0          A          G          0          A
M rs6650104    0          A          C          T          0          0
M rs12184279   0          0          G          A          T          0
..
..

beagle bash awk shell • 3.1k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 10.4 years ago by aritra90 ▴ 70

Please always show a snippet of your data. I have no idea what a phased Beagle file is, but I can help you with the comparison.

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 10.4 years ago by Sukhi Singh 11k

Hi Sukhdeep,

Thanks for reaching out. I have posted a snippet of the sample data. Your help is much appreciated.

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 10.4 years ago by aritra90 ▴ 70

Ram · Answer 1 · 2015-05-30

10.4 years ago

aritra90 ▴ 70

SOLVED.

I was missing the $$
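
For later readers, a minimal sketch of the quoting point: inside double quotes the shell replaces $i and $j with their values before awk runs, so awk "$i == $j" hands awk the program 3 == 4, which compares two constants rather than two fields. One way around this (not necessarily the exact fix used here) is to pass the column numbers with awk -v and keep the awk program in single quotes. The helper name count_matches is hypothetical, and the file is assumed to be whitespace-delimited.

```bash
#!/bin/bash
# count_matches FILE I J   (hypothetical helper name, not from the thread)
# Prints the number of rows in FILE where column I equals column J.
count_matches() {
  # -v copies the shell values into the awk variables a and b, so the awk
  # program can stay in single quotes and $a / $b are field references.
  awk -v a="$2" -v b="$3" '$a == $b { n++ } END { print n + 0 }' "$1"
}

# Same loop as in the question; it only runs if phased.txt is present.
if [ -f phased.txt ]; then
  for i in 3 4 5 6 7 8 9; do
    for j in 3 4 5 6 7 8 9; do
      count_matches phased.txt "$i" "$j"
    done
  done
fi
```

Calling count_matches phased.txt 3 4 should give the same count as awk '$3==$4' phased.txt | wc -l, without the extra wc process.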

Thanks :)
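
The nested loop rereads the file once per column pair, which adds up quickly for 828 columns. A hedged sketch of a single-pass alternative is below: awk accumulates the pairwise match counts in an array and prints the full matrix at the end. It assumes that marker rows start with "M" (as in the snippet in the question), that allele columns start at field 3, and that fields are whitespace-delimited; the function name similarity_matrix is hypothetical.

```bash
#!/bin/bash
# similarity_matrix FILE   (hypothetical helper name, not from the thread)
# Prints a tab-separated matrix whose cell (i, j) is the number of marker
# rows in which allele column i equals allele column j.
similarity_matrix() {
  awk '
    $1 == "M" {                       # assumption: marker rows start with "M"
      if (NF > nf) nf = NF            # remember the widest row seen
      for (i = 3; i <= NF; i++)       # assumption: alleles start at field 3
        for (j = i; j <= NF; j++)     # upper triangle only (matrix is symmetric)
          if ($i == $j) m[i, j]++
    }
    END {
      for (i = 3; i <= nf; i++) {
        for (j = 3; j <= nf; j++) {
          # read the lower triangle from its mirrored upper-triangle cell
          v = (i <= j) ? m[i, j] : m[j, i]
          printf "%s%d", (j > 3 ? "\t" : ""), v
        }
        printf "\n"
      }
    }
  ' "$1"
}
```

A call such as similarity_matrix phased.txt > matrix.tsv would write the matrix to a file. The file is read only once, but the work per row is still quadratic in the number of columns, so this may remain slow on a large input.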

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 10.4 years ago by aritra90 ▴ 70