How to populate a column with binary values, depending on whether a sample is present in another file - bash
2.4 years ago

I'm trying to populate the sixth column of a .fam file, as outlined here:

https://www.cog-genomics.org/plink/1.9/formats#fam

The column has to be binary ('1' or '2') depending on the case/control status. ('1' is control, '2' is case)

I have the .fam file with all of my samples, but the case/control column is currently just flagged as 'missing' (coded as '-9'). So the head of the .fam file looks like this:

<sample_1> <sample_1> 0 0 0 -9  
<sample_2> <sample_2> 0 0 0 -9  
<sample_3> <sample_3> 0 0 0 -9  
<sample_4> <sample_4> 0 0 0 -9  
<sample_5> <sample_5> 0 0 0 -9  
<sample_6> <sample_6> 0 0 0 -9  

I have a separate file listing the samples that I know are cases. These samples need to be coded as '2' in the sixth column, and all remaining samples need to be coded as '1'.

Head of my 'case' samples file:

<sample_2>  
<sample_8>  
<sample_34>  
<sample_47>  
...etc

Is there a quick way to do this in bash?

bash merging plink • 607 views
2.4 years ago
ATpoint 65k

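Please check that the result is correct. This assumes the files are file1 (the tab-delimited .fam) and file2 (the list of case samples), shown below with dummy data. Essentially, you store file2 as a bash array, then loop through the sample IDs in file1 and check whether each one is an element of that array. The comparison is padded with spaces so that only whole IDs match; a plain substring or regex test would also flag sample23 when only sample2 is listed.

# Read the case IDs into a bash array, one element per line (needs bash 4+)
mapfile -t Array < file2

while read -r p
  do
  # Pad both sides with spaces so that only whole IDs match
  if [[ " ${Array[*]} " == *" $p "* ]]; then
    echo 2
  else
    echo 1
  fi
  done < <(cut -f1 file1) \
| paste <(cut -f1-5 file1) /dev/stdin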


$ cat file1
sample1 sample1 0   0   0   9
sample2 sample2 0   0   0   9
sample3 sample3 0   0   0   9
sample4 sample4 0   0   0   9
sample5 sample5 0   0   0   9
sample6 sample6 0   0   0   9
sample7 sample7 0   0   0   9
sample8 sample8 0   0   0   9
sample9 sample9 0   0   0   9

$ cat file2
sample2
sample8

## Output of above code:
sample1 sample1 0   0   0   1
sample2 sample2 0   0   0   2
sample3 sample3 0   0   0   1
sample4 sample4 0   0   0   1
sample5 sample5 0   0   0   1
sample6 sample6 0   0   0   1
sample7 sample7 0   0   0   1
sample8 sample8 0   0   0   2
sample9 sample9 0   0   0   1
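
For comparison, here is a more compact sketch of the same idea using awk's two-file idiom instead of a bash loop. The function name set_pheno is made up for illustration, and the sketch assumes (as in the dummy data) that the IDs in the case list match column 1 of the .fam file. Because awk splits on any whitespace by default, it should handle both space- and tab-delimited input, and it writes the output space-delimited.

```shell
# Hypothetical helper (name is illustrative, not part of any tool):
# print FAM_FILE with column 6 set to 2 for listed cases and 1 otherwise.
# Usage: set_pheno CASE_LIST FAM_FILE > new.fam
set_pheno() {
  awk '
    # First file (the case list): remember each non-empty ID.
    FILENAME == ARGV[1] { if ($1 != "") cases[$1] = 1; next }
    # Second file (the .fam): overwrite column 6 using an exact ID lookup.
    NF { $6 = ($1 in cases) ? 2 : 1; print }
  ' "$1" "$2"
}
```

The lookup uses an awk array keyed by ID, so matching is exact and the .fam file is read only once.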
2.4 years ago

Thanks. It took a while to work out, but the specific command I needed was:

plink --bfile <prefix> --make-bed --make-pheno <list_of_case_samples>.txt '*' -aec
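
Whichever route is used, it may be worth checking the result by tallying the sixth column of the new .fam file. The sketch below is only a sanity check; the function name count_pheno is made up for illustration. The count for '2' should equal the number of listed case samples that are actually present in the data, and no '-9' should remain.

```shell
# Hypothetical helper (name is illustrative): tally the values found in
# column 6 of a .fam file and print one "value count" pair per line.
# Usage: count_pheno FAM_FILE
count_pheno() {
  awk 'NF { n[$6]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}
```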
