How to populate a column with binary values, depending on whether a sample is present in another file - bash
2.4 years ago

I'm trying to populate the sixth column of a .fam file, as outlined here:

The column has to be binary ('1' or '2') depending on the case/control status. ('1' is control, '2' is case)

I have the .fam file with all of my samples, but the case/control column is currently just flagged as 'missing' (coded as '-9'). So the head of the .fam file looks like this:

<sample_1> <sample_1> 0 0 0 -9
<sample_2> <sample_2> 0 0 0 -9
<sample_3> <sample_3> 0 0 0 -9
<sample_4> <sample_4> 0 0 0 -9
<sample_5> <sample_5> 0 0 0 -9
<sample_6> <sample_6> 0 0 0 -9


I have a separate file with a list of samples that I know are 'case'. So these samples need to be coded as '2' in the sixth column, and all the rest of the samples need to therefore be coded as '1' in the sixth column.

Head of my 'case' samples file:

<sample_2>
<sample_8>
<sample_34>
<sample_47>
...etc


Is there a quick way to do this in bash?

bash merging plink • 607 views
2.4 years ago
ATpoint 65k

Please check if the result is correct, assuming files are file1 and file2 with dummy data. Essentially you store file2 as a bash array and then simply loop through file1 and check if each sample is part of that array.

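# Read the case IDs from file2 into a real bash array, one ID per line (needs bash 4+)
mapfile -t Array < file2

# Print 2 if the sample ID exactly matches a case ID, otherwise 1.
# cut assumes file1 is tab-delimited.
while read -r p
do
  status=1
  for c in "${Array[@]}"; do
    [[ "$c" == "$p" ]] && status=2
  done
  echo "$status"
done < <(cut -f1 file1) \
| paste <(cut -f1-5 file1) /dev/stdin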

$ cat file1
sample1 sample1 0 0 0 9
sample2 sample2 0 0 0 9
sample3 sample3 0 0 0 9
sample4 sample4 0 0 0 9
sample5 sample5 0 0 0 9
sample6 sample6 0 0 0 9
sample7 sample7 0 0 0 9
sample8 sample8 0 0 0 9
sample9 sample9 0 0 0 9

$ cat file2
sample2
sample8

## Output of above code:
sample1 sample1 0   0   0   1
sample2 sample2 0   0   0   2
sample3 sample3 0   0   0   1
sample4 sample4 0   0   0   1
sample5 sample5 0   0   0   1
sample6 sample6 0   0   0   1
sample7 sample7 0   0   0   1
sample8 sample8 0   0   0   2
sample9 sample9 0   0   0   1
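
For comparison, here is a minimal awk sketch of the same idea. It is not the code from the answer above, just one possible alternative. The file names (samples.fam, cases.txt) and the sample IDs are made-up dummy data, and it assumes the case list holds one ID per line that should match the first column of the .fam file. The lookup is an exact match, so an ID like sample80 is not mistaken for sample8.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Dummy .fam-style file (hypothetical sample names), phenotype set to -9.
printf '%s\n' \
  'sample1 sample1 0 0 0 -9' \
  'sample2 sample2 0 0 0 -9' \
  'sample3 sample3 0 0 0 -9' \
  'sample8 sample8 0 0 0 -9' \
  'sample80 sample80 0 0 0 -9' > "$workdir/samples.fam"

# Dummy list of case samples, one ID per line.
printf '%s\n' 'sample2' 'sample8' > "$workdir/cases.txt"

# First file (NR==FNR): remember every case ID as an array key.
# Second file: set column 6 to 2 if column 1 is a known case, else 1.
result=$(awk 'NR==FNR { cases[$1]; next }
              { $6 = ($1 in cases) ? 2 : 1; print }' \
         "$workdir/cases.txt" "$workdir/samples.fam")

printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Note that assigning to $6 makes awk rebuild each line with single spaces between fields. Setting OFS (for example with -v OFS='\t') would be one way to get tabs instead.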

2.4 years ago

Thanks. Took a while to work out, but the specific code I needed was:

plink --bfile <prefix> --make-bed --make-pheno <list_of_case_samples>.txt '*' -aec
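
One caveat for anyone reusing this: as far as I know, --make-pheno looks for the family ID and within-family ID in the first two columns of the case file, so a one-column list like the one in the question may need to be expanded first. A rough sketch, assuming FID and IID are identical as in the .fam file shown in the question (cases.txt and cases_fid_iid.txt are hypothetical file names, and the IDs are dummy data):

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Dummy one-column case list (hypothetical sample names).
printf '%s\n' 'sample2' 'sample8' > "$workdir/cases.txt"

# Duplicate each ID so that column 1 = FID and column 2 = IID.
# This is only valid under the assumption that FID == IID.
awk '{ print $1, $1 }' "$workdir/cases.txt" > "$workdir/cases_fid_iid.txt"

two_col=$(cat "$workdir/cases_fid_iid.txt")
printf '%s\n' "$two_col"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The resulting two-column file would then take the place of <list_of_case_samples>.txt in the plink command.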