PLINK: Number of samples drops after merging multiple BED/BIM/FAM files with --merge-list
15 days ago
Swetaleena • 0

I have multiple BED/BIM/FAM files (from separate GWAS experiments) and I wish to create a merged set of these files. I am using the command:

plink --bfile myfile1 --merge-list all_my_files.txt --make-bed --out mymerged


all_my_files.txt has the format:

file2.bed file2.bim file2.fam
...
fileK.bed fileK.bim fileK.fam


The problem is that I am merging files for about 3000 samples (split across about 34 GWAS runs). When I do a line count (wc -l) on the .fam file produced by the merge, I get about 2400. Why is the information for almost 600 samples lost?


You might have duplicated IDs across your file*.fam files?

How many unique lines do you get when you run:

cat file*.fam | awk '{print $1,$2}' | sort -u | wc -l
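If the unique count is lower than the total, it may help to see which IDs are repeated, since PLINK identifies a sample by its FID/IID pair (columns 1 and 2 of the .fam file) and treats a repeated pair as the same sample when merging. A minimal sketch, assuming whitespace-delimited .fam files; the function names dup_ids and count_ids are made up for illustration:

```shell
# List every FID/IID pair that appears more than once across the given .fam files.
# Hypothetical usage: dup_ids myfile1.fam file*.fam
dup_ids() {
    awk 'NF {print $1, $2}' "$@" | sort | uniq -d
}

# Report the total number of sample lines and the number of distinct FID/IID pairs.
# Hypothetical usage: count_ids myfile1.fam file*.fam
count_ids() {
    total=$(awk 'NF {n++} END {print n+0}' "$@")
    unique=$(awk 'NF {print $1, $2}' "$@" | sort -u | wc -l | tr -d ' ')
    echo "total=$total unique=$unique"
}
```

Remember to pass the base fileset's .fam (myfile1.fam in the question) as well, since the file*.fam glob may not match it.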


When I run this command on my merged.fam file, I get 2480. But originally I had 34 sets of BED, BIM and FAM files covering 3061 samples, which I merged to get the merged.bed, merged.bim and merged.fam files.


Also, when I run this command in the folder containing all 34 separate .fam files, I get 2957. That means my merged.fam is missing the information for almost 500 samples.
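One way to narrow this down might be to list exactly which FID/IID pairs occur in the input .fam files but not in the merged one, and then check which input files those samples come from. A rough sketch, assuming whitespace-delimited, non-empty .fam files; the function name missing_ids is made up for illustration:

```shell
# Print FID/IID pairs found in the input .fam files but absent from the merged .fam.
# First argument: the merged .fam file; remaining arguments: the input .fam files.
# Hypothetical usage: missing_ids mymerged.fam myfile1.fam file*.fam
missing_ids() {
    merged=$1
    shift
    # First pass (NR==FNR) records the merged IDs; later files print IDs not seen there.
    awk 'NR == FNR { seen[$1 " " $2] = 1; next }
         NF && !(($1 " " $2) in seen) { print $1, $2 }' "$merged" "$@" | sort -u
}
```

If every missing ID traces back to the same few input files, those filesets (or their entries in the merge list) would be the first place to look.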