Query regarding PLINK merging
2.1 years ago
ramshahaya ▴ 10

Hi Biostars Community,

I called variants with GATK HaplotypeCaller and used the hard-filtered SNP VCF file (SNP_Hard_Filtered_VCF) produced by GATK. In my previous queries, I had problems with multi-allelic variants during PLINK merging. To solve that, I followed the steps below.

I ran steps 1 and 2 separately on each of the 144 individual sample VCF files.

1. Convert VCF format to Plink format

bcftools norm -Ou -m -any HF_PASS_SNPs.vcf.gz | \
  bcftools norm -Ou -f Bos_taurus_Ensembl_UMD3.1/genome.fa | \
  bcftools annotate -Ob -x ID -I +'%CHROM:%POS:%REF:%ALT' | \
  /usr/bin/plink1.9 --bcf /dev/stdin --keep-allele-order --cow --allow-no-sex --nonfounders --make-bed --out HF_PASS_SNPs_plink

The command above was suggested at this link for converting VCF to PLINK format:

http://apol1.blogspot.com/2014/11/best-practice-for-converting-vcf-files.html

2. Perform QC steps

/usr/bin/plink1.9 \
  --bfile HF_PASS_SNPs_plink \
  --cow \
  --allow-no-sex \
  --nonfounders \
  --keep-allele-order \
  --mind 0.1 \
  --geno 0.1 \
  --maf 0.05 \
  --make-bed \
  --out HF_PASS_SNPs_plink_QC

3. Merge the 144 samples

/usr/bin/plink1.9 --cow --make-bed --merge-list myFile.txt --out mymerged_144
PLINK v1.90b6.22 64-bit (3 Nov 2020)           www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to mymerged_144.log.
Options in effect:
  --cow
  --make-bed
  --merge-list myFile.txt
  --out mymerged_144
64245 MB RAM detected; reserving 32122 MB for main workspace.
Warning: Variants '1:21444:A:G' and '1:21444:A:*' have the same position.
Warning: Variants '1:21446:C:G' and '1:21446:C:*' have the same position.
Warning: Variants '1:21448:T:C' and '1:21448:T:*' have the same position.
7955 more same-position warnings: see log file.
Performing single-pass merge (138 cattle, 342592 variants).
Merged fileset written to mymerged_144-merge.bed + mymerged_144-merge.bim +
mymerged_144-merge.fam .
342592 variants loaded from .bim file.
138 cattle (0 males, 0 females, 138 ambiguous) loaded from .fam.
Ambiguous sex IDs written to mymerged_144.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 138 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.0729077.
342592 variants and 138 cattle pass filters and QC.
Note: No phenotypes present.
--make-bed to mymerged_144.bed + mymerged_144.bim + mymerged_144.fam ... done.

After step 3, I get the merged files (.bed, .bim, .fam), but I am not sure whether they are correct.

Before merging, the total genotyping rate for each sample was 0.97. After merging, the total genotyping rate is 0.0729077. Could you please explain what might be the reason? Should I use this output for further steps?
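One possible way to investigate the drop is to check how much the per-sample variant sets overlap, since a variant that is absent from a sample's fileset is filled in as missing for that sample in the merged data. The Python sketch below is only an illustration, not part of the original workflow. It assumes that each line of myFile.txt is either a fileset prefix or a bed/bim/fam triple, and that each fileset holds a single sample genotyped at all of its own variants. The function names (`bim_paths`, `variant_ids`, `sharing_summary`) are hypothetical helpers.

```python
from collections import Counter
from pathlib import Path


def bim_paths(merge_list):
    """Yield the .bim path named by each line of a PLINK --merge-list file."""
    for line in Path(merge_list).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1:
            # One token: a binary fileset prefix (prefix.bed/.bim/.fam).
            yield Path(fields[0] + ".bim")
        elif len(fields) == 3:
            # Three tokens: .bed, .bim and .fam paths, in that order.
            yield Path(fields[1])
        else:
            raise ValueError(f"unsupported merge-list line: {line!r}")


def variant_ids(bim):
    """Return the set of variant IDs (second column) in a .bim file."""
    with open(bim) as handle:
        return {line.split()[1] for line in handle if line.strip()}


def sharing_summary(merge_list):
    """Return (number of filesets, size of variant union, rough expected rate).

    Assumption: one sample per fileset, with no missing calls at its own
    variants, so the estimate is sum(|V_i|) / (N * |union of V_i|).
    """
    counts = Counter()
    n_files = 0
    for bim in bim_paths(merge_list):
        counts.update(variant_ids(bim))
        n_files += 1
    union = len(counts)
    if union == 0 or n_files == 0:
        return n_files, union, 0.0
    return n_files, union, sum(counts.values()) / (union * n_files)


if __name__ == "__main__":
    n, union, rate = sharing_summary("myFile.txt")
    print(f"{n} filesets, {union} variants in the union, "
          f"rough expected genotyping rate {rate:.4f}")
```

If the estimate printed by this sketch is close to the merged genotyping rate, the low value would be consistent with most variants being present in only a few of the per-sample filesets rather than with a failed merge.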

Thanks a lot in advance

vcf Plink1.9 GATK bcftools