Problem with the total number of variants in my .map .ped files from PLINK
2.1 years ago

Hello Biostars community!

I am currently working on a case/control study in which we ran NGS sequencing for a panel of ~198 SNPs. I started by using VCFtools (together with tabix, parallel, and vcf-merge) to index and merge the VCFs. After that, I used VCFtools to generate my PLINK files (.ped and .map) so that I could run association statistics in PLINK 1.9 (Linux command line). I was surprised to see 356 variants reported, because my SNP panel was designed to cover only 198 target SNPs. I also got a low "Total genotyping rate" of 0.520599 after running a --missing analysis on my PLINK files, which I believe is due to the overcounting of variants. Would anyone be able to give me some advice on solving this problem? This is what my commands and their output look like:

(VCF_MERGE) elielson@elielson-VirtualBox:~/bioinfo/arbovirose_all_vcf$ vcftools --vcf merged_vcfs.vcf --plink --out myplink


VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf merged_vcfs.vcf
    --out myplink
    --plink


Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">

After filtering, kept 6 out of 6 Individuals

Writing PLINK PED and MAP files ... 

PLINK: Only outputting biallelic loci.

Done.

After filtering, kept 367 out of a possible 367 Sites

Run Time = 0.00 seconds
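
The vcftools log above reports 367 sites and notes that only biallelic loci are written, while PLINK later loads 356 variants. One way to check whether the difference comes from multiallelic or non-SNP records is to classify the records in the merged VCF. The following is a minimal sketch, not a definitive diagnostic: the function name `classify_vcf_sites` and the inline example records are made up for illustration, and it only looks at the REF and ALT columns of an uncompressed, tab-delimited VCF.

```python
from collections import Counter


def classify_vcf_sites(lines):
    """Count VCF records by kind, using only the REF and ALT columns.

    Hypothetical helper for illustration; categories are:
      biallelic_snp : one ALT allele, REF and ALT both a single base
      multiallelic  : more than one ALT allele (comma-separated)
      other         : indels, MNPs, or records with no ALT allele (".")
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines (both "##" and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if len(alts) > 1:
            counts["multiallelic"] += 1
        elif len(ref) == 1 and len(alts[0]) == 1 and alts[0] != ".":
            counts["biallelic_snp"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up records, only to show the three categories.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    "1\t200\t.\tA\tG,T\t50\tPASS\t.\tGT\t1/2",
    "1\t300\t.\tAT\tA\t50\tPASS\t.\tGT\t0/1",
]
counts = classify_vcf_sites(example)
print(dict(counts))
```

To run it on a real file, pass an open file handle instead of the list, for example `classify_vcf_sites(open("merged_vcfs.vcf"))`. If the panel targets 198 SNPs, a large `other` or `multiallelic` count, or many `biallelic_snp` records at off-target positions, would suggest that the extra variants are calls outside the designed targets.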

(base) elielson@elielson-VirtualBox:~/bioinfo/arbovirose_all_vcf$ plink --file myplink --missing --allow-no-sex

PLINK v1.90b6.21 64-bit (19 Oct 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --allow-no-sex
  --file myplink
  --missing


2910 MB RAM detected; reserving 1455 MB for main workspace.

Possibly irregular .ped line.  Restarting scan, assuming multichar alleles.

.ped scan complete (for binary autoconversion).

Performing single-pass .bed write (356 variants, 6 people).

--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.

356 variants loaded from .bim file.

6 people (0 males, 0 females, 6 ambiguous) loaded from .fam.

Ambiguous sex IDs written to plink.nosex .

6 phenotype values loaded from .fam.

Using 1 thread.

Before main variant filters, 6 founders and 0 nonfounders present.

Calculating allele frequencies... done.

Total genotyping rate is 0.520599.
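
For reference, a rate of 0.520599 with 356 variants and 6 people is consistent with 1112 of the 356 × 6 = 2136 genotype calls being non-missing. One common cause of this pattern, offered here as an assumption rather than a diagnosis, is merging per-sample VCFs that list only variant sites, so that a site absent from a sample becomes a missing genotype in the merged file. The sketch below recomputes the overall rate directly from .ped lines; `genotyping_rate` is a hypothetical helper, the example rows are made up, and a genotype is treated as missing if either allele is `0`.

```python
def genotyping_rate(ped_lines):
    """Return the fraction of non-missing genotypes in PED-formatted lines.

    Hypothetical helper for illustration. Each line is expected to have
    6 leading columns (FID, IID, PAT, MAT, SEX, PHENO) followed by two
    allele columns per variant, with "0" as the missing-allele code.
    Returns None if no genotypes are found.
    """
    called = 0
    total = 0
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue
        alleles = fields[6:]
        # Walk the allele columns two at a time (one pair per variant).
        for i in range(0, len(alleles) - 1, 2):
            total += 1
            if alleles[i] != "0" and alleles[i + 1] != "0":
                called += 1
    return called / total if total else None


# Made-up data: 2 samples x 3 variants, 3 of the 6 genotypes are missing.
example_ped = [
    "F1 S1 0 0 0 -9 A G 0 0 C C",
    "F2 S2 0 0 0 -9 0 0 0 0 C T",
]
rate = genotyping_rate(example_ped)
print(rate)
```

On real data, the per-variant and per-sample breakdown in the `plink.lmiss` and `plink.imiss` files written by `--missing` shows which variants or samples account for most of the missing calls.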
PLINK map ped • 659 views

The .map and .ped formats are extremely old and inefficient. plink2 and its associated .pgen files are much better to use now.

Similarly, vcftools is deprecated and shouldn’t be used. Use bcftools instead.

Please also format your code with the code-format button (the one with 1s and 0s). Your post is hard to read at the moment.
