Question: Count SNPs per individual FAST!
Asked 2.2 years ago by QVINTVS_FABIVS_MAXIMVS (USA SoCal):

Say I have a large (N samples > 2000) VCF or plink bed file.

What's the quickest way to calculate the number of alleles (unique alleles and # ALT genotypes) for each sample?

What are the options that can quickly digest a 1 TB VCF (split by chromosome)?

Plink is ridiculously fast at processing files of this size, but I don't think it can perform a per-sample count of variants.

Tags: snp, plink, gwas

In the title you want to count the number of SNPs per individual; in the body you want the number of alleles for each sample. Please show us the expected output.

(comment written 2.2 years ago by Pierre Lindenbaum)

Either

IID    #UNIQUE_ALLELES

Or

IID   #UNIQUE_ALLELES    #ALT_ALLELES

So for a 1/1 genotype, the first file would contain "ID1 1", while the second file would contain "ID1 1 2".
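To make the two columns concrete, here is a minimal Python sketch of how they could be counted from one sample's GT strings. The function name `count_alleles` and the toy genotypes are hypothetical, and it assumes that #UNIQUE_ALLELES means the number of distinct ALT alleles seen at each site, summed over sites. It is only an illustration of the definitions, not something that would scale to a 1 TB VCF.

```python
def count_alleles(genotypes):
    """Return (unique_alleles, alt_alleles) for one sample.

    genotypes: list of VCF-style GT strings such as "0/1" or "1|1".
    Assumption: "0" is the reference allele and "." is a missing call.
    """
    unique = 0
    alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        alt_alleles = [a for a in alleles if a not in ("0", ".")]
        alt += len(alt_alleles)          # every ALT copy counts
        unique += len(set(alt_alleles))  # each distinct ALT allele counts once per site
    return unique, alt


# Toy data (hypothetical sample IDs and genotypes)
samples = {"ID1": ["1/1"], "ID2": ["0/1", "1/1", "0/0", "./."]}
for iid, gts in samples.items():
    unique, alt = count_alleles(gts)
    print(iid, unique, alt)
```

For the single 1/1 genotype this gives one unique allele and two ALT alleles, matching the "ID1 1 2" line above.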

(reply written 2.2 years ago by QVINTVS_FABIVS_MAXIMVS)
Answer written 2.2 years ago by chrchang523 (United States):

plink --score can be abused for this purpose.

  • Create an input file assigning weight 1 to every alt allele to get #ALT_ALLELES.
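  • You can then repeat the --score computation after erasing all the heterozygous calls ("plink --set-hh-missing --chr-set -26 --make-bed"; it may be necessary to use "--output-chr 26 --make-bed" first to force numeric chromosome codes). The negative --chr-set value makes plink treat the chromosomes as haploid, so every heterozygous call is set to missing. You should be able to infer #UNIQUE_ALLELES once you have both --score output files: the first run counts ALT alleles in all genotypes, the second only those in homozygous-ALT genotypes.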
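The inference in the last step can be sketched in Python as below. This is only an illustration under stated assumptions: variants are biallelic and diploid, both per-sample scores are unnormalized ALT allele sums (not averages), and missing calls contribute zero to the score. Check the --score documentation for how to get output in that form. The function name `infer_counts` and its arguments are hypothetical.

```python
def infer_counts(alt_sum_all, alt_sum_hom_only):
    """Infer per-sample counts from two ALT-allele score sums.

    alt_sum_all:      ALT allele sum over all genotypes
                      (= n_het + 2 * n_hom_alt)
    alt_sum_hom_only: ALT allele sum after heterozygous calls were
                      set to missing (= 2 * n_hom_alt)
    """
    if alt_sum_hom_only % 2 != 0 or alt_sum_hom_only > alt_sum_all:
        raise ValueError("inputs are inconsistent with biallelic diploid data")
    n_hom_alt = alt_sum_hom_only // 2
    n_het = alt_sum_all - alt_sum_hom_only
    return {
        "alt_alleles": alt_sum_all,
        "unique_alleles": n_het + n_hom_alt,  # sites carrying at least one ALT
        "n_het": n_het,
        "n_hom_alt": n_hom_alt,
    }


# Toy sample with genotypes 0/1, 1/1, 0/0, 0/1:
# the first run sums to 4, the second (hets erased) sums to 2.
print(infer_counts(4, 2))
```

In other words, #UNIQUE_ALLELES works out to the first sum minus half of the second sum.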