Question: Count SNPs per individual FAST!
QVINTVS_FABIVS_MAXIMVS (USA SoCal) wrote, 17 months ago:

Say I have a large (more than 2,000 samples) VCF or PLINK .bed file.

What's the quickest way to count alleles for each sample, both the number of unique alleles and the number of ALT genotypes?

Which tools can quickly digest a 1 TB VCF (split by chromosome)?

PLINK is ridiculously fast for this kind of work, but I don't think it can produce a per-sample count of variants.

snp plink gwas • 874 views

In the title you want to count the number of SNPs per individual; in the body you want the number of alleles for each sample. Please show us the expected output.

(comment by Pierre Lindenbaum, 17 months ago)

Either

IID    #UNIQUE_ALLELES

Or

IID   #UNIQUE_ALLELES    #ALT_ALLELES

So for a 1/1 genotype, the first file would contain "ID1 1", while the second file would contain "ID1 1 2".

(reply by QVINTVS_FABIVS_MAXIMVS, 17 months ago)
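
To pin down the two columns requested above, here is a minimal Python sketch that computes them straight from VCF text. It assumes that #UNIQUE_ALLELES means the number of distinct ALT alleles a sample carries at a site and that #ALT_ALLELES means the number of ALT allele copies, both summed over sites. The function names `count_alt` and `per_sample_counts` are made up for illustration. This is meant to make the definition concrete, not to be fast on a 1 TB file.

```python
import re


def count_alt(gt):
    """Return (distinct_alt_alleles, alt_copies) for one GT string such as '0/1'."""
    # Split on the unphased ('/') or phased ('|') separator and drop missing calls.
    alleles = [a for a in re.split(r"[/|]", gt) if a not in (".", "")]
    alt = [a for a in alleles if a != "0"]
    return len(set(alt)), len(alt)


def per_sample_counts(vcf_lines):
    """Sum both counts per sample; returns {sample_id: [unique, alt_copies]}."""
    samples = []
    totals = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs start at the 10th column
            totals = {s: [0, 0] for s in samples}
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype calls on this record
        gt_idx = fmt.index("GT")
        for sample, call in zip(samples, fields[9:]):
            unique, copies = count_alt(call.split(":")[gt_idx])
            totals[sample][0] += unique
            totals[sample][1] += copies
    return totals
```

With this definition, `count_alt("1/1")` gives one unique allele and two ALT copies, which matches the "ID1 1 2" row described above.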
chrchang523 (United States) wrote, 17 months ago:

plink --score can be abused for this purpose.

  • Create an input file assigning weight 1 to every alt allele to get #ALT_ALLELES.
  • You can then repeat the --score computation after erasing all the heterozygous calls ("plink --set-hh-missing --chr-set -26 --make-bed", which treats every chromosome as haploid so that heterozygous calls are set to missing; it may be necessary to use "--output-chr 26 --make-bed" first to force numeric chromosome codes). The second run then counts only homozygous-ALT genotypes, so you should be able to infer #UNIQUE_ALLELES once you have both --score output files.
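
The two steps above can be sketched in Python, under some assumptions that are not in the answer itself. First, the --score input is assumed to be three whitespace-separated columns (variant ID, allele, weight), built here from .bim lines, and the ALT allele is assumed to sit in the fifth .bim column (A1); which allele PLINK stores there depends on how the file was imported, so check this against your data. Second, the inference assumes that each --score run has been reduced to a per-sample sum of ALT allele copies for biallelic variants, with missing calls contributing nothing. The helper names are hypothetical.

```python
def score_lines_from_bim(bim_lines, alt_column=4):
    """Yield 'variant_id allele 1' lines, giving weight 1 to the chosen allele.

    alt_column=4 (the A1 column, 0-based) is an assumption; use 5 if the
    ALT allele is stored as A2 in your .bim file.
    """
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        yield f"{fields[1]} {fields[alt_column]} 1"


def infer_counts(sum_all, sum_hom_only):
    """Derive per-sample counts from the two summed scores.

    sum_all      = het + 2 * hom_alt  (all calls kept)
    sum_hom_only = 2 * hom_alt        (heterozygous calls erased)
    """
    hom_alt = sum_hom_only // 2
    het = sum_all - sum_hom_only
    return {
        "het": het,
        "hom_alt": hom_alt,
        "unique_alleles": het + hom_alt,  # sites where the sample carries ALT
        "alt_alleles": sum_all,           # total ALT allele copies
    }
```

For example, a sample with 3 heterozygous and 2 homozygous-ALT sites has sums of 7 and 4, and `infer_counts(7, 4)` recovers 5 unique alleles and 7 ALT alleles.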