FastStructure allele frequency output did not make sense
6 weeks ago

Hello there,

I am trying to run FastStructure on 48 low-coverage whole-genome sequencing (lcWGS) samples of the broad-toothed rat. Joint genotyping was done with the Illumina DRAGEN pipeline, which produced a vcf.gz file.

Using vcftools, I removed indels, monomorphic sites, low-quality calls and sites with more than 10% missing genotypes:

vcftools --gzvcf <file> --remove-indels --maf 0.001 --max-missing 0.9 --minQ 30 --minDP 5 --recode --stdout | gzip -c > <file>
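As a sanity check on the `--maf 0.001` threshold, the sketch below computes the smallest non-zero minor allele frequency a site can have with 48 diploid samples. It assumes vcftools computes the frequency over called alleles only, and the function name is hypothetical. With 96 alleles the smallest possible value is 1/96 ≈ 0.0104, so a threshold of 0.001 removes monomorphic sites but keeps every singleton.

```python
import math


def min_nonzero_maf(n_samples: int, called_fraction: float = 1.0, ploidy: int = 2) -> float:
    """Smallest non-zero minor allele frequency at a site (hypothetical helper).

    Assumes the frequency is computed over called alleles only, and that the
    rarest non-monomorphic case is a single copy of the minor allele.
    """
    # Fewest samples that can be genotyped at a site passing --max-missing;
    # the small epsilon guards against float products like 7.000000000000001.
    called_samples = max(1, math.ceil(n_samples * called_fraction - 1e-9))
    n_alleles = called_samples * ploidy
    return 1.0 / n_alleles


if __name__ == "__main__":
    # 48 diploid samples, no missing genotypes: 96 alleles.
    print(f"{min_nonzero_maf(48):.4f}")
    # Worst case still allowed by --max-missing 0.9 (at least 90% called).
    print(f"{min_nonzero_maf(48, 0.9):.4f}")
```

Both values are an order of magnitude above 0.001, so under these assumptions the `--maf` filter here acts only as a monomorphic-site filter.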

I then converted the filtered VCF into a PLINK binary fileset (.bed/.bim/.fam):

plink --vcf <file> --allow-extra-chr --make-bed --out <file>
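PLINK writes three files that share one prefix, and FastStructure's `--input` expects that prefix without the `.bed` suffix. A small helper like the one below (names are illustrative, not part of either tool) can confirm the fileset is complete and report how many samples (`.fam` lines) and SNPs (`.bim` lines) survived filtering.

```python
from pathlib import Path

PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def plink_prefix(path: str) -> str:
    """Return the PLINK prefix for `path`, checking that .bed/.bim/.fam all exist."""
    p = Path(path)
    if p.suffix in PLINK_SUFFIXES:
        p = p.with_suffix("")  # "rats.filtered.bed" -> "rats.filtered"
    missing = [s for s in PLINK_SUFFIXES if not Path(str(p) + s).exists()]
    if missing:
        raise FileNotFoundError(f"prefix {p} is missing {', '.join(missing)}")
    return str(p)


def count_lines(path: str) -> int:
    """Count non-empty lines (one sample per .fam line, one SNP per .bim line)."""
    with open(path) as fh:
        return sum(1 for line in fh if line.strip())


def summarize(path: str) -> dict:
    """Prefix to pass as --input, plus sample and SNP counts."""
    prefix = plink_prefix(path)
    return {
        "prefix": prefix,
        "samples": count_lines(prefix + ".fam"),
        "snps": count_lines(prefix + ".bim"),
    }
```

For example, `summarize("rats.filtered.bed")` (a hypothetical file name) should report 48 samples if all individuals made it through.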

After that I ran FastStructure for K=1 to 10.

python structure.py -K <int> --input=<file> --output=<file>
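FastStructure is run through its `structure.py` script, and each run writes `<output>.<K>.meanQ`, `<output>.<K>.meanP` and `<output>.<K>.log`. A minimal sketch of looping over K = 1 to 10 from Python follows. It assumes `structure.py` is in the working directory and that `python` points at the interpreter FastStructure was installed for; the prefixes are hypothetical, and commands are only printed unless `dry_run=False`. Passing `--seed` keeps reruns comparable.

```python
import subprocess
from typing import Iterable, List, Optional


def faststructure_commands(prefix: str, out: str, k_values: Iterable[int],
                           script: str = "structure.py",
                           seed: Optional[int] = None) -> List[List[str]]:
    """Build one FastStructure command (as an argv list) per value of K."""
    cmds = []
    for k in k_values:
        cmd = ["python", script, "-K", str(k), f"--input={prefix}", f"--output={out}"]
        if seed is not None:
            cmd.append(f"--seed={seed}")  # fixed seed so reruns are comparable
        cmds.append(cmd)
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "rats" and "rats_out" are hypothetical prefixes.
    run_all(faststructure_commands("rats", "rats_out", range(1, 11), seed=100))
```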

The allele frequency output (meanP) looked very strange: except for 2 columns, every column is 0.5. The first 20 rows of the meanP output for K=4 are attached below.

[Image: meanP output, first 20 rows, K=4]
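To quantify the 0.5 pattern, a short script can tabulate, for each column of meanP, the fraction of SNPs sitting at 0.5. This assumes meanP is a whitespace-delimited matrix with one row per SNP and one column per cluster; the helper names are hypothetical. A column that is 0.5 for nearly every SNP is a cluster whose allele frequencies the data never moved; one possible explanation, not confirmed here, is that no samples were assigned to that cluster, so its frequencies stay at a symmetric prior value.

```python
from typing import Iterable, List


def parse_matrix(lines: Iterable[str]) -> List[List[float]]:
    """Parse whitespace-delimited numbers, skipping blank lines."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def read_matrix(path: str) -> List[List[float]]:
    with open(path) as fh:
        return parse_matrix(fh)


def fraction_at_value(rows: List[List[float]], value: float = 0.5,
                      tol: float = 1e-3) -> List[float]:
    """For each column, the fraction of rows within `tol` of `value`."""
    n_cols = len(rows[0])
    return [
        sum(1 for r in rows if abs(r[j] - value) <= tol) / len(rows)
        for j in range(n_cols)
    ]
```

Usage, with a hypothetical file name following the `<output>.<K>.meanP` pattern: `fraction_at_value(read_matrix("rats_out.4.meanP"))`.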

The meanQ output assigned all samples to the same cluster, which did not make sense to me, because PCA shows very clear structure.

This is the meanQ output (first 20 rows) for K=4:

[Image: meanQ output, first 20 rows, K=4]
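One way to compare meanQ with the PCA is to count how many samples have their largest ancestry proportion in each cluster, next to each cluster's mean proportion. This sketch assumes meanQ has one row per sample (in `.fam` order) and one column per cluster; the helper names are hypothetical. If all 48 samples land in one cluster at K=4, and the empty clusters match the columns stuck at 0.5 in meanP, the two symptoms probably share one cause rather than being separate problems.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def parse_matrix(lines: Iterable[str]) -> List[List[float]]:
    """Parse whitespace-delimited numbers, skipping blank lines."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def hard_assignments(q_rows: List[List[float]]) -> List[int]:
    """Index of the largest ancestry proportion for each sample (row)."""
    return [max(range(len(r)), key=r.__getitem__) for r in q_rows]


def summarize_q(q_rows: List[List[float]]) -> Dict[int, Tuple[int, float]]:
    """Map cluster index -> (samples assigned to it, mean ancestry proportion)."""
    k = len(q_rows[0])
    counts = Counter(hard_assignments(q_rows))
    means = [sum(r[j] for r in q_rows) / len(q_rows) for j in range(k)]
    return {j: (counts.get(j, 0), means[j]) for j in range(k)}
```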


Could anyone provide insight into what could've gone wrong? I couldn't figure out what caused the 0.5s in the allele frequency output.

Thank you very much in advance!
