FastStructure allele frequency output did not make sense
6 weeks ago

Hello there,

I am trying to run FastStructure for 48 lcWGS samples of the broad-toothed rat. Joint genotyping was done with the Illumina DRAGEN pipeline. The output was a vcf.gz file.

Using vcftools, I filtered out monomorphic sites and low-quality calls:

vcftools --gzvcf <file> --remove-indels --maf 0.001 --max-missing 0.9 --minQ 30 --minDP 5 --recode --stdout | gzip -c > <file>
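As a sanity check on the --maf threshold: with 48 diploid samples there are 96 allele copies per site, so the smallest non-zero minor allele frequency is 1/96. That is well above 0.001, so this setting should only drop monomorphic sites. Below is a minimal sketch of that arithmetic. It assumes diploid calls and no missing genotypes, and the function name `min_nonzero_maf` is just illustrative.

```python
def min_nonzero_maf(n_samples, ploidy=2):
    """Smallest possible non-zero minor allele frequency when no genotypes are missing."""
    return 1.0 / (n_samples * ploidy)

threshold = 0.001            # value passed to --maf above
maf = min_nonzero_maf(48)    # 48 samples, assumed diploid

print(round(maf, 4))         # 0.0104
print(maf >= threshold)      # True: a singleton site already passes the filter
```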

Using the vcf output from vcftools, I created a bed file with Plink.

plink --vcf <file> --allow-extra-chr --make-bed --out <file>

After that I ran FastStructure structure.py for K=1 to 10.

python structure.py -K <int> --input=<file> --output=<file>
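For completeness, the K=1 to 10 runs can be sketched as a small loop that builds one command per K. The prefixes `rats_filtered` and `rats_fs` are placeholders rather than my real file names, and `build_commands` is a hypothetical helper. The sketch only prints the commands; running them is left to the shell or to `subprocess.run`.

```python
def build_commands(bed_prefix, out_prefix, k_min=1, k_max=10):
    """Return one structure.py argument list per K.

    bed_prefix is the Plink prefix (the path without .bed/.bim/.fam).
    """
    return [
        ["python", "structure.py", "-K", str(k),
         f"--input={bed_prefix}", f"--output={out_prefix}"]
        for k in range(k_min, k_max + 1)
    ]

# Placeholder prefixes, for illustration only
for cmd in build_commands("rats_filtered", "rats_fs"):
    print(" ".join(cmd))
```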

The allele frequency output (meanP) looked really weird: except for two columns, every column contains only 0.5. For example, attached below is the allele frequency output (first 20 rows) for K=4.

[Image: meanP output, first 20 rows, K=4]
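To check this over the whole file rather than just the first 20 rows, here is a minimal sketch of how I flag columns that are stuck at 0.5. It assumes the meanP file is whitespace-delimited, with one row per SNP and one column per cluster (my understanding of the layout, not something I have verified in the documentation). The function name `constant_columns` and the example rows are made up for illustration.

```python
import io

def constant_columns(lines, value=0.5, tol=1e-6):
    """Return indices of columns whose every entry is within tol of value."""
    flags = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = [abs(float(x) - value) <= tol for x in fields]
        # A column stays flagged only if every row so far matched
        flags = row if flags is None else [a and b for a, b in zip(flags, row)]
    return [i for i, flagged in enumerate(flags or []) if flagged]

# Made-up rows, not my real output
example = io.StringIO(
    "0.500000  0.120000  0.500000  0.870000\n"
    "0.500000  0.940000  0.500000  0.030000\n"
)
print(constant_columns(example))  # [0, 2]
```

On a real run the same function can be given an open file handle, e.g. `constant_columns(open(path))`, where `path` points to the meanP file for the K of interest.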

The meanQ output showed that all samples belonged to the same cluster, which did not make sense to me, since PCA had shown really clear structure.

This is the meanQ output (first 20 rows) for K=4.

[Image: meanQ output, first 20 rows, K=4]
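The "all samples in one cluster" observation can be reproduced with a short sketch that assigns each sample to its highest-proportion cluster and counts samples per cluster. It assumes the meanQ file has one row per sample and one whitespace-separated column per cluster. `cluster_counts` is a hypothetical helper, and the rows below are invented to mimic the pattern I see.

```python
from collections import Counter

def cluster_counts(lines):
    """Assign each sample (row) to its largest-proportion cluster and count per cluster."""
    counts = Counter()
    for line in lines:
        fields = [float(x) for x in line.split()]
        if fields:
            counts[fields.index(max(fields))] += 1
    return dict(counts)

# Invented rows, not my real output
example = [
    "0.000010  0.999970  0.000010  0.000010",
    "0.000010  0.999970  0.000010  0.000010",
    "0.000010  0.999970  0.000010  0.000010",
]
print(cluster_counts(example))  # {1: 3}
```

A single key in the result means every sample was assigned to the same cluster, which is what I get for K=4.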

Could anyone provide insight into what could've gone wrong? I couldn't figure out what caused the 0.5s in the allele frequency output.

Thank you very much in advance!

plink frequency faststructure allele • 150 views