29 days ago
Begonia_pavonina
I am having formatting issues while preparing my data for STRUCTURE analysis. I am creating the STRUCTURE input files with plink, but the .strct_in file it produces seems to include the sample names, which trigger an error in STRUCTURE. Has anyone had this problem before?
# Base names for outputs
PLINK_BASE="all.work10.pruned"
STRUCTURE_BASE="all_for_structure"
# Step 1: Convert to PLINK format
plink --vcf "${PLINK_BASE}.vcf.gz" \
--make-bed --double-id --allow-extra-chr \
--out "$PLINK_BASE"
# Step 2: Convert to STRUCTURE format
plink --bfile "$PLINK_BASE" \
--recode structure --allow-extra-chr \
--out "$STRUCTURE_BASE"
Provide the error message.
Can you provide the error message and what plink version you are using?
For what it's worth, I've encountered many challenges in getting plink to convert to a format STRUCTURE will accept ... I end up using PGDSpider2 sometimes, especially if I just have a single vcf to convert.
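Before switching tools, it may help to check what the converted file actually contains, since STRUCTURE errors of this kind often come from the file layout not matching what mainparams declares. The following is only a rough sketch: `strct_summary` is a hypothetical helper (not part of plink or STRUCTURE), and it assumes the file has two header rows (marker names, then map distances) followed by one row per individual with two allele columns per locus. If your file is laid out differently, the numbers it prints will not be meaningful.

```shell
#!/bin/sh
# Hypothetical helper: summarise the layout of a STRUCTURE input file.
# Assumption: row 1 = marker names, row 2 = map distances,
# rows 3+ = one individual per row, two allele columns per locus.
strct_summary() {
    awk '
        NR == 1 { loci = NF }               # header row 1: one field per marker
        NR == 3 { lead = NF - 2 * loci }    # first data row: columns before the genotypes
        NR > 2 && NF > 0 { inds++ }         # every non-empty data row is one individual
        END {
            printf "loci=%d individuals=%d leading_columns=%d\n", loci, inds, lead
        }
    ' "$1"
}
```

A call would look like `strct_summary "${STRUCTURE_BASE}.recode.strct_in"`, assuming that is the name plink gave the output; adjust the path to whatever file was actually written. If `leading_columns` comes out as 2, the rows probably start with a sample label and a population integer, in which case the sample names may not be the problem in themselves, and the mainparams settings (LABEL, POPDATA, MARKERNAMES, MAPDISTANCES, ONEROWPERIND, NUMINDS, NUMLOCI) would need to agree with that layout. The exact error message would confirm or rule this out.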