Hey,
I have multiple whole-genome sequences, but I want SNP data to perform a GWAS and to calculate a GRM.
I used minimap2 to align all genomes against the reference, then used samtools to binarize, sort, and index the resulting .sam files.
Then I used `bcftools mpileup` and `bcftools call` to get .vcf files, one for each of the genomes (except the reference). Then I used `bcftools merge` to get a single .vcf, and `plink --recode --vcf merged.vcf --out merged` and `plink --file merged --make-bed --out merged` to get the corresponding PLINK files. However, when I try, for example, to filter for minor allele frequency with PLINK, it says `Error: All variants removed due to minor allele threshold(s)`. When I use GCTA directly to build a GRM, it says `1356568 SNPs have been processed. Used 0 valid SNPs.`
When converting the .ped file to a CSV with some cat command from the internet, the table contains `0`, `G`, `C`, `T`, `A`, and there are SNPs with 2, 1, but also no `0` entries.
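To make the observation about `0` entries concrete, here is a minimal sketch of how the allele codes in a .ped file could be tallied per variant. It assumes the default PLINK .ped layout (six leading columns, then two whitespace-separated allele columns per variant, with `0` as the missing-allele code). The function names and the three example lines are hypothetical and made up for illustration; they are not real data.

```python
from collections import Counter


def tally_ped_alleles(ped_lines):
    """Return one Counter of allele codes per variant from PLINK .ped lines.

    Assumes the default layout: FID IID PAT MAT SEX PHENO, followed by
    two allele columns per variant.
    """
    tallies = []
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        alleles = fields[6:]
        if len(alleles) % 2:
            raise ValueError("odd number of allele columns in .ped line")
        n_variants = len(alleles) // 2
        if not tallies:
            tallies = [Counter() for _ in range(n_variants)]
        elif len(tallies) != n_variants:
            raise ValueError("samples have different numbers of variants")
        for i in range(n_variants):
            tallies[i].update(alleles[2 * i:2 * i + 2])
    return tallies


def missing_histogram(tallies):
    """Map 'number of missing (0) alleles' to 'number of variants'."""
    return Counter(t["0"] for t in tallies)


# Hypothetical toy input: three samples, three variants.
example = [
    "s1 s1 0 0 0 -9 G G 0 0 A A",
    "s2 s2 0 0 0 -9 G G C C 0 0",
    "s3 s3 0 0 0 -9 G G 0 0 0 0",
]
tallies = tally_ped_alleles(example)
hist = missing_histogram(tallies)
print(dict(hist))
```

On the real file the same functions could be called as `tally_ped_alleles(open("merged.ped"))`. If the histogram is dominated by variants where most alleles are `0`, that would suggest the genotypes are already missing before PLINK sees them, so the calling or merging step would be the place to look.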
I am very new to this field. Where in this pipeline could the error be? How could I check what is wrong with my data?
Any help is much appreciated.