Hi,
First off, that pattern in your Manhattan plot --gaps around the centromeres-- is pretty common and usually not a red flag for your analysis. Technically, it's reasonable because centromeric regions are notorious for being tough nuts to crack in sequencing: they're highly repetitive, have weird GC content, and reads map poorly there (or not at all). Your hard filters (especially MQ < 60 and the rank sum tests) are spot-on for GATK best practices, but they naturally cull SNPs in those areas since mapping quality tanks and strand bias creeps in from the repeats. Biologically, it makes sense too—recombination is suppressed near centromeres, so polymorphism is often lower anyway, especially if your species has that heterochromatin setup (assuming this isn't human; if it is, same deal).
Your summary stats look solid overall: median MAF at 0.09 is decent for discovery, missingness under 0.25 is fine, and those QD/FS values aren't screaming problems. The mean FS at 57.63 jumps out a bit (even with your >5 filter), but if it's driven by a few outliers post-filter, it might just be noise from dense regions elsewhere.
To double-check reliability, here's a stepwise plan—focus on diagnostics first, then poke at the data:
QQ plot and genomic control lambda: This is your quick gut-check for model fit. In rMVP, after x<-MVP() , grab the p-values and plot:
qqplot(-log10(1:p), -log10(gwas$p.values), main="QQ Plot")
abline(0,1)
lambda <- median(qchisq(1-gwas$p.values,1), na.rm=T) / qchisq(0.5,1)
Lambda around 1-1.05 is golden; >1.1 might mean unaccounted structure (try adding more PCs if you have them).
SNP density by chromosomal position: Slice your VCF or genofile by position and histogram it per chromosome to visualize those gaps. Load via vcfR (it's stable, no updates needed):
library(vcfR)
vcf <- read.vcfR("your_filtered.vcf.gz")
pos <- getFIX(vcf)$POS
chrom <- getCHROM(vcf)
# For chr1 example
hist(pos[chrom=="1"], breaks=1000, main="SNP Density Chr1")
# Overlay known centromere coords if you have 'em (e.g., from UCSC)
If gaps align with centromeres (grab coords from your genome browser), you're good—it's not your filters eating signal.
Mappability track: Pull a mappability score (e.g., from ENCODE or your aligner logs) and correlate it with your SNP positions. Low mappability (<0.8) should match your gaps. If not, loosen MQ to 50 and re-run filtering to see if centromeric SNPs pop up (but watch for false positives).
Cross-validate the model: Re-run MLM without kinship (if you're using it) or try FarmCPU in rMVP for comparison—MVP.Config(method="FarmCPU"). If peaks shift but centromere gaps stay, it's technical. Also, subsample your SNPs (e.g., LD-prune harder) and re-plot to rule out over-pruning.
External benchmark: Compare your SNP set to a public panel for your species (e.g., dbSNP or a ref GWAS). Overlap and see if your missing centromeric hits are also absent there.
If lambda's off or densities look wonky beyond centromeres, dig into population structure (PCA on your genotypes) or batch effects. Otherwise, this is likely just the data behaving as expected—plenty of published plant GWAS (guessing from rMVP?) show those barren zones.
Kevin
Additional information
I will provide two figures: SNP distribution along chromosomes
Composite figure including: (A) Kinship matrix heatmap (B) QQ plot (C) Manhattan plot

Is your data coming from sequencing or arrays?
It comes from sequencing.
Well, thats why it looks different from what you see in the usual GWAS papers then - GWAS by sequencing is a very recent development. Almost all GWAS until this point has been done by array, and therefore doesn't suffer from a mappability problem.