SNP calling from multiple genomes: No valid SNPs
20 days ago

Hey,

I have multiple whole genome sequences and want SNP data to run a GWAS and to calculate a GRM. I used minimap2 to align each genome against the reference, then used samtools to convert the resulting .sam files to BAM, sort them, and index them. Next I ran bcftools mpileup and bcftools call to get one .vcf file per genome (except the reference), and bcftools merge to combine them into a single .vcf. To get the corresponding PLINK files I ran plink --recode --vcf merged.vcf --out merged followed by plink --file merged --make-bed --out merged. However, when I try to filter on minor allele frequency with PLINK, for example, it fails with Error: All variants removed due to minor allele threshold(s). When I use GCTA directly to build a GRM, it reports 1356568 SNPs have been processed. Used 0 valid SNPs.

When I convert the .ped file to a CSV with a cat command I found online, the table contains the values 0, G, C, T, and A. There are SNPs with 2 entries of 0, with 1, and also with none.

I am very new to this field. Where in this pipeline could the error be? How can I check what is wrong with my data?
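One possible first check, sketched here under assumptions rather than taken from the thread, is to tally how many genotype entries in the merged VCF are reference, missing, or carry an alternate allele. The function name tally_genotypes is hypothetical, the input is assumed to be an uncompressed VCF such as merged.vcf, and GT is assumed to be the first FORMAT field (as the VCF specification requires when GT is present).

```shell
# Tally genotype classes over every sample column of an uncompressed VCF.
# tally_genotypes is a hypothetical helper name, not part of bcftools or PLINK.
tally_genotypes() {
    awk -F'\t' '
        /^#/ { next }                        # skip meta and header lines
        {
            for (i = 10; i <= NF; i++) {     # sample columns start at column 10
                split($i, f, ":")            # GT is assumed to be the first subfield
                gt = f[1]
                if (index(gt, ".") > 0)          n["missing"]++      # e.g. "." or "./."
                else if (gt ~ "^0([/|]0)?$")     n["ref"]++          # "0", "0/0" or "0|0"
                else                             n["alt_or_het"]++   # anything with a non-ref allele
            }
        }
        END {
            printf "ref=%d missing=%d alt_or_het=%d\n", n["ref"], n["missing"], n["alt_or_het"]
        }
    ' "$1"
}
```

Calling tally_genotypes merged.vcf prints one summary line. If the ref count is close to zero while missing is large, allele frequencies downstream would be estimated almost entirely from variant calls, which is worth ruling out before looking elsewhere.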

Any help is much appreciated.

samtools bcftools snp plink wgs • 413 views
17 days ago

The typical SNP calling route is to align, say, three short-read datasets (fastq input) against one reference genome. In that setting your approach would likely work.

Only now are we starting to see SNP calling from genome-to-genome comparisons (both fasta, i.e. not fastq), which is not as well catered for in terms of tooling. I don't know of any toolchains that can do what you want starting from genome-to-genome alignments.

Maybe you can fake a higher depth in your vcf/bcf, since depth 1, which is the maximum in a genome-to-genome comparison, is likely to be treated as a false positive by tools that are set up to expect 30X WGS coverage.
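Before altering any depth values, it may be worth confirming what depths the calls actually report. The following is a minimal sketch, assuming an uncompressed VCF whose INFO column carries a DP tag (as bcftools mpileup output normally does); the function name depth_histogram is hypothetical.

```shell
# Count how many records report each INFO/DP value in an uncompressed VCF.
# depth_histogram is a hypothetical helper name, not a bcftools subcommand.
depth_histogram() {
    awk -F'\t' '
        /^#/ { next }                              # skip meta and header lines
        {
            n = split($8, kv, ";")                 # column 8 is INFO, ";"-separated
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^DP=/)                # keep only the DP=<value> entry
                    count[substr(kv[i], 4)]++      # strip the leading "DP="
        }
        END {
            for (d in count) printf "DP=%s\t%d\n", d, count[d]
        }
    ' "$1" | sort -t= -k2,2n                       # order rows by depth value
}
```

Running depth_histogram on one of the per-genome VCFs prints one row per depth value with the number of records at that depth. Note that in a merged file the INFO/DP value may be combined across samples, so the per-genome files are probably the more informative input.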
