Hello, I am running a variant calling pipeline on RNA-seq data from genetically modified mice on a C57BL/6 background. I am using the nf-core Nextflow pipeline that follows GATK4 best practices (https://nf-co.re/rnavar/1.0.0/), and it requires both a dbSNP VCF and a known indels VCF for the reference genome (in this case, GRCm38). What is the most suitable source for these files? Does the choice depend on the mouse strain? Thank you so much for your help!
Known SNPs are probably more important when sequencing wild individuals than pure-bred strains. It's not like you need to filter away known variants relative to a reference in order to identify novel ones; they should all be novel.
You could try making dummy empty VCFs that contain only headers. Then the pipeline won't have anything to filter away. Or you could use a VCF containing just the genetic changes you know this background possesses.
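In case it helps, here is a minimal sketch of how such a header-only VCF could be generated. It assumes you have the `.fai` index of your GRCm38 FASTA (as produced by `samtools faidx`), so that the contig names and lengths in the header match your reference. The function names and file names are made up for illustration, and I have not checked whether the pipeline accepts a VCF with no records.

```python
def contigs_from_fai_lines(lines):
    """Parse (name, length) pairs from the lines of a FASTA .fai index.

    Each .fai line is tab-separated; the first two columns are the
    contig name and its length in bases.
    """
    contigs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def build_empty_vcf(contigs):
    """Return the text of a VCF that has a header but no variant records."""
    header = ["##fileformat=VCFv4.2"]
    for name, length in contigs:
        header.append(f"##contig=<ID={name},length={length}>")
    # Mandatory column line; no sample columns and no records follow it.
    header.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")
    return "\n".join(header) + "\n"


def write_empty_vcf(fai_path, vcf_path):
    """Write a header-only VCF whose contigs come from the given .fai file."""
    with open(fai_path) as fai:
        contigs = contigs_from_fai_lines(fai)
    with open(vcf_path, "w") as out:
        out.write(build_empty_vcf(contigs))
```

You would call something like `write_empty_vcf("genome.fa.fai", "empty_known_sites.vcf")`, where both file names are placeholders. GATK tools generally expect known-sites files to be indexed, so the result would probably still need an index (for example via `gatk IndexFeatureFile`) before the pipeline can use it.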
What does the error log say, in case you are not getting the desired output?
Thank you for your reply. The pipeline does not run at all. I had to disable the GATK base recalibration module to get it to run.