Genome mapping mm39 via GATK (BaseRecalibrator) - Where to get known sites of polymorphisms?
15 months ago
Rashid • 0

Hello, I'm trying to run GATK's BaseRecalibrator as a step toward mapping against the mouse mm39 genome. I have already created the reference and index files from my .fa genome, and the tool requires the following argument:

--known-sites / NA: One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis. This algorithm treats every reference mismatch as an indication of error. However, real genetic variation is expected to mismatch the reference, so it is critical that a database of known polymorphic sites is given to the tool in order to skip over those sites. This tool accepts any number of Feature-containing files (VCF, BCF, BED, etc.) for use as this database. For users wishing to exclude an interval list of known variation, simply use -XL my.interval.list to skip over processing those sites. Please note, however, that the statistics reported by the tool will not accurately reflect the sites skipped by the -XL argument.
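To make the argument concrete, here is a minimal sketch of how the call could be assembled from Python. It only builds the command line and does not run GATK. The function name `base_recalibrator_cmd` and all file names are hypothetical placeholders; the flags (`-I`, `-R`, `--known-sites`, `-O`) are the ones GATK4 documents for BaseRecalibrator, and `--known-sites` is repeated once per file.

```python
import shlex


def base_recalibrator_cmd(bam, reference, known_sites, out_table):
    """Build the argv list for a GATK4 BaseRecalibrator run (nothing is executed)."""
    if not known_sites:
        # The tool needs a database of known variation to mask real polymorphisms.
        raise ValueError("at least one --known-sites file is required")
    cmd = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference]
    for path in known_sites:
        # One --known-sites flag per VCF/BCF/BED file.
        cmd += ["--known-sites", path]
    cmd += ["-O", out_table]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = base_recalibrator_cmd(
        "sample.sorted.bam", "mm39.fa", ["known_snps.vcf.gz"], "recal_data.table"
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the input files exist and are indexed.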

Where can I get these database files in the proper format for mm39 (mouse genome) specifically? I found this site: https://www.mousegenomes.org/snps-indels/, which leads to https://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-1505-SNPs_Indels/

However, I am not sure whether these files are in the correct format. Ultimately, after this step I will produce a mapped BAM file and then a VCF.
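One way to check whether a candidate known-sites VCF fits a given reference is to compare the contig names and lengths in the VCF header against the reference's `.fai` index. The sketch below is a rough illustration under several assumptions: the VCF header carries `##contig=<ID=...,length=...>` lines (not every file does), the `.fai` was produced by `samtools faidx`, and the header values contain no quoted commas. All function names are hypothetical. A name mismatch (for example `1` versus `chr1`) or a length mismatch suggests the VCF was built against a different assembly or naming scheme.

```python
import gzip


def read_fai_contigs(fai_path):
    """Return {contig_name: length} from a samtools .fai index."""
    contigs = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                contigs[fields[0]] = int(fields[1])
    return contigs


def parse_contig_line(line):
    """Parse one '##contig=<ID=...,length=...>' header line into (name, length or None)."""
    body = line.strip()[len("##contig=<"):-1]
    # Simplification: assumes no commas inside quoted values.
    fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
    length = fields.get("length")
    return fields["ID"], (int(length) if length is not None else None)


def read_vcf_contigs(vcf_path):
    """Return {contig_name: length or None} from the header of a plain or gzipped VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    contigs = {}
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # end of the meta-information header
            if line.startswith("##contig=<"):
                name, length = parse_contig_line(line)
                contigs[name] = length
    return contigs


def compare_contigs(ref_contigs, vcf_contigs):
    """List VCF contigs that are missing from the reference or differ in length."""
    problems = []
    for name, length in vcf_contigs.items():
        if name not in ref_contigs:
            problems.append(f"{name}: not in reference")
        elif length is not None and length != ref_contigs[name]:
            problems.append(f"{name}: length {length} != reference {ref_contigs[name]}")
    return problems
```

With placeholder paths, usage would look like `compare_contigs(read_fai_contigs("mm39.fa.fai"), read_vcf_contigs("known_snps.vcf.gz"))`. An empty list means the header contigs are consistent with the reference, although it does not by itself prove that the coordinates are on the same assembly.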

Sorry if this is a basic question. I am a software engineer working in a biology context, so I am not very familiar with the field and have to learn as I go; my position does not give me time to sit down with a book and learn everything properly.

Thanks!

mm39 • 388 views