I am working on a cell line authentication project that checks several properties of the cell line in use (for example, verifying the organism and confirming that the genome does not differ too much from the reference).
As a test, I am using Staphylococcus aureus samples sequenced as 150 bp Illumina reads.
- For QC I used Trimmomatic for adapter removal and quality trimming.
- BWA MEM for mapping.
- Samtools sort to produce a coordinate-sorted BAM file.
- GATK MarkDuplicates (which uses Picard) for marking duplicates.
- No base recalibration, since my base qualities are fairly high (>34) and I could not find any known SNP sites for this organism (e.g. in dbSNP).
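
The mapping, sorting, and duplicate-marking steps above can be sketched as a small Python script that builds the command lines. This is only a sketch under assumptions: paired-end input, the file names, the thread count, and the read-group values are all hypothetical, the reference is assumed to be indexed already with `bwa index`, and Trimmomatic is left out because its exact settings are not given here.

```python
import subprocess


def build_pipeline(sample, ref, r1, r2, threads=4):
    """Return (argv, stdout_path) pairs for mapping, sorting and duplicate marking.

    All file names are hypothetical; stdout_path is None when the tool
    writes its own output file.
    """
    # bwa expects a literal backslash-t in the read-group string and
    # converts it to tabs itself. GATK tools need a read group on the reads.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    return [
        # Map reads; bwa mem writes SAM to stdout, captured into a file.
        (["bwa", "mem", "-t", str(threads), "-R", read_group, ref, r1, r2], sam),
        # Coordinate sort (the samtools sort default) into a BAM file.
        (["samtools", "sort", "-o", sorted_bam, sam], None),
        # Mark duplicates with the Picard tool bundled in GATK4.
        (["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam, "-M", metrics], None),
        # Index the final BAM so downstream tools can read it.
        (["samtools", "index", dedup_bam], None),
    ]


def run_pipeline(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for argv, _ in build_pipeline("sample1", "ref.fasta", "R1.fastq.gz", "R2.fastq.gz"):
        print(" ".join(argv))
```

Building the commands separately from running them makes it easy to inspect the exact calls before anything is executed.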
Currently, I am looking into using GATK for variant calling as well. As I understand it, GATK HaplotypeCaller is meant for germline variant calling (so single chromosomes), which would make it the best fit for S. aureus, since it also has one chromosome. Or is this assumption totally incorrect, and should I use Mutect2 (somatic variant calling) instead?
Or is GATK not the best tool for this kind of microbial data?
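
To make the HaplotypeCaller option concrete, here is a minimal sketch of what such a call could look like, again built in Python. The file names are hypothetical, and setting `--sample-ploidy` to 1 is an assumption for a haploid bacterial genome, not something I have confirmed is the right choice. The reference is also assumed to have its `.fai` index and sequence dictionary in place.

```python
def haplotypecaller_cmd(ref, bam, out_vcf, ploidy=1):
    """Build a GATK4 HaplotypeCaller command line (hypothetical file names)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref,          # reference FASTA
        "-I", bam,          # duplicate-marked, indexed BAM
        "-O", out_vcf,      # output VCF
        # Assumption: treat the sample as haploid (GATK defaults to ploidy 2).
        "--sample-ploidy", str(ploidy),
    ]


if __name__ == "__main__":
    print(" ".join(haplotypecaller_cmd("ref.fasta", "sample1.dedup.bam", "sample1.vcf.gz")))
```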
I am fairly new to variant calling so any answers or advice are helpful! :)