I have the genome sequence of a nematode species I'm working with. This genome was assembled using reads (3 paired-end libraries and 2 mate-pair libraries) from one particular strain (let's call it Strain A).
Now I have one paired-end library for another strain (Strain B).
I would like to call SNPs and InDels within and between strains. I am unsure about how to do this.
I thought about using one of the following pipelines (or a similar one): https://approachedinthelimit.wordpress.com/2015/10/09/variant-calling-with-gatk/ or https://github.com/metalhelix/pipette
Basically, I would align my reads against the genome, mark the duplicates, run the UnifiedGenotyper of GATK and filter the variants.
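To make the plan concrete, here is a rough sketch of the steps above, written as a small Python helper that only builds the command lines for one sample (it does not run anything). It assumes bwa, samtools, Picard and GATK 3.x are the tools used; the jar paths, file names, read-group fields and the `QD < 2.0` filter threshold are placeholders I made up for illustration, not values taken from either pipeline linked above.

```python
import shlex


def build_pipeline(ref, sample, fq1, fq2, threads=4,
                   picard_jar="picard.jar", gatk_jar="GenomeAnalysisTK.jar"):
    """Return the shell commands for one sample, in order. Nothing is executed.

    Assumes the reference is already indexed (bwa index, samtools faidx,
    Picard CreateSequenceDictionary). Jar paths are hypothetical defaults.
    """
    q = shlex.quote
    # GATK needs a read group; SM carries the sample (strain) name.
    # bwa expects the tabs written as a literal backslash-t.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    raw_vcf = f"{sample}.raw.vcf"
    filtered_vcf = f"{sample}.filtered.vcf"

    return [
        # 1. Align the paired-end reads and sort by coordinate.
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # 2. Mark duplicates.
        f"java -jar {q(picard_jar)} MarkDuplicates"
        f" I={q(sorted_bam)} O={q(dedup_bam)} M={q(metrics)}",
        # 3. Index the BAM for GATK.
        f"samtools index {q(dedup_bam)}",
        # 4. Call SNPs and indels (-glm BOTH) with UnifiedGenotyper.
        f"java -jar {q(gatk_jar)} -T UnifiedGenotyper -R {q(ref)}"
        f" -I {q(dedup_bam)} -glm BOTH -o {q(raw_vcf)}",
        # 5. Flag low-quality calls; the threshold here is only an example.
        f"java -jar {q(gatk_jar)} -T VariantFiltration -R {q(ref)}"
        f" -V {q(raw_vcf)} --filterExpression {q('QD < 2.0')}"
        f" --filterName lowQD -o {q(filtered_vcf)}",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("genome.fa", "strainB", "B_R1.fq.gz", "B_R2.fq.gz"):
        print(cmd)
```

Running this once per strain would give one filtered VCF per strain, which is exactly the limitation I mention below.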
Now some questions and concerns:
- I have a lot more read files for Strain A than for Strain B (only one library). Should I only use one library of Strain A? Does the insert size matter?
- The pipeline I described above will let me get variants within each strain but not between strains. Do you know of any way I could do this analysis?
Many thanks for your suggestions and insights! Sophie