I have whole-genome sequencing (WGS) paired-end reads that I would like to match to an existing catalog built from ddRADseq paired-end reads.
What I have tried so far:
STACKS: I extracted the catalog consensus sequences, using the locus ID as the name of each sequence. I converted this file to FASTA, indexed it, and mapped the WGS reads to this consensus reference with bwa mem. I then processed these BAMs against the previously made catalog (from ddRAD PE reads). I tried to merge the VCFs produced from the ddRAD and WGS data, but realised that their chromosome IDs and positions do not match. I should mention that my reference genome is 3.2 GB in size (a marsupial species), and that renaming the consensus reference sequences won't work either.
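To make the conversion step concrete, here is a minimal sketch of turning catalog consensus sequences into a FASTA named by locus ID. It is not the exact script that was run: the two-column, tab-separated input layout (locus ID, then sequence) and the function names `read_two_column_tsv` and `consensus_to_fasta` are assumptions made for illustration only.

```python
import io


def read_two_column_tsv(handle):
    """Yield (locus_id, sequence) pairs from a tab-separated handle.

    Assumed layout (hypothetical): column 1 = locus ID, column 2 = consensus.
    Blank lines and lines starting with '#' are skipped.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        locus_id, seq = line.split("\t")[:2]
        yield locus_id, seq.upper()


def consensus_to_fasta(rows, out, width=60):
    """Write each pair as a FASTA record whose name is the locus ID."""
    for locus_id, seq in rows:
        out.write(f">{locus_id}\n")
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Small in-memory example with made-up sequences.
example = io.StringIO("1\tacgtacgt\n2\tGGCCAATT\n")
fasta = io.StringIO()
consensus_to_fasta(read_two_column_tsv(example), fasta)
print(fasta.getvalue(), end="")
```

Because every record in such a FASTA is named by its locus ID, variants called from reads mapped to it are reported with the locus ID as the contig name and a position counted from the start of that consensus sequence, not in genome coordinates.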
GATK: I ran 10 samples (4 from ddRAD and 6 from WGS) through GATK HaplotypeCaller. After 4 days it was still running when I had to stop it. I have 556 samples in total (550 from ddRAD and 6 from WGS), so this approach would take a very long time to complete.
Any suggestions would be greatly appreciated.