10 weeks ago
poecile.pal ▴ 50

Good morning,

I have whole-genome sequencing data for several samples of one plant species. I mapped all reads to the merged nuclear+plastid+mitochondrial reference genome of this species (bwa-mem2), extracted the part of the alignment that mapped to the plastome, did some intermediate processing (samtools), called variants (freebayes), filtered for good SNPs (vcflib), converted them to a pseudoalignment (vcf2phylip), and built an NJ tree (mega).
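To make clear what I mean by "pseudoalignment", here is a toy Python sketch of the idea: one column per filtered SNP, one row per sample. It is only an illustration and not how vcf2phylip is actually implemented; the function names, the sample names and the use of `N` for missing calls are my own assumptions.

```python
def build_pseudoalignment(snp_sites, samples, missing="N"):
    """Concatenate per-site base calls into one sequence per sample.

    snp_sites: list of dicts mapping sample name -> called base (hypothetical format).
    Samples without a call at a site get the `missing` symbol.
    """
    seqs = {s: [] for s in samples}
    for site in snp_sites:
        for s in samples:
            seqs[s].append(site.get(s, missing))
    return {s: "".join(bases) for s, bases in seqs.items()}


def to_phylip(alignment):
    """Format the alignment as relaxed sequential PHYLIP text."""
    n_taxa = len(alignment)
    n_sites = len(next(iter(alignment.values())))
    lines = [f"{n_taxa} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in alignment.items()]
    return "\n".join(lines)


# Three made-up SNP sites for three made-up samples
sites = [
    {"s1": "A", "s2": "A", "s3": "G"},
    {"s1": "C", "s2": "T"},            # s3 has no call at this site
    {"s1": "G", "s2": "G", "s3": "T"},
]
aln = build_pseudoalignment(sites, ["s1", "s2", "s3"])
print(to_phylip(aln))
```

Note that such an alignment contains only variable sites, so distances computed from it are much larger than distances over the whole plastome.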

Now I would like to add an outgroup (another species from the same family). I see 3 ways:

  1. Simulate a random nucleotide sequence and attach it to the pseudoalignment. I tried this and got the error "distance matrix could not be computed. Error Id is 4508 (Jukes-Cantor distance incalculable)" (mega), possibly because of the strong dissimilarity. Also, I don't think I have ever seen such artificial outgroups in articles. This way is most likely wrong.

  2. Simulate reads from the reference plastome of the outgroup species (wgsim) and repeat the whole pipeline, including mapping to the merged nuclear+plastid+mitochondrial genome of my initial species. I tried this, but obtained so many SNPs that the result seems implausible (I would expect 2 species from the same family to be more similar). I also got the same error as in way 1.

  3. Align the reference plastomes of the initial species and the outgroup species (mega: clustalw or muscle) and convert the alignment to vcf (I found jvarkit: MsaToVcf in this post). I have made the alignment, but haven't tried jvarkit yet, because the alignment confuses me: it has a lot of long indel segments.
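My guess about the error in ways 1 and 2: the Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, so it is undefined once p reaches 0.75. A random sequence differs from any other sequence at about 75% of sites on average, which would put it right at that limit. The small Python sketch below illustrates this; the function names and the simulated sequences are my own, and I am only assuming that this is what triggers the MEGA message.

```python
import math
import random


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with gaps or ambiguity codes."""
    valid = "ACGT"
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in valid and y in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance from p; returns None where the formula is undefined (p >= 0.75)."""
    if p >= 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Simulated stand-ins: one "ingroup" sequence and one random "outgroup"
random.seed(1)
ingroup = "".join(random.choice("ACGT") for _ in range(2000))
random_outgroup = "".join(random.choice("ACGT") for _ in range(2000))

p = p_distance(ingroup, random_outgroup)
print("p-distance:", round(p, 3), "JC distance:", jukes_cantor(p))
```

Because my pseudoalignment keeps only variable sites, p for the outgroup in way 2 may also be pushed towards this limit, even if the two plastomes are quite similar overall.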

So, could you please advise me how to perform this task? Which of these ways is valid, or is there a better way?

Best regards, Poecile

