Extracting Phylogenetic Outgroup Sequence from Reference Genome with BWA/SAMtools
2 days ago

Hello,

I have sequence alignments from a RADseq data set, for which I've output the entire sequences (i.e., not just the SNPs) in FASTA format to use in a phylogenetic analysis. Including an outgroup species in the RAD data set would have been problematic, but I do have a high-quality reference genome for my outgroup species. My plan is to align the RAD loci to this genome and then, if an alignment is good enough (e.g., aligned length, MAPQ), extract that sequence from the genome to serve as the outgroup in the phylogenetic analysis.
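One way to make the "good enough" criterion explicit and scriptable is to read it directly from the SAM fields. The sketch below is a minimal Python example, assuming plain-text SAM output from BWA-MEM. The function names `reference_span` and `passes_filter` and the thresholds (MAPQ of at least 30, at least 80 aligned reference bases) are hypothetical placeholders to tune for your data. It skips unmapped (0x4), secondary (0x100) and supplementary (0x800) records and computes the aligned length from the CIGAR string.

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment, from its CIGAR string."""
    return sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )


def passes_filter(sam_line: str, min_mapq: int = 30, min_len: int = 80) -> bool:
    """Keep primary, mapped alignments with adequate MAPQ and aligned length.

    The default thresholds are hypothetical placeholders, not recommendations.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
    if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
        return False
    return mapq >= min_mapq and reference_span(cigar) >= min_len
```

Header lines (those starting with `@`) should be skipped before calling `passes_filter`. The flag and MAPQ part of this filter can also be done on the command line with `samtools view -F 0x904 -q 30`; the aligned-length check is what needs the CIGAR string.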

I've done this and it works fine, but the problem is that it's slow. I export a consensus sequence for each locus from Geneious, align them to the reference genome with BWA-MEM, and import the resulting .sam files back into Geneious. That part is fast enough, but I then have to manually go into each alignment, extract just the portion of the reference genome that aligned, and re-align it to the RAD locus alignment. This is doable for a few dozen loci but can't be done at scale.

Is there any way to automatically extract the portion of the reference contig that aligned to my RAD sequence, while discarding the rest? The process would be fairly slow even with that, but it would be a big improvement. Alternatively, if anyone is aware of some way to make this fast that I haven't thought of, I'm all ears. Thanks!
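For the extraction itself, the reference coordinates of the aligned portion can be derived from each SAM record: POS gives the 1-based leftmost reference position, and the reference-consuming CIGAR operations (M, D, N, =, X) give the span. The sketch below assumes the records have already been filtered to mapped, primary alignments and that the reference FASTA fits in memory; the helper names `parse_fasta` and `extract_aligned_region` are hypothetical. Soft-clipped ends of the RAD locus are not covered, so only the reference bases that actually aligned are returned, and reverse-strand hits (flag 0x10) are reverse-complemented so the excerpt matches the locus orientation.

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec)
REF_CONSUMING = set("MDN=X")
# Simple complement table; other IUPAC codes are left unchanged (a simplification)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_aligned_region(sam_line, genome):
    """Return (header, sequence) for the reference bases covered by one SAM record."""
    f = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
    span = sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    start = pos - 1  # SAM POS is 1-based; Python slices are 0-based
    seq = genome[rname][start:start + span]
    reverse = bool(flag & 0x10)
    if reverse:
        # Locus aligned to the minus strand: reverse-complement to match its orientation
        seq = seq.translate(COMPLEMENT)[::-1]
    header = f"{qname} {rname}:{pos}-{pos + span - 1}{' (-)' if reverse else ''}"
    return header, seq
```

Looping over the filtered records and writing each result as a `>header` line followed by the sequence gives one FASTA file of outgroup excerpts that can be aligned back to the RAD loci in bulk. For a genome too large to hold in memory, the same `contig:start-end` regions could instead be passed to `samtools faidx`.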

