Question

Mapping (minimap2) nanopore reads to parent species to confirm hybridization

4.1 years ago

ricardoguerreiro2121 ▴ 80

Hey,

I have created a scientifically interesting hybrid of two related plant species, but I am not sure whether it really is a hybrid, so I got low-coverage nanopore sequencing for it.

I have relatively good assemblies for both parents. The idea is now to map the reads of the new hybrid to each of the parents, to confirm hybridization.

So far so good, but now it gets confusing: because the species are related, parts of their genomes are similar. Naturally, many Nanopore reads map to both. Furthermore, many reads map to multiple places, complicating the picture.

Total nanopore reads:      516914        
Reads mapping to parent1:  632245    | not chimeric: 519048
Reads mapping to parent2:  1122104   | not chimeric: 594088

I did the mapping with minimap2 (with "-x map-ont"), which is very fast, but I'm unsure how permissive its mapping is.

Do you have suggestions on how to continue with the analysis? Strict parameters for minimap? Use other aligner instead? Filter the mapped reads for single hits? An altogether different approach?

When do you think I can conclude that my hybrid really is a hybrid?

Thanks, Ricardo

hybrids long reads parents minimap2 • 2.6k views

score 1 · Answer 1 · 2020-05-06

How closely related are the two parent species? Do you expect F1 hybrid progeny (~50% parent1, ~50% parent2) or backcrosses? I would recommend "mapping" to both parents simultaneously. The tool I suggest, seal.sh from BBTools/BBMap, does not actually align reads; it classifies them by k-mers. Because these are error-prone Nanopore reads (uncorrected, I assume), you might need to experiment with the k-mer length (1-31) for seal.sh.

I have done something like this with Illumina reads for two closely related mammals. I first aligned the dustmasked (dustmasker from NCBI) genome assemblies of the parents with minimap2, using one of the asm# presets (asm5, asm10, or asm20, depending on how similar the genomes are). Then I extracted the aligned regions from the PAF file to make a BED file for each parent, and used bedtools getfasta to get the sequences of those regions for each parent separately. This step might not be necessary, but in simulations with the two closely related mammals it reduced the deviation from the expected F1 proportions.
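The PAF-to-BED step described above could look roughly like the following. This is only a sketch, not the exact procedure used for the mammal data: the function name `paf_to_bed`, the `min_mapq` filter, and the example records are illustrative assumptions. It relies on the PAF convention that start and end coordinates are 0-based and half-open, so they can be copied into BED unchanged, and it merges overlapping intervals so that each region is extracted only once.

```python
from collections import defaultdict


def paf_to_bed(paf_lines, side="target", min_mapq=0):
    """Collect aligned intervals from PAF records and merge overlaps.

    side="target" uses PAF columns 6, 8, 9 (name, start, end);
    side="query" uses columns 1, 3, 4. Column 12 is the mapping quality.
    Returns a sorted list of (name, start, end) tuples in BED coordinates.
    """
    name_col, start_col, end_col = (5, 7, 8) if side == "target" else (0, 2, 3)
    intervals = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[11]) < min_mapq:  # hypothetical filter, not from the post
            continue
        intervals[fields[name_col]].append(
            (int(fields[start_col]), int(fields[end_col]))
        )

    bed = []
    for name in sorted(intervals):
        merged = []
        for start, end in sorted(intervals[name]):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        bed.extend((name, s, e) for s, e in merged)
    return bed


# Made-up PAF records (parent1 assembly as query, parent2 assembly as target).
example_paf = "\n".join([
    "p1_chr1\t1000\t0\t400\t+\tp2_chr1\t1200\t100\t500\t380\t400\t60",
    "p1_chr1\t1000\t350\t700\t+\tp2_chr1\t1200\t450\t800\t330\t350\t60",
    "p1_chr2\t800\t10\t200\t-\tp2_chr3\t900\t600\t790\t180\t190\t5",
])

for name, start, end in paf_to_bed(example_paf.splitlines(), side="target"):
    print(f"{name}\t{start}\t{end}")
```

Running it once with `side="query"` and once with `side="target"` gives one BED file per parent, which can then be passed to bedtools getfasta together with the matching assembly.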

Once you have settled on a reference for parent1 and a reference for parent2, you can use seal.sh, which has many options for fine-tuning. These include mkf= (the minimum fraction of a sequencing read's k-mers needed to assign [not map] the read to parent1 or parent2) and k= (k-mer length, 1-31). You can also choose whether reads are discarded if they are assigned to multiple places in parent1, and/or discarded if they are assigned ambiguously to both parent1 and parent2. seal.sh has a refstats= option that lets you see how many reads, or bases from reads, are assigned to each parent, either ambiguously or unambiguously. Perhaps the unambiguously assigned reads or bases are more informative in your case. Simulating F1 hybrids with randomreads.sh, also from BBTools (or with a Nanopore read simulator), might help in choosing the seal.sh settings with the highest accuracy.
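To make the idea of k-mer classification with a minimum k-mer fraction concrete, here is a toy sketch. It is not how seal.sh is implemented and it is far too slow for real data; the function names, the tiny sequences, and the use of distinct canonical k-mers are all assumptions made for illustration. A read is assigned to a parent only if at least `min_fraction` of its k-mers occur in that parent's reference, and a read that passes for both parents is reported as ambiguous.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers (min of k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            kmers.add(min(kmer, kmer.translate(_COMPLEMENT)[::-1]))
    return kmers


def classify_read(read, ref1_kmers, ref2_kmers, k=5, min_fraction=0.5):
    """Assign a read to parent1, parent2, ambiguous, or unassigned."""
    read_kmers = canonical_kmers(read, k)
    if not read_kmers:
        return "unassigned"
    frac1 = len(read_kmers & ref1_kmers) / len(read_kmers)
    frac2 = len(read_kmers & ref2_kmers) / len(read_kmers)
    hit1, hit2 = frac1 >= min_fraction, frac2 >= min_fraction
    if hit1 and hit2:
        return "ambiguous"
    if hit1:
        return "parent1"
    if hit2:
        return "parent2"
    return "unassigned"


# Made-up references: a parent-specific part followed by a shared part.
K = 5
shared = "TTGACTGACT"
ref1_kmers = canonical_kmers("AAAAACCCCC" + shared, K)
ref2_kmers = canonical_kmers("TATATGCGCG" + shared, K)

reads = [
    "GGGGGTTTTT",    # reverse complement of the parent1-specific part
    "TATATGCGCG",    # parent2-specific part
    "TTGACTGACT",    # shared between both parents
    "GGGCCCGGGCCC",  # found in neither reference
]
counts = Counter(classify_read(r, ref1_kmers, ref2_kmers, k=K) for r in reads)
print(dict(counts))
```

For an F1 hybrid, one would expect the parent1 and parent2 counts among the unambiguously assigned reads to be roughly balanced, which is the kind of summary the refstats= output gives for real data. Running the same classification on simulated F1 reads is a way to check how far a given k and minimum fraction push that ratio away from 50:50.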