Very Large Insertion Detection Methodology Query
7.5 years ago
rob234king ▴ 600

I've been given a project to find a plasmid insertion in a small genome that has been sequenced with Illumina and for which a reference sequence is available. The plasmids used are large (up to 10 kb).

I'm used to mapping with Novoalign3 and calling SNPs and INDELs with GATK, but I'm not sure that such a large INDEL will be detected this way. I'll try it anyway, but I was wondering whether there is a more appropriate method. I was thinking of doing a de novo assembly and comparing it with the reference using MUMmer. Any thoughts on the best method to detect plasmid insertions of 1-10 kb?

gatk • 3.7k views
7.5 years ago

Use bwa-mem to find the regions covered by reads that have one part mapping to the plasmid and another part mapping to the genome.
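A minimal sketch of that filter in plain Python, assuming you already have SAM text from bwa mem run against a combined genome + plasmid reference, and that the plasmid contig is named `plasmid` (a hypothetical name; use whatever your FASTA header says). It relies on the `SA:Z` tag that bwa mem attaches to chimeric (split) alignments and reports the genome coordinates of reads that touch both sequences.

```python
PLASMID = "plasmid"  # hypothetical contig name of the plasmid in the combined reference


def parse_sa(tag_value):
    """Parse an SA:Z value 'rname,pos,strand,CIGAR,mapQ,NM;...' into (rname, pos, strand, cigar)."""
    hits = []
    for entry in tag_value.rstrip(";").split(";"):
        if entry:
            rname, pos, strand, cigar, _mapq, _nm = entry.split(",")
            hits.append((rname, int(pos), strand, cigar))
    return hits


def chimeric_reads(sam_lines, plasmid=PLASMID):
    """Yield (read_name, genome_contig, genome_pos) for reads split between plasmid and genome."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or flag & 0x900:
            continue  # skip unmapped, secondary (0x100) and supplementary (0x800) records
        sa = [f[5:] for f in fields[11:] if f.startswith("SA:Z:")]
        if not sa:
            continue  # not a split read
        segments = [(rname, pos)] + [(r, p) for r, p, _, _ in parse_sa(sa[0])]
        contigs = {r for r, _ in segments}
        if plasmid in contigs and len(contigs) > 1:
            for r, p in segments:
                if r != plasmid:
                    yield qname, r, p
```

Clusters of reported genome positions point to candidate insertion sites.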


Thanks for the quick response, not sure I understand this though. Do you mean map the reads to the plasmid sequence rather than the reference and then take the overhang sequences either side of the plasmid from those reads that mapped, join them and search for it in the reference?


I meant: use both sequences (plasmid + genome) as your bwa reference, and bwa will tell you where some reads span a junction between them.
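The combined reference is just a multi-FASTA with the genome contigs and the plasmid as separate entries. A small sketch, assuming both FASTA files are read into strings first; the only real requirement it checks is that contig names are unique, since otherwise the alignments cannot be attributed to the right sequence.

```python
def combined_reference(genome_fasta: str, plasmid_fasta: str) -> str:
    """Concatenate genome and plasmid FASTA text into one multi-FASTA reference."""
    names = []
    for text in (genome_fasta, plasmid_fasta):
        # Contig name = first word of each '>' header line
        names += [line[1:].split()[0] for line in text.splitlines() if line.startswith(">")]
    dupes = {n for n in names if names.count(n) > 1}
    if dupes:
        raise ValueError(f"duplicate contig names: {sorted(dupes)}")
    return genome_fasta.rstrip("\n") + "\n" + plasmid_fasta.rstrip("\n") + "\n"
```

Write the result to e.g. `combined.fa`, then `bwa index combined.fa` and `bwa mem combined.fa reads_1.fq reads_2.fq > aln.sam` (file names are placeholders).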


Ah, I replied assuming that the sequence of the plasmid was unknown, Pierre. I'm sure Rob234 can clarify.


Thanks, it's still good to know that it could be done without the sequence. Yes, I know what the sequence should be, but the plasmid is inserted into the genome at a random position. I'm not sure what is meant by "overlap a junction". If I add the plasmid to the reference genome, it's just a separate contig, so won't the mapper refuse to map across the two contigs? And the plasmid isn't attached to either end of the genome; it's inserted within it. I probably don't understand how BWA-MEM works: does it report reads that can be split and mapped to both contigs (junctions) using a special flag?


If you are using paired-end reads, the mapper will tell you if one end maps to a chromosome and the other maps to the plasmid.
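A sketch of that discordant-pair check on SAM text, assuming the plasmid contig is named `plasmid` (hypothetical). It uses the standard RNEXT column (7th field), where `=` means the mate is on the same contig, and only looks at primary records with both mates mapped.

```python
def discordant_pairs(sam_lines, plasmid="plasmid"):
    """Yield (read_name, genome_contig, genome_pos) where this mate is on the genome
    and its partner is placed on the plasmid contig."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, rnext = f[0], int(f[1]), f[2], int(f[3]), f[6]
        # Require a paired read (0x1); skip unmapped (0x4), mate unmapped (0x8),
        # secondary (0x100) and supplementary (0x800) records.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x900):
            continue
        mate_contig = rname if rnext == "=" else rnext
        if rname != plasmid and mate_contig == plasmid:
            yield qname, rname, pos
```

Pairs like these only bracket the insertion site to within roughly a fragment length; split reads give the exact breakpoint.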


BWA will tell you in the SAM output if ONE read maps to two regions: the best hit is in the regular record (say chr1:12345, CIGAR 50M50S) and the second hit is in the optional SA tag (plasmid:6789, CIGAR 50S50M).
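To turn such a record into a breakpoint, add the number of reference bases consumed by the CIGAR to POS. A minimal sketch: a trailing clip (as in 50M50S) places the junction after the last aligned base, a leading clip (50S50M) places it at POS, and reads clipped on both sides are treated as ambiguous. For the example above, the last aligned genome base is 12345 + 50 - 1 = 12394.

```python
import re

# CIGAR operations as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar):
    """Number of reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def genome_junction(pos, cigar):
    """1-based reference coordinate of the aligned base adjacent to a one-sided clip."""
    ops = CIGAR_OP.findall(cigar)
    lead_clip = ops[0][1] in "SH"
    trail_clip = ops[-1][1] in "SH"
    if trail_clip and not lead_clip:
        return pos + ref_span(cigar) - 1  # last aligned base before the clip
    if lead_clip and not trail_clip:
        return pos  # first aligned base after the clip
    return None  # clipped on both sides, or not clipped: ambiguous
```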

7.5 years ago

Have a go with Cortex:

I've used it to look at plasmids before. Use run_calls (described in the manual) to automatically assemble and error-clean, and then you can: 1. try the "Bubble Caller"; 2. if that fails, dump contigs using --output_supernodes and see whether any of them are plasmids. Once you identify a plasmid contig, add it to your reference and remap your reads.
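One simple way to "see if any of them are plasmids" when the expected plasmid sequence is known is to measure what fraction of each contig's k-mers also occur in the plasmid (on either strand). This is a swapped-in heuristic, not part of Cortex; the k-mer size of 31 is an assumption, and a contig that straddles the insertion should show an intermediate fraction.

```python
def kmers(seq, k):
    """Set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper()[::-1].translate(str.maketrans("ACGTN", "TGCAN"))


def plasmid_fraction(contig, plasmid, k=31):
    """Fraction of the contig's k-mers present in the plasmid on either strand."""
    contig_k = kmers(contig, k)
    if not contig_k:
        return 0.0  # contig shorter than k
    plasmid_k = kmers(plasmid, k) | kmers(revcomp(plasmid), k)
    return len(contig_k & plasmid_k) / len(contig_k)
```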

7.5 years ago
Rohit ★ 1.4k

Why don't you try out Segemehl? http://www.bioinf.uni-leipzig.de/Software/segemehl/

I don't mean to add yet another tool to the long list of available read mappers, but in my personal experience Segemehl has worked quite well for detecting splicing with high selectivity, and its sensitivity is good. However, I have no data on the largest gap it can recognize when mapping to a reference.