Question: Very Large Insertion Detection Methodology Query
1
5.4 years ago by
rob234king570
UK/Harpenden/Rothamsted Research
rob234king570 wrote:

I've been given a project to find a plasmid insertion in a small genome that has been sequenced using Illumina and has a reference sequence available. The plasmids used are large (10kb).

I'm used to using Novoalign3 mapping and GATK to find SNPs and INDELs, but I'm not sure that such a large INDEL will be detected using this method. I'll attempt it, but I was wondering if there is a more appropriate method. I was thinking possibly de novo assembly followed by a comparison with the reference using MUMmer. Any thoughts on the best method to detect plasmid insertions that are going to be 1-10kb?

gatk • 3.1k views
ADD COMMENTlink modified 5.4 years ago by Rohit1.3k • written 5.4 years ago by rob234king570
2
5.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum117k wrote:

Use bwa-mem to find the regions where reads have one part mapping to the plasmid and another part mapping to the genome.

ADD COMMENTlink written 5.4 years ago by Pierre Lindenbaum117k

Thanks for the quick response, not sure I understand this though. Do you mean map the reads to the plasmid sequence rather than the reference and then take the overhang sequences either side of the plasmid from those reads that mapped, join them and search for it in the reference?

ADD REPLYlink written 5.4 years ago by rob234king570

I meant: use both sequences (plasmid + genome) as your bwa reference, and bwa will tell you where some reads overlap a junction.
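To make the "plasmid + genome as one reference" idea concrete, here is a minimal sketch that appends the plasmid to the genome FASTA as an extra contig. The function name `combine_fasta` and the contig name `plasmid` are made up for illustration; the file names in the comments are placeholders too.

```python
def combine_fasta(genome_text: str, plasmid_text: str, plasmid_name: str = "plasmid") -> str:
    """Return one FASTA string: all genome records plus the plasmid as an extra contig.

    The plasmid header is replaced by `plasmid_name` (a hypothetical label) so the
    contig is easy to recognise later in the RNAME column of the SAM output.
    """
    # Keep every non-empty genome line (headers and sequence) unchanged.
    out = [line.strip() for line in genome_text.splitlines() if line.strip()]
    # Keep only the sequence lines of the plasmid; its own header is dropped.
    plasmid_seq = [
        line.strip()
        for line in plasmid_text.splitlines()
        if line.strip() and not line.startswith(">")
    ]
    out.append(">" + plasmid_name)
    out.extend(plasmid_seq)
    return "\n".join(out) + "\n"


# Typical use (paths are placeholders):
#   with open("genome.fa") as g, open("plasmid.fa") as p, open("combined.fa", "w") as o:
#       o.write(combine_fasta(g.read(), p.read()))
# then index and map as usual:
#   bwa index combined.fa
#   bwa mem combined.fa reads_1.fq reads_2.fq > aln.sam
```

This assumes the plasmid file holds a single record; with several plasmids you would want to keep one distinct name per record.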

ADD REPLYlink written 5.4 years ago by Pierre Lindenbaum117k

Ah, Pierre, I replied assuming that the sequence of the plasmid was unknown. I'm sure Rob234 can clarify.

ADD REPLYlink written 5.4 years ago by zam.iqbal.genome1.7k

Thanks, it's still good to know that it could be done without the plasmid sequence. Yes, I know what the sequence should be, but the plasmid is inserted into the genome at a random position. I'm not sure what is meant by "overlap a junction". If I add the plasmid to the reference genome, isn't it just a separate contig, so the mapper doesn't try to map across the two contigs? And the plasmid isn't attached to either end of the genome; it's inserted within it. Most likely I don't understand how BWA-MEM works: does it use a special flag to report reads that can be split and mapped to both contigs (junctions)?

ADD REPLYlink written 5.4 years ago by rob234king570
1

If you are using paired end reads, then the mapper will tell you if one end maps to one chromosome and the other maps to the plasmid
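As a rough sketch of how to pull those pairs out of the SAM file: the standard RNAME (column 3) and RNEXT (column 7) fields say where each read and its mate mapped, so pairs with one end on a genome contig and the other on the plasmid can be counted per genome window. The function name, the contig name `plasmid` and the 1 kb bin size are assumptions for illustration only.

```python
from collections import Counter


def plasmid_genome_pairs(sam_lines, plasmid_name="plasmid", bin_size=1000):
    """Count genome-side reads whose mate maps to the plasmid, binned by position.

    Returns a Counter keyed by (contig, bin_start); bins with many hits are
    candidate insertion sites. `plasmid_name` must match the contig name used
    in the combined reference (an assumption of this sketch).
    """
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, pos, rnext = int(fields[1]), fields[2], int(fields[3]), fields[6]
        # Need a paired read (0x1); skip unmapped reads (0x4), unmapped mates (0x8),
        # and secondary (0x100) or supplementary (0x800) records.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # "=" means the mate is on the same contig, "*" means no mate information.
        if rnext in ("=", "*"):
            continue
        if rname != plasmid_name and rnext == plasmid_name:
            bin_start = (pos - 1) // bin_size * bin_size
            hits[(rname, bin_start)] += 1
    return hits
```

Calling `plasmid_genome_pairs(open("aln.sam")).most_common(5)` would list the five windows with the most plasmid-linked mates. A real insertion should show such pairs piling up on both sides of a single site.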

ADD REPLYlink written 5.4 years ago by zam.iqbal.genome1.7k
1

BWA will tell you in the SAM output if ONE read maps to two regions: the best hit is in the regular record (say chr1:12345 cigar:50M50S) and the 2nd hit is in the metadata (plasmid:6789 cigar:50S50M).
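A minimal sketch of reading that "metadata", assuming the aligner writes the second hit in the `SA:Z:` optional tag defined by the SAM specification (entries of the form `rname,pos,strand,CIGAR,mapQ,NM;`). Which tag your BWA version emits is an assumption to check against your own output; the function name and the contig name `plasmid` are made up for illustration.

```python
def split_read_junctions(sam_lines, plasmid_name="plasmid"):
    """List split reads with one part on the plasmid and the other on the genome.

    Each result is a tuple:
    (read name, contig, pos, CIGAR, other contig, other pos, other CIGAR)
    """
    junctions = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        # Look only at primary records so each split read is reported once.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        # Optional tags start at column 12; find the SA tag if present.
        sa = next((t[5:] for t in fields[11:] if t.startswith("SA:Z:")), None)
        if sa is None:
            continue
        for entry in sa.rstrip(";").split(";"):
            rname, pos, _strand, cigar, _mapq, _nm = entry.split(",")
            # Keep the read only if exactly one of the two parts is on the plasmid.
            if (fields[2] == plasmid_name) != (rname == plasmid_name):
                junctions.append(
                    (fields[0], fields[2], int(fields[3]), fields[5], rname, int(pos), cigar)
                )
    return junctions
```

With the example above, the record at chr1:12345 (50M50S) carrying `SA:Z:plasmid,6789,+,50S50M,60,0;` would be reported as a genome/plasmid junction. The soft-clip boundary in the CIGAR string tells you which side of the read crosses the breakpoint.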

ADD REPLYlink modified 5.4 years ago • written 5.4 years ago by Pierre Lindenbaum117k
1
5.4 years ago by
United Kingdom
zam.iqbal.genome1.7k wrote:

Have a go with Cortex:

Web page: http://cortexassembler.sourceforge.net/index_cortex_var.html
Docs: http://cortexassembler.sourceforge.net/cortex_var_user_manual.pdf
Papers:
- on microbes: http://bioinformatics.oxfordjournals.org/content/29/2/275.full.pdf+html
- the original paper: http://www.nature.com/ng/journal/v44/n2/full/ng.1028.html

I've used it to look at plasmids before. Use run_calls (described in the manual) to automatically assemble and error-clean, and then you can:
1. Try the "Bubble Caller".
2. If that fails, dump contigs using --output_supernodes, and see if any of them are plasmids.
Once you identify a plasmid contig, add that to your reference, and then remap your reads.

ADD COMMENTlink written 5.4 years ago by zam.iqbal.genome1.7k
0
5.4 years ago by
Rohit1.3k
California
Rohit1.3k wrote:

Why don't you try out Segemehl? http://www.bioinf.uni-leipzig.de/Software/segemehl/

I don't mean to add another one to the long list of available read-mapping software, but in my personal experience Segemehl has worked quite well for detecting splicing with high selectivity. The sensitivity of the tool is good, but I have no data on the largest gap it can recognize while mapping to a reference.

ADD COMMENTlink written 5.4 years ago by Rohit1.3k