Question: Filtering both concordantly and discordantly mapped reads
gravatar for valerie
2.4 years ago by
valerie60 wrote:

Hi guys,

Is it possible to filter both concordantly and discordantly mapped reads?

I believe that there can be huge deletions/insertions in my reference sequence and the distance between reads can differ from 100 nucleotides that are expected by aligner (I use bowtie). I think this is the best way to find paired reads that mapped to the reference to study insertions and deletions.

Thank you!

sequencing samtools ngs • 928 views
ADD COMMENTlink modified 2.4 years ago by d-cameron2.0k • written 2.4 years ago by valerie60
gravatar for Brian Bushnell
2.4 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

BBMap is very good at correctly mapping reads with long indels. For example - in=r1.fq in2=r2.fq ref=ref.fa maxindel=400k out=mapped.sam

That takes care of deletions up to around 400kbp. Insertions are trickier. With 150bp reads, BBMap won't reliably map reads spanning insertions longer than ~50bp (obviously, insertions >150bp are fundamentally impossible to capture in a 150bp read). But it can spit out discordant pairs: in=r1.fq in2=r2.fq ref=ref.fa maxindel=400k  outm=mapped.sam outu=unmapped.fq po

This will send all unmapped reads to "outu" (unmapped.fq) as an interleaved fastq. Also, the "po" flag means all discordant pairs will be marked as unmapped, meaning all half-mapped pairs and discordant pairs will go to that file, which can then be re-mapped with different parameters. You may want to set the "pairlen" flag also to tell it the maximum distance allowed for concordant pairs.

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Brian Bushnell16k

Thank you, Brian! This is exactly what I was looking for!

ADD REPLYlink written 2.4 years ago by valerie60
gravatar for d-cameron
2.4 years ago by
d-cameron2.0k wrote:

There are over 50 publicly available structural variant callers, with about 1/4 able to detect small insertions and large deletions using the technique use describe. These include BreakDancer, HYDRA, GASVPro, and CLEVER. Other detection signals that can be used to detect structural variants include read alignment CIGAR, split reads, soft clipped reads, read pairs with only one read mapped, and multiple different assembly-based approaches. Tools using such methods (most of which combine one or more of these signals with read pair information) include Pindel, SRiC, CREST, SVseq2, PRISM, SoftSearch, Socrates, LUMPY, cortex-var, lasSV, AsmVar, SoftSV, manta, and GRIDSS.

I would strongly advise looking at the existing tools and literature before reinventing a wheel.

Disclaimer 1: I didn't check that every one of these tools reports small insertions. Some tools, such as DELLY, use read pair and split read information but do not report insertion events.

Disclaimer 2: As I wrote GRIDSS, it should come as no surprise that I recommend it. My extensive benchmarking shows it outperforms other callers (breakdancer, cortex, delly, hydra, lumpy, pindel, software, tigra) at 15x or greater coverage. That said, if you're interested in novel insertions larger than the library fragment size, you'll need to use a de novo assembly based technique (eg cortex) or a specialised caller such as NovelSeq.

ADD COMMENTlink written 2.4 years ago by d-cameron2.0k

Thank you for your advice!

ADD REPLYlink written 2.4 years ago by valerie60
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 881 users visited in the last hour