Question: How to do a local de novo assembly, including unmapped mates of paired reads, across many samples to genotype a large insertion?
19 months ago, William wrote:

I have many BAM files, and for each one I would like to know whether an insertion of a few hundred bp is present at a specific locus.

The insertion is not picked up by small-variant callers such as freebayes or gatk, nor by structural-variant callers such as lumpy or manta, when calling variants genome-wide.

It should be possible to use the aligned reads and their unmapped mates to do a full local assembly of the region that includes the insertion.

What tool or pipeline can I best use for this?

I guess I need to use both the BAM files and the original FASTQ files, since I need the unclipped, unsplit reads for the local assembly?
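As a rough sketch of the read-collection step (not a tested pipeline): soft-clipped bases are still stored in the SEQ field of a SAM/BAM record, so for reads that are only soft-clipped the BAM already holds the full sequence; hard-clipped supplementary records are the ones that lack it. The Python below assumes plain SAM text (e.g. `samtools view -h` output) and a region you supply; it keeps pairs where a primary mate starts inside the region, adds the other mate even when it is unmapped, restores the original read orientation, and writes FASTA that a local assembler could take as input. The function names are illustrative, not from any existing tool.

```python
# Sketch: gather read pairs anchored at one locus (plus unmapped mates) from
# SAM text and emit FASTA for a local de novo assembler.
# Assumptions: plain SAM lines (e.g. `samtools view -h`), 1-based region.

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_READ1 = 0x40
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def collect_local_pairs(sam_lines, chrom, start, end):
    """Return {qname: {1: seq, 2: seq}} for pairs with a mate mapped in region.

    Simplification: a mapped read counts as "in the region" only if its
    leftmost aligned position (POS) lies within [start, end].
    """
    anchored = set()
    primary = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, seq = f[0], int(f[1]), f[2], int(f[3]), f[9]
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # one record per read; supplementary ones may be hard-clipped
        primary.append((qname, flag, seq))
        if not flag & FLAG_UNMAPPED and rname == chrom and start <= pos <= end:
            anchored.add(qname)

    pairs = {}
    for qname, flag, seq in primary:
        if qname not in anchored:
            continue
        # SEQ is stored reverse-complemented when bit 0x10 is set; undo that
        # so every read is in its original sequencing orientation.
        if flag & FLAG_REVERSE:
            seq = revcomp(seq)
        mate = 1 if flag & FLAG_READ1 else 2
        pairs.setdefault(qname, {})[mate] = seq
    return pairs


def to_fasta(pairs):
    """Format collected pairs as FASTA records named qname/1 and qname/2."""
    return "".join(
        f">{qname}/{mate}\n{pairs[qname][mate]}\n"
        for qname in sorted(pairs)
        for mate in sorted(pairs[qname])
    )
```

The SAM specification recommends that an unmapped read whose mate is mapped carry the mate's RNAME and POS, so a region query on a sorted, indexed BAM usually returns those unmapped mates as well; whether your aligner follows that recommendation is worth checking.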

Take a look at ABRA/ABRA2.

Comment by genomax, 19 months ago.
19 months ago, harold.smith.tarheel (United States) wrote:

If you know the exact sequence and location of the insertion and just need to confirm presence/absence, why not 'grep' the FASTQs for the genome/insert junction sequences? Or am I missing something?
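A minimal sketch of that grep idea in Python, counting reads that contain a genome/insert junction sequence on either strand (a plain `grep` only finds the reverse-complement hits if you search for them too). The junction strings are placeholders you would take from the known breakpoints, for example 20 to 30 bp spanning each junction. The code assumes standard 4-line FASTQ records (no wrapped sequence lines), treats a `.gz` suffix as gzip, and does exact matching only, so a sequencing error inside the probe causes a miss.

```python
# Sketch: count FASTQ reads containing a junction sequence on either strand.
# Assumptions: 4-line FASTQ records; junction strings are user-supplied.
import gzip

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_junction_reads(fastq_lines, junctions):
    """Return {junction: number of reads containing it on either strand}."""
    probes = {j: (j.upper(), revcomp(j.upper())) for j in junctions}
    counts = {j: 0 for j in junctions}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue  # in 4-line FASTQ, the sequence is line 2 of each record
        seq = line.strip().upper()
        for j, (fwd, rev) in probes.items():
            if fwd in seq or rev in seq:
                counts[j] += 1
    return counts


def count_in_file(path, junctions):
    """Open a plain or gzipped FASTQ (chosen by .gz suffix) and count hits."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_junction_reads(handle, junctions)
```

Run `count_in_file` per sample with the left and right junction sequences; reads hitting both junctions in one sample and none in another would be a quick presence/absence signal.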

19 months ago, h.mon wrote:

If all you are interested in is one position at one locus, viewing the BAMs in IGV or another genome browser should settle the issue. If your reference genome does not contain the insertion, look for reads that are soft-clipped at the expected position of the insert.
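With many samples, the soft-clip check can be scripted instead of eyeballed. A rough sketch in Python over SAM text lines (e.g. from `samtools view` on the region): it computes where each soft clip meets the reference and counts primary mapped reads with a clip near the expected breakpoint. The breakpoint convention (the 1-based reference position just right of the junction), the `tolerance` parameter, and the function names are assumptions of this sketch.

```python
# Sketch: count reads soft-clipped at an expected insertion breakpoint.
# Assumptions: plain SAM text lines; breakpoint is the 1-based reference
# position immediately right of the junction.
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference


def soft_clip_breakpoints(pos, cigar):
    """Reference positions where this alignment's soft clips meet the reference.

    A left clip meets the reference at POS; a right clip meets it just after
    the last aligned reference base (POS + aligned reference length).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips sit outside soft clips
    points = []
    if core and core[0][1] == "S":
        points.append(pos)
    if len(core) > 1 and core[-1][1] == "S":
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        points.append(pos + ref_len)
    return points


def count_clipped_at(sam_lines, chrom, breakpoint, tolerance=2):
    """Count primary mapped reads on chrom with a soft clip near breakpoint."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        if flag & (0x4 | 0x100) or rname != chrom:
            continue  # skip unmapped and secondary alignments
        if any(abs(b - breakpoint) <= tolerance
               for b in soft_clip_breakpoints(pos, cigar)):
            count += 1
    return count
```

A cluster of left- and right-clipped reads at the same position in some samples but not others is the pattern you would expect for a non-reference insertion; the count is a screening signal, not a genotype call.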

What kind of sequencing did you perform (RNA-seq, DNA-seq, etc.)? What mapper did you use? Subread claims to be able to identify short indels (up to 200 bp) at the alignment step; use the -I option of subread-align to set the maximum indel length.
