Question: multiple sequence alignment for validation of large indels
gravatar for thomas.welch
3.6 years ago by
thomas.welch20 wrote:

Hi there,

I will soon have 180 illumina paired end sequenced genomes (approx 13x) with which i am conducting a GWAS experiment, these genomes are approximately 36Mbp long and have very long segmental indels (up 770kb). i intend to genotype my samples for these indels as a separate analysis from the GWAS using CNVnator (with a reference genome) to detect and visualise the indels, and i would then like to validate these genotypes for each individual using multiple sequence alignment. Is there a software or package capable of this final step, that can give me an easy to work with output?

Could i perhaps use the BEDtools coverage command for this?

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by thomas.welch20
gravatar for Brian Bushnell
3.6 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

This is an indirect answer, but when mapping the reads, if you want to capture long deletion events within reads so that they can be called by standard variant callers (rather than CNV/SV programs), I suggest you map with BBMap, like this:

(once, for indexing) ref=ref.fa k=14

(for each sample) in1=read1.fq in2=read2.fq out=mapped.sam k=14 maxindel=770000 minratio=0.4; sh

The actual maximum length of indels that can be detected this way is related to the read length. For insertions, the maximum length is around 50% of the read length. 150 bp reads should be sufficient for a 770kbp deletion. once you map the reads like this, you can visualize the long indels in IGV from the sorted indexed bam file (created by the "" shell script, if samtools is installed). You can subsequently call the events like this: in=mapped.sam ref=ref.fa vcf=vars.vcf ploidy=1

You can increase read length, if needed, with BBMap's Tadpole or BBMerge, like this: in1=r1.fq in2=r2.fq out1=e1.fq out2=e2.fq mode=extend extendleft=50 extendright=50

or, for paired reads with a long insert size: in1=r1.fq in2=r2.fq out=merged.fq outu=unmerged.fq rem extend2=50

These require sufficient coverage, though, because kmers from other reads are used for extension. 13x is barely sufficient, but should work.

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Brian Bushnell17k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1673 users visited in the last hour