Question: Trio: Detecting A Large De Novo Indel
Asked by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 6.8 years ago:

I've been given a set of three BAM files (father, mother, child), and I expect the child to carry some de novo heterozygous variants. samtools mpileup has been used to find the small variants, but what would be your protocol for extracting the larger de novo indels from the child?

Tags: indel, bam

How large are the indels you're looking for? If they're longer than the read length, you'll need a structural variation detector: Hydra, Pindel, Genome STRiP.

(reply by Brad Chapman, 6.8 years ago)

Or an assembler. Cortex, SGA, Fermi.

(reply by zam.iqbal.genome, 6.8 years ago)
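The read-length cut-off can be made concrete: an indel shorter than the read can show up inside a single alignment as an `I` or `D` operation in the CIGAR string, whereas an event longer than the read cannot be spanned by one read, which is why structural variant callers or assemblers are needed. The following is a minimal sketch, assuming SAM-format text such as `samtools view` output (POS in column 4, CIGAR in column 6); the function name `indels_from_cigar` is hypothetical, and this is not how any of the tools named above work internally.

```python
import re

# A CIGAR string is a run of <length><operation> pairs, e.g. "50M30D50M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases (advance the reference position).
REF_CONSUMING = set("MDN=X")


def indels_from_cigar(pos, cigar, min_len=1):
    """List indels of at least min_len bases encoded in one read's CIGAR.

    pos is the 1-based leftmost reference position (SAM column 4).
    Returns (ref_pos, kind, length) tuples. For a deletion, ref_pos is the
    first deleted reference base; for an insertion, it is the reference base
    immediately after the inserted sequence.
    """
    ref_pos = pos
    found = []
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in ("I", "D") and length >= min_len:
            found.append((ref_pos, "INS" if op == "I" else "DEL", length))
        if op in REF_CONSUMING:
            ref_pos += length
    return found


if __name__ == "__main__":
    # A read carrying a 30 bp deletion after its first 50 aligned bases.
    print(indels_from_cigar(100, "50M30D50M", min_len=20))  # prints [(150, 'DEL', 30)]
```

Note that aligners often soft-clip (`S`) a read rather than report a long indel near its end, so this only recovers events the aligner chose to encode explicitly.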
Answer by Alex Paciorkowski (Rochester, NY USA), 6.8 years ago:

I would:

  1. combine those BAM files into a BAM list;

  2. walk the BAM list with GATK, which produces a combined VCF file with alleles called in the proband and both parents;

  3. annotate that VCF with SeattleSeq, using the VCF (indels only) option.

You should then get an annotated VCF that you can parse, pulling out the indels present only in the proband and not in the parents. If you have SNP array data on the family, it can be used for validation.
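The final filter in step 3 (keep indels called in the proband but in neither parent) can be sketched in plain Python over an uncompressed multi-sample VCF. This is a minimal illustration, assuming the trio was genotyped jointly so all three samples share each record, that genotypes are in the standard `GT` FORMAT field, and that only the simple heterozygous case (child `0/1`, both parents `0/0`) matters. The function names are hypothetical; the sketch does no SeattleSeq annotation and ignores genotype quality, depth and multi-allelic genotypes such as `0/2`, which a real analysis should take into account.

```python
def genotype(sample_field, format_keys):
    """Return the GT value of one sample column, normalising phased '|' to '/'."""
    values = dict(zip(format_keys, sample_field.split(":")))
    return values.get("GT", "./.").replace("|", "/")


def is_indel(ref, alt_field):
    """True if any ALT allele differs in length from REF."""
    return any(len(alt) != len(ref) for alt in alt_field.split(","))


def de_novo_indels(vcf_lines, child, father, mother):
    """Yield (chrom, pos, ref, alt) for indels where the child is 0/1 and both parents are 0/0.

    child, father and mother are sample names as they appear in the #CHROM header line.
    """
    column = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9, after FORMAT.
            column = {name: i for i, name in enumerate(fields) if i >= 9}
            continue
        if column is None:
            raise ValueError("data line before #CHROM header")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if not is_indel(ref, alt):
            continue
        fmt = fields[8].split(":")
        gt_child = genotype(fields[column[child]], fmt)
        gt_father = genotype(fields[column[father]], fmt)
        gt_mother = genotype(fields[column[mother]], fmt)
        if gt_child in ("0/1", "1/0") and gt_father == "0/0" and gt_mother == "0/0":
            yield chrom, pos, ref, alt
```

Usage would look like `list(de_novo_indels(open("trio.vcf"), "child", "father", "mother"))`, where the file name and sample names are placeholders.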


Which GATK walker would you use? Will GATK be able to retrieve some large indels that samtools cannot detect?

(reply by Pierre Lindenbaum, 6.8 years ago)

What coverage do you have for each sample?

(reply by zam.iqbal.genome, 6.8 years ago)

~30 reads.

(reply by Pierre Lindenbaum, 6.8 years ago)
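With ~30 reads per sample, a heterozygous de novo allele is expected on roughly half of them, about 15. A rough way to turn that into a detection probability is sketched below. It assumes read depth on the variant haplotype follows a Poisson distribution with mean 15, which is an idealisation (real coverage is overdispersed, and reads spanning a large indel may be lost or mis-mapped), and the minimum supporting-read thresholds are hypothetical.

```python
import math


def poisson_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


total_depth = 30                 # ~30 reads per sample
allele_depth = total_depth / 2   # a heterozygous allele lies on one of two haplotypes

for min_reads in (3, 5, 8):      # hypothetical minimum supporting-read thresholds
    p = poisson_at_least(min_reads, allele_depth)
    print(f"P(>= {min_reads} reads on the variant haplotype) = {p:.4f}")
```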
Answer by zam.iqbal.genome (United Kingdom), 6.8 years ago:

Oops, sorry Pierre, I did not spot your reply. I assume that means 30x depth overall, so about 15x per chromosome/allele/haplotype. If you want larger indels, then I think assembly is the only reliable way to call them (you will have to adjust for bias here, as I write and maintain a variant assembler). You should have plenty of coverage to do it in this situation. There are a few options:

  1. My own, Cortex: dump the reads as FASTQ and pass them to Cortex. I think you will get better results if you do the whole genome, as you avoid errors in the mapper, but that will need ~100 GB of RAM. You can jointly assemble the entire trio and directly compare their genomes.

  2. Jared Simpson's SGA assembler also calls variants.

  3. Heng Li's fermi assembler can also call variants; I think it was recently published in Bioinformatics.

I only have experience with option 1, I'm afraid, but I'm sure Jared, Heng and I would be happy to field further questions about the details.
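The assembly-based approach compares the trio's sequence directly rather than through reference alignments; Cortex, for example, builds a joint coloured de Bruijn graph of all samples. The sketch below is not Cortex, SGA or fermi. It is a toy illustration of the underlying idea, assuming reads are uppercase ACGT strings held in memory: collect canonical k-mers seen at least `min_count` times in the child (to discard most sequencing errors) and absent from both parents. For a real de novo indel, such child-only k-mers span the breakpoint and would then need to be assembled into contigs and placed on the reference. The function names and the `k`/`min_count` values are hypothetical.

```python
from collections import Counter

# Complement table for canonicalising k-mers (reads assumed to be uppercase ACGT).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_kmers(reads, k):
    """Count canonical k-mers across an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def child_specific_kmers(child_reads, father_reads, mother_reads, k=31, min_count=2):
    """Canonical k-mers seen >= min_count times in the child and never in either parent."""
    child_counts = count_kmers(child_reads, k)
    parental = set(count_kmers(father_reads, k)) | set(count_kmers(mother_reads, k))
    return {km for km, n in child_counts.items() if n >= min_count and km not in parental}


if __name__ == "__main__":
    parent = ["AAAACCCCGGGG"]
    # The child carries a 4 bp insertion (TTTT), seen on two reads.
    child = ["AAAACCCCGGGG", "AAAACCTTTTCCGGGG", "AAAACCTTTTCCGGGG"]
    print(sorted(child_specific_kmers(child, parent, parent, k=5)))
```

Canonicalising each k-mer with its reverse complement matters because reads come from both strands; without it, the same parental sequence read on the opposite strand would look novel.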


Is Cortex suitable for exome data?

(reply by michealsmith, 6.4 years ago)