Trio: Detecting A Large De Novo Indel
10.3 years ago

I've been given a set of three BAM files (father, mother, child), and I expect the child to carry some de novo heterozygous variants. samtools mpileup has been used to find the small variants, but what would be your protocol for extracting the larger de novo indels from the child?

How large are the indels you're looking for? If they're longer than the read length, you'll need a structural variation detector such as Hydra, pindel or Genome STRiP.

Or an assembler: Cortex, SGA or Fermi.

10.3 years ago

I would 1) combine those BAM files into a BAM list, 2) walk the BAM list with GATK, which produces a combined VCF file with alleles called in the proband and both parents, and 3) annotate that VCF with SeattleSeq using the VCF (indels only) option. You should then get an annotated VCF that you can parse, pulling out the indels present only in the proband and not in the parents. If you have SNP array data on the family, it can be used for validation.
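The final "pull out indels present only in the proband" step could look roughly like the sketch below. It is a minimal illustration, not part of the pipeline described above. It works on the raw genotypes of an uncompressed multi-sample VCF rather than on SeattleSeq output. It assumes a `GT` key in FORMAT, and the sample names `kid`, `dad` and `mom` are hypothetical placeholders for your own sample columns.

```python
"""Sketch: candidate de novo indels from a joint-called trio VCF.

Assumptions (hypothetical): uncompressed VCF text, a GT key in FORMAT,
biallelic records (split multi-allelic sites first, e.g. with
`bcftools norm -m-`), and caller-chosen sample column names.
"""


def is_indel(ref, alts):
    """True if any concrete ALT allele differs in length from REF."""
    return any(
        not a.startswith("<") and a not in ("*", ".") and len(a) != len(ref)
        for a in alts
    )


def genotype(sample_field, format_keys):
    """Return one sample's GT alleles, e.g. ['0', '1'] (phasing is ignored)."""
    values = dict(zip(format_keys, sample_field.split(":")))
    return values.get("GT", ".").replace("|", "/").split("/")


def de_novo_indels(vcf_lines, child, father, mother):
    """Indels where the child is ref/alt heterozygous and both parents are 0/0."""
    column = {}
    hits = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map sample names to their column index.
            column = {name: i for i, name in enumerate(fields)}
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        if not is_indel(ref, alt.split(",")):
            continue
        keys = fields[8].split(":")
        kid = genotype(fields[column[child]], keys)
        dad = genotype(fields[column[father]], keys)
        mom = genotype(fields[column[mother]], keys)
        # Heterozygous in the child: two called alleles, one ref and one non-ref.
        kid_het = len(kid) == 2 and "." not in kid and "0" in kid and kid[0] != kid[1]
        # Both parents called homozygous reference; missing calls are rejected.
        parents_ref = dad == ["0", "0"] and mom == ["0", "0"]
        if kid_het and parents_ref:
            hits.append((chrom, int(pos), ref, alt))
    return hits
```

Usage would be something like `de_novo_indels(open("trio.vcf"), child="kid", father="dad", mother="mom")`, where the file name is also a placeholder. Requiring confident 0/0 calls in both parents trades some sensitivity for fewer false de novo calls at poorly covered parental sites.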

Which GATK walker would you use? Will GATK be able to find large indels that samtools cannot detect?

What coverage do you have for each sample?

~30 reads.

10.3 years ago

Oops, sorry Pierre, I did not spot your reply. I assume that means 30x depth overall, so about 15x per chromosome/allele/haplotype. If you want larger indels, then I think assembly is the only reliable way to call them (you will have to allow for my bias here, as I write and maintain a variant assembler). You should have plenty of coverage to do it in this situation. There are a few options:

  1. My one, Cortex: dump the reads as FASTQ and pass them to Cortex. I think you will get better results if you do the whole genome, as you avoid errors introduced by the mapper, but that will need ~100 GB of RAM. You can jointly assemble the entire trio and directly compare their genomes.

  2. Jared Simpson's SGA assembler also calls variants.

  3. Heng Li's fermi assembler can also call variants; it was recently published in Bioinformatics, I think.

I only have experience with option 1, I'm afraid, but I'm sure Jared, Heng and I would be happy to field further questions about the details.
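The idea behind jointly assembling the trio and comparing genomes can be illustrated with a toy sketch. It looks for k-mers that occur in the child's reads but never in either parent's reads. This is a deliberately simplified stand-in and **not** Cortex's algorithm: Cortex builds a coloured de Bruijn graph and calls variants from its structure, whereas this sketch only diffs k-mer counts. The function names and the `min_count` threshold are hypothetical.

```python
"""Toy illustration of trio comparison by k-mer colour (NOT Cortex itself).

K-mers seen in the child but absent from both parents can point at
de novo sequence, such as the novel junction of a large insertion or deletion.
"""
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def child_specific_kmers(child_reads, father_reads, mother_reads, k=31, min_count=2):
    """K-mers seen at least min_count times in the child and never in either parent.

    The min_count threshold discards k-mers seen only once, which are
    mostly sequencing errors.
    """
    child = kmer_counts(child_reads, k)
    parents = kmer_counts(father_reads, k) + kmer_counts(mother_reads, k)
    return {kmer for kmer, n in child.items() if n >= min_count and kmer not in parents}
```

A real tool would also merge each k-mer with its reverse complement, handle far more data than an in-memory `Counter` allows (hence the RAM requirement mentioned above), and assemble the surviving k-mers back into contigs spanning the breakpoint.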

Is Cortex suitable for exome data?
