Question: Variant calling in genomic chunks
Damian Kao (USA) wrote, 16 months ago:

Is variant calling done on a per-position basis? I've read recommendations to split the BAM file by chromosome and call the chromosomes in parallel for speed.

Could I further segment each chromosome into chunks and call variants on each chunk? For example: 1) split a chromosome into 1 Mb segments, 2) call variants on each 1 Mb segment in parallel, 3) concatenate the resulting VCFs.

Would the resulting concatenated file be correct? Would I be missing any information that might be shared among sites on the same chromosome that variant callers use?
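To make step 1 concrete, here is a minimal Python sketch that turns one chromosome into fixed-size region strings. The function name `chunk_regions` and the 1-based inclusive `chrom:start-end` format are assumptions for illustration; the chromosome length would normally come from the BAM header or a .fai index.

```python
def chunk_regions(chrom, length, size=1_000_000):
    """Split one chromosome into consecutive 1-based, inclusive regions.

    chrom  : chromosome name as it appears in the BAM header
    length : chromosome length in bp
    size   : chunk size in bp (1 Mb by default, as in the question)
    """
    regions = []
    start = 1
    while start <= length:
        # The last chunk is truncated at the chromosome end.
        end = min(start + size - 1, length)
        regions.append(f"{chrom}:{start}-{end}")
        start = end + 1
    return regions


if __name__ == "__main__":
    for region in chunk_regions("chr1", 2_500_000):
        print(region)
```

Note that the chunks do not overlap, so a variant that straddles a boundary is exactly the case the question is worried about.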


If you plan on calling SNPs only, then I don't see a problem. However, if you are looking for structural variants, there would be missing data at the edges of the chunks, especially for SVs spanning multiple chunks. Additionally, how do you plan to keep track of the size or exact position of each chunk? With additional liftUp files?

Reply written 16 months ago by Rohit

Thanks for the reply. I see what you mean with the SVs and possibly even indels. I am not interested in SVs for now, but do want to preserve indel information if I can.

The BAM files I am working with are low coverage. I guess I'll have to write a script to chunk the BAM file based on coverage "islands", where each island should be at least 100 bp away from the next.
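A rough Python sketch of that island idea, assuming the alignments of one chromosome have already been reduced to 0-based half-open (start, end) intervals (for example from a BED file). The function name `coverage_islands` and the `min_gap` parameter are made up for illustration.

```python
def coverage_islands(intervals, min_gap=100):
    """Merge read intervals into islands separated by at least min_gap bp.

    intervals : iterable of (start, end) pairs, 0-based half-open,
                all on the same chromosome
    min_gap   : minimum uncovered distance between two islands
    """
    islands = []
    for start, end in sorted(intervals):
        if islands and start - islands[-1][1] < min_gap:
            # Too close to the previous island: extend it instead.
            prev_start, prev_end = islands[-1]
            islands[-1] = (prev_start, max(prev_end, end))
        else:
            islands.append((start, end))
    return islands


if __name__ == "__main__":
    reads = [(0, 50), (40, 120), (300, 350), (400, 460), (1000, 1100)]
    print(coverage_islands(reads))
```

Because no read crosses an island boundary, each island can be passed to the caller as its own region without cutting through an alignment.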

Reply written 16 months ago by Damian Kao
Devon Ryan (Freiburg, Germany) wrote, 16 months ago:

Brad Chapman has described this procedure previously on his blog here (search for "Parallelism by genomic regions"). This at least used to be part of the bcbio-nextgen workflow, though I don't know if it's still included. That procedure is slightly more involved, since it finds appropriate segments rather than using 1 Mb chunks, but the principle is the same.

finswimmer (Germany) wrote, 16 months ago:

Hello Damian,

Is variant calling done on a per-position basis?

This depends on your variant caller. GATK's UnifiedGenotyper and samtools mpileup do so, as far as I know. GATK's HaplotypeCaller does local de novo assembly, and freebayes builds haplotypes from the reads around a candidate site. So these variant callers need information around a suspected variant.

Instead of splitting the BAM file, I would use the option most variant callers offer to restrict calling to a given genomic region. That way every process has all the alignment information it needs.
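As a sketch of what that could look like, the Python below only builds the command lines for one job per region and for the final merge. It assumes bcftools as the caller (`mpileup -r` for the region, `call -mv`, then `concat`); the file names, output prefix, and helper names are hypothetical. freebayes (`-r`) and GATK (`-L`) have comparable region options.

```python
def region_call_commands(bam, ref, regions, prefix="calls"):
    """Build one (mpileup, call, output) job per region.

    The BAM is never split: each job reads the same indexed BAM and is
    restricted to its region with `bcftools mpileup -r`.
    """
    jobs = []
    for i, region in enumerate(regions):
        out = f"{prefix}.{i:04d}.vcf.gz"
        mpileup = ["bcftools", "mpileup", "-f", ref, "-r", region, "-Ou", bam]
        call = ["bcftools", "call", "-mv", "-Oz", "-o", out]
        jobs.append((mpileup, call, out))
    return jobs


def concat_command(outputs, merged="merged.vcf.gz"):
    """Build the command that concatenates per-region VCFs in the given order."""
    return ["bcftools", "concat", "-Oz", "-o", merged, *outputs]


if __name__ == "__main__":
    regions = ["chr1:1-1000000", "chr1:1000001-2000000"]
    jobs = region_call_commands("sample.bam", "ref.fa", regions)
    for mpileup, call, _ in jobs:
        # Each pair is meant to be run as: mpileup | call
        print(" ".join(mpileup), "|", " ".join(call))
    print(" ".join(concat_command([out for _, _, out in jobs])))
```

Each mpileup command is meant to be piped into its call command, and the per-region outputs should be passed to the concat step in genomic order.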

fin swimmer
