Question: Variant calling in genomic chunks
Damian Kao (USA) asked, 11 months ago:

Is variant calling done on a per-position basis? I've read recommendations to split the BAM file by chromosome and call the chromosomes in parallel for speed.

Could I further segment each chromosome into chunks and call variants on each chunk? For example, I could: 1) split a chromosome into 1 Mb segments, 2) call variants on each 1 Mb segment in parallel, and 3) concatenate the resulting VCFs.

Would the resulting concatenated file be correct? Would I be missing any information that might be shared among sites on the same chromosome that variant callers use?
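Step 1 of that scheme amounts to tiling each chromosome with fixed-size windows. A minimal Python sketch, assuming the chromosome lengths are already known (for example from the BAM header or a `.fai` index); `make_chunks` and `region_string` are hypothetical helper names. Windows are produced as 1-based, inclusive `chrom:start-end` strings, the region format samtools and bcftools accept.

```python
from typing import Dict, Iterator, Tuple


def make_chunks(chrom_lengths: Dict[str, int],
                chunk_size: int = 1_000_000) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) windows, 1-based and inclusive, tiling each chromosome."""
    for chrom, length in chrom_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)  # last window may be shorter
            yield chrom, start, end
            start = end + 1


def region_string(chrom: str, start: int, end: int) -> str:
    """Format a window as a samtools-style region, e.g. chr1:1-1000000."""
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    # Hypothetical chromosome lengths, for illustration only.
    lengths = {"chr1": 2_500_000, "chrM": 16_569}
    for window in make_chunks(lengths):
        print(region_string(*window))
```

Because each window is expressed in chromosome coordinates, calls from different windows can be concatenated directly, provided the windows are processed in genomic order.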

Tags: vcf, variant calling

If you plan on calling only SNPs, I don't see a problem. However, if you are looking for structural variants, data would be missing at the edges of the chunks, especially for SVs spanning multiple chunks. Additionally, how do you plan to keep track of the size or exact position of each chunk? With additional liftUp files?
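One common way to soften the chunk-edge problem (an assumption on my part, not something described in this thread) is to give the caller a padded region for each chunk but keep only the records whose POS falls inside the chunk's unpadded core, so every site is reported exactly once. This helps with indels and other short events near a boundary; it does not rescue SVs longer than the padding. The `Chunk` type, the `pad` value and the tuple-shaped records are hypothetical. Positions stay in ordinary chromosome coordinates, so no liftUp step is needed.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Hypothetical record shape: (chrom, pos, ref, alt), with 1-based POS as in VCF.
Record = Tuple[str, int, str, str]


class Chunk(NamedTuple):
    chrom: str
    start: int  # core start, 1-based inclusive
    end: int    # core end, 1-based inclusive


def padded_region(chunk: Chunk, pad: int, chrom_length: int) -> str:
    """Region handed to the caller: the core window extended by `pad` bp on each side."""
    start = max(1, chunk.start - pad)
    end = min(chrom_length, chunk.end + pad)
    return f"{chunk.chrom}:{start}-{end}"


def owns(chunk: Chunk, chrom: str, pos: int) -> bool:
    """True if POS lies in this chunk's core, i.e. this chunk should report the record."""
    return chrom == chunk.chrom and chunk.start <= pos <= chunk.end


def keep_owned(chunk: Chunk, records: Iterable[Record]) -> List[Record]:
    """Drop calls made in the padding; the neighbouring chunk reports those."""
    return [r for r in records if owns(chunk, r[0], r[1])]
```

The padding only needs to exceed the longest event you care about plus a read length or so; larger pads cost little beyond some duplicated work at the edges.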

(Comment by Rohit, 11 months ago)

Thanks for the reply. I see what you mean about SVs, and possibly even indels. I am not interested in SVs for now, but I do want to preserve indel information if I can.

The BAM files I am working with are low coverage. I guess I'll have to write a script that chunks the BAM file into coverage "islands", where each island is at least 100 bp away from the next.
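A minimal sketch of the island idea, assuming covered intervals are available as a bedGraph of non-zero coverage (for example from `bedtools genomecov -ibam sample.bam -bg`, where `sample.bam` is a placeholder). Adjacent intervals are merged whenever the zero-coverage gap between them is shorter than `min_gap` (100 bp, following the post), so the resulting islands are separated by at least that many uncovered bases. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, as in BED/bedGraph


def coverage_islands(intervals: Iterable[Interval], min_gap: int = 100) -> List[Interval]:
    """Merge covered intervals into islands separated by >= min_gap uncovered bp."""
    islands: List[List[int]] = []
    for start, end in sorted(intervals):
        # With half-open coordinates, start - previous end is the uncovered gap length.
        if islands and start - islands[-1][1] < min_gap:
            islands[-1][1] = max(islands[-1][1], end)
        else:
            islands.append([start, end])
    return [(s, e) for s, e in islands]


def islands_from_bedgraph(lines: Iterable[str], min_gap: int = 100) -> Dict[str, List[Interval]]:
    """Group bedGraph rows (chrom, start, end, depth) by chromosome and build islands."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue  # skip blank, header and comment lines
        chrom, start, end, depth = line.split()[:4]
        if float(depth) > 0:
            by_chrom[chrom].append((int(start), int(end)))
    return {chrom: coverage_islands(ivs, min_gap) for chrom, ivs in by_chrom.items()}
```

To pass an island to samtools or bcftools as a region, convert it to 1-based inclusive form, i.e. `chrom:start+1-end`.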

(Reply by Damian Kao, 11 months ago)
Devon Ryan (Freiburg, Germany) answered, 11 months ago:

Brad Chapman has described this procedure on his blog (search for "Parallelism by genomic regions"). It at least used to be part of the bcbio-nextgen workflow, though I don't know whether it's still included. That procedure is slightly more involved, since it finds appropriate segments rather than using fixed 1 Mb chunks, but the principle is the same.
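Once the segments are known, they still have to be spread across workers. The sketch below is not taken from Brad Chapman's post or from bcbio-nextgen; it is a generic greedy heuristic (largest region first, always assigned to the least-loaded job) that uses region length as a stand-in for workload, which is only a rough proxy when coverage is uneven. `balance_regions` is a hypothetical name.

```python
import heapq
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def balance_regions(regions: List[Region], n_jobs: int) -> List[List[Region]]:
    """Greedy longest-first assignment of regions to n_jobs bins by total length."""
    heap = [(0, job) for job in range(n_jobs)]  # (current load in bp, job index)
    heapq.heapify(heap)
    jobs: List[List[Region]] = [[] for _ in range(n_jobs)]
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        load, job = heapq.heappop(heap)  # least-loaded job so far
        jobs[job].append(region)
        heapq.heappush(heap, (load + region[2] - region[1], job))
    # Sort each job's regions by (chrom, start) so its output is written in a stable order.
    return [sorted(job_regions) for job_regions in jobs]
```

Weighting regions by read count instead of length would track runtime more closely; the heap logic stays the same.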

finswimmer (Germany) answered, 11 months ago:

Hello Damian,

Is variant calling done on a per-position basis?

This depends on your variant caller. As far as I know, GATK's UnifiedGenotyper and samtools mpileup work per position. GATK's HaplotypeCaller performs local de novo assembly, and freebayes calls variants from haplotypes observed in the reads, so these callers need information from the region around a suspected variant.

Instead of splitting the BAM file, I would use the option most variant callers provide to restrict calling to a given genomic region. That way every process still has access to all the alignment information it needs.
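A minimal sketch of that suggestion using bcftools: `bcftools mpileup -r` restricts calling to one region while reading the full, indexed BAM, and `bcftools concat` joins the per-region outputs. The file names and the `chunk_00000.vcf.gz` naming scheme are placeholders. Other callers have equivalent options (e.g. `-L` for GATK, `-r` for freebayes), each with its own coordinate conventions. Regions must be passed in genomic order so the concatenated file stays sorted.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def call_commands(bam: str, ref: str, region: str, out_vcf: str) -> Tuple[List[str], List[str]]:
    """Commands for one region: bcftools mpileup restricted with -r, piped into bcftools call."""
    mpileup = ["bcftools", "mpileup", "-f", ref, "-r", region, "-Ou", bam]
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return mpileup, call


def run_region(bam: str, ref: str, region: str, out_vcf: str) -> str:
    """Run `mpileup | call` for one region without going through a shell."""
    mpileup_cmd, call_cmd = call_commands(bam, ref, region, out_vcf)
    p1 = subprocess.Popen(mpileup_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(call_cmd, stdin=p1.stdout)
    p1.stdout.close()  # lets mpileup receive SIGPIPE if call exits early
    if p2.wait() != 0 or p1.wait() != 0:
        raise RuntimeError(f"variant calling failed for {region}")
    return out_vcf


def concat_command(chunk_vcfs: List[str], out_vcf: str) -> List[str]:
    """bcftools concat expects the inputs in genomic order."""
    return ["bcftools", "concat", "-Oz", "-o", out_vcf, *chunk_vcfs]


def call_in_parallel(bam: str, ref: str, regions: List[str], out_vcf: str, workers: int = 4) -> None:
    # Placeholder chunk names; their order matches the (sorted) region list.
    chunk_vcfs = [f"chunk_{i:05d}.vcf.gz" for i in range(len(regions))]
    # Threads suffice here because the heavy lifting happens in bcftools subprocesses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda job: run_region(bam, ref, *job), zip(regions, chunk_vcfs)))
    subprocess.run(concat_command(chunk_vcfs, out_vcf), check=True)
```

For example, `call_in_parallel("sample.bam", "ref.fa", ["chr1:1-1000000", "chr1:1000001-2000000"], "merged.vcf.gz")` would call two windows in parallel and concatenate them, assuming bcftools is on the PATH and the BAM is indexed.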

fin swimmer
