Question: Best practices for variant calling on multiple sequencing runs of the same sample
2.0 years ago by
Damian Kao15k wrote:

Given a single DNA sample extracted from a swab, consider these two cases:

  1. Prepared a single library and sequenced it 3 times separately

  2. Prepared 3 different libraries from the same DNA sample and sequenced each library

Both cases result in 3 sets of fastqs. Case 1 represents technical replicates for sequencing; case 2 represents technical replicates for library prep.

If the goal is to perform variant calling, how should I treat the fastqs from these two cases? Should I merge the fastqs and then map/variant call? Should I keep them separate and somehow merge the gvcfs/vcfs? Are there variant calling methods/software that can take advantage of batch information?

How important is sequencing/biological batch information in terms of variant calling?

2.0 years ago by
WouterDeCoster44k wrote:

Provided that library preparation is performed using the same kit and under similar circumstances, it should be sufficiently reproducible not to create any technical artefacts. It would therefore be safe to merge the reads, either as fastq files or as bam files after parallelized alignment.

For completeness' sake, in case someone stumbles upon this post:

cat run1_R1.fastq.gz run2_R1.fastq.gz run3_R1.fastq.gz > merged_R1.fastq.gz

(and analogous for R2)

For merging bams:

samtools merge merged.bam run1.bam run2.bam run3.bam
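
On the question of batch information: one common way to keep track of which run or library a read came from after merging is to give each run its own read group at alignment time. The sketch below is only an illustration. `rg_line` is a hypothetical helper, and the sample name (`swab1`), library names (`libA`, `libB`), and `ref.fa` are placeholders that do not come from this thread. The commented commands assume bwa and samtools are installed.

```shell
# Hypothetical helper: build a read-group header line for one sequencing run.
# $1 = read-group ID (unique per run), $2 = sample name, $3 = library name
rg_line() {
  # The \\t stays a literal backslash-t in the output; bwa mem -R accepts that form.
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA' "$1" "$2" "$3"
}

# Case 1 (one library, several runs): same LB, different ID.
rg_line run1 swab1 libA; echo
rg_line run2 swab1 libA; echo

# Case 2 (several libraries): the LB field differs as well.
rg_line run3 swab1 libB; echo

# Assumed usage, not run here (needs bwa, samtools, and real input files):
#   bwa mem -R "$(rg_line run1 swab1 libA)" ref.fa run1_R1.fastq.gz run1_R2.fastq.gz \
#     | samtools sort -o run1.bam -
#   samtools merge merged.bam run1.bam run2.bam run3.bam
```

Keeping the sample name identical across runs lets a caller treat the merged reads as one sample, while tools that mark duplicates per library (Picard MarkDuplicates, for example) can use the LB field to tell case 1 from case 2.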
2.0 years ago by
United States
swbarnes28.9k wrote:

The Illumina protocol adds very little technical bias. Running the same sample on three separate runs is not necessary, not for RNA and especially not for DNA.

For RNAseq, there might be batch differences between preps done on different days, but for DNA this won't matter. If you prepped the three libraries side by side, they won't differ significantly. Neither your case 1 nor your case 2 is necessary. Different batches affect quantitative measurements, as in RNAseq, but shouldn't affect variant calling unless your PCR duplication level is out of control and you are trying to quantify allele frequencies.

You can concatenate (cat) the fastqs together prior to alignment (cat works fine on gzipped files), or merge the .bam files afterwards with samtools.
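
To see that concatenating gzipped files really does give a valid gzipped file, here is a small toy check. The reads and file names are made up for illustration; only standard tools (`gzip`, `cat`, `wc`) are assumed.

```shell
# Toy demonstration with made-up reads; all file names are placeholders.
tmp=$(mktemp -d)

# Two tiny "runs", each holding one 4-line FASTQ record.
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$tmp/run1_R1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$tmp/run2_R1.fastq.gz"

# Concatenate the compressed files directly, without decompressing first.
cat "$tmp/run1_R1.fastq.gz" "$tmp/run2_R1.fastq.gz" > "$tmp/merged_R1.fastq.gz"

# Decompress the merged file and count records (4 lines per record).
lines=$(gzip -dc "$tmp/merged_R1.fastq.gz" | wc -l)
records=$((lines / 4))
echo "records: $records"
```

For paired-end data, list the runs in the same order for R1 and R2 so that mates stay on matching lines in the two merged files.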
