Question: Workflow for variant calling with GATK from multiple paired-end read files
reza wrote (14 months ago):

I have four FASTQ files that correspond to two paired-end lanes (lane 1: L1R1.fastq, L1R2.fastq; lane 2: L2R1, L2R2). Mapping to the reference was performed with “bwa mem” using the commands below.

./bwa mem -M -t 4 ref.fasta L1R1.fq L1R2.fq > D1.sam

./bwa mem -M -t 4 ref.fasta L2R1.fq L2R2.fq > D2.sam
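
A minimal sketch of one alternative for steps 1 to 3 below (not part of the original post): each lane gets its read group at alignment time via `bwa mem -R`, and the output is piped straight into `samtools sort`, so no 147 G SAM file is ever written. The function name `align_lane`, the sample name `sample1`, the library `lib1` and the platform `ILLUMINA` are placeholders, and setting `DRY_RUN=1` only prints the command instead of running it.

```shell
# Sketch: align one lane, tag it with a read group, and write a
# coordinate-sorted BAM directly (no intermediate SAM on disk).
# Assumes bwa and samtools (>= 1.3) are on PATH and ref.fasta is bwa-indexed.
# LB/PL values below are placeholders (assumptions); use your own.
align_lane() {
  local lane=$1 r1=$2 r2=$3 sample=$4
  # bwa expects literal "\t" separators in the -R string and turns them into tabs.
  local rg="@RG\tID:${lane}\tSM:${sample}\tLB:lib1\tPL:ILLUMINA\tPU:${lane}"
  local out="${lane}.sorted.bam"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "bwa mem -M -t 4 -R '${rg}' ref.fasta ${r1} ${r2} | samtools sort -@ 4 -o ${out} -"
    return 0
  fi

  # pipefail (bash) makes the subshell fail if bwa fails, not only samtools.
  ( set -o pipefail
    bwa mem -M -t 4 -R "$rg" ref.fasta "$r1" "$r2" \
      | samtools sort -@ 4 -o "$out" - )
}
```

For example, `align_lane L1 L1R1.fq L1R2.fq sample1` and `align_lane L2 L2R1.fq L2R2.fq sample1`, followed by `samtools merge sample1.bam L1.sorted.bam L2.sorted.bam`. Merging coordinate-sorted inputs yields a coordinate-sorted output, so no second sort is needed.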

Now I have two SAM files. My final goal is variant calling (SNPs and indels) and variant annotation for my non-model organism.

  1. I want to convert SAM to BAM (and sort) using Picard or samtools. Which of these do you recommend? Is sorting needed at this step?
  2. I want to define a read group for each of the two BAM files separately (using Picard) and then merge them into one BAM file (big.bam).
  3. Sort big.bam. Is sorting needed if I already sorted the two BAM files (before merging) in step 1?
  4. Mark duplicates using Picard.
  5. Build the BAM index, then create realignment targets using GATK, and finally call variants.
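
The five steps above could look roughly like this with Picard and GATK 3.x (RealignerTargetCreator and IndelRealigner exist only in GATK 3; GATK 4 dropped indel realignment). This is a hedged sketch, not a validated pipeline: the jar paths (`picard.jar`, `GenomeAnalysisTK.jar`), the sample, library and platform values, and the helper names `run` and `gatk3_workflow` are assumptions. Picard is called with the legacy `KEY=VALUE` syntax of that era; newer releases prefer `-I`/`-O` style. With `DRY_RUN=1` the commands are only printed.

```shell
# Sketch of steps 1-5 for one sample split over two lanes (D1.sam, D2.sam).
# Jar paths and RG values are placeholders (assumptions).
PICARD=${PICARD:-picard.jar}
GATK3=${GATK3:-GenomeAnalysisTK.jar}
REF=ref.fasta
SAMPLE=sample1

# Print each command; execute it unless DRY_RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

gatk3_workflow() {
  # 0. GATK needs ref.fasta.fai and ref.dict in addition to the bwa index.
  run samtools faidx "$REF" || return
  run java -jar "$PICARD" CreateSequenceDictionary R="$REF" O=ref.dict || return

  # 1-2. Per lane: SAM -> coordinate-sorted BAM, then a distinct RGID with
  #      the same RGSM, so GATK treats both lanes as one sample.
  for lane in D1 D2; do
    run java -jar "$PICARD" SortSam I="$lane.sam" O="$lane.sorted.bam" \
      SORT_ORDER=coordinate || return
    run java -jar "$PICARD" AddOrReplaceReadGroups I="$lane.sorted.bam" \
      O="$lane.rg.bam" RGID="$lane" RGSM="$SAMPLE" RGLB=lib1 \
      RGPL=ILLUMINA RGPU="$lane" || return
  done

  # 3. Merge. Inputs are already coordinate-sorted, so the merged output is
  #    sorted as well and no separate sort step is added here.
  run java -jar "$PICARD" MergeSamFiles I=D1.rg.bam I=D2.rg.bam \
    O=big.bam SORT_ORDER=coordinate || return

  # 4. Mark (not remove) duplicates across both lanes of the same library.
  run java -jar "$PICARD" MarkDuplicates I=big.bam O=big.dedup.bam \
    M=big.dup_metrics.txt || return

  # 5. Index, local indel realignment (GATK 3 only), then variant calling.
  run java -jar "$PICARD" BuildBamIndex I=big.dedup.bam || return
  run java -jar "$GATK3" -T RealignerTargetCreator -R "$REF" \
    -I big.dedup.bam -o targets.intervals || return
  run java -jar "$GATK3" -T IndelRealigner -R "$REF" -I big.dedup.bam \
    -targetIntervals targets.intervals -o big.realigned.bam || return
  run java -jar "$GATK3" -T HaplotypeCaller -R "$REF" \
    -I big.realigned.bam -o "$SAMPLE.raw.vcf"
}
```

Because the read groups share one `LB`, MarkDuplicates on the merged file can also find duplicates that span the two lanes.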

Is this workflow a standard and correct way to reach that goal? In particular, are the read groups defined and the BAM files merged at the right steps? The workflow I read used a single BAM file and did not need read groups to be defined.
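
To confirm that the read groups survive the merge (two IDs, one sample), the `@RG` lines of the merged header can be summarized. The helper `rg_summary` below is hypothetical; it only parses a SAM header supplied on stdin, for example from `samtools view -H big.bam`.

```shell
# Print "ID<TAB>SM" for every @RG line of a SAM header read from stdin.
# Usage: samtools view -H big.bam | rg_summary
rg_summary() {
  awk -F'\t' '$1 == "@RG" {
    id = ""; sm = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) id = substr($i, 4)
      else if ($i ~ /^SM:/) sm = substr($i, 4)
    }
    print id "\t" sm
  }'
}
```

For a correctly merged file this should print one line per lane with distinct IDs and the same SM. GATK groups reads into samples by SM, so both lanes are then called as one sample.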

Additional question: D1.sam (147 G) and D2.sam (145 G) are large, and merging them will create a very large file that I expect will be hard to handle. Can GATK and Picard handle it on a computer with 32 GB of RAM?
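
On the 32 GB question: Picard and GATK are Java programs, so what matters is the JVM heap (`-Xmx`) and a temporary directory with enough free disk. Sorting spills to disk rather than holding the whole file in RAM. The options used below (`-Xmx`, Picard's `TMP_DIR`, `samtools sort -m/-@/-T`) are real, but the 24g heap, the 2G per-thread sort memory, the scratch path and the function names are assumptions to adjust.

```shell
# Memory-bounded conversion of a large SAM to a sorted BAM, two ways.
# Heap size, thread count and scratch path are assumptions, not requirements.
JAVA_HEAP=${JAVA_HEAP:-24g}        # leaves headroom below 32 GB of physical RAM
SCRATCH=${SCRATCH:-/scratch/$USER} # hypothetical path; needs lots of free disk

run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

# Picard: the heap caps memory use; TMP_DIR receives the sort spill files.
sort_with_picard() {
  run java -Xmx"$JAVA_HEAP" -jar "${PICARD:-picard.jar}" SortSam \
    I="$1" O="$2" SORT_ORDER=coordinate TMP_DIR="$SCRATCH"
}

# samtools: total sort memory is roughly -m times the number of threads (-@).
sort_with_samtools() {
  run samtools sort -@ 4 -m 2G -T "$SCRATCH/$(basename "$1" .sam)" \
    -o "$2" "$1"
}
```

Either function also answers the space concern: the sorted BAM is written compressed, and the SAM can be deleted once the BAM has been checked.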

Tags: snp, next-gen
(modified 14 months ago by geek_y; written 14 months ago by reza)

Alternatively, you can merge the FASTQ files per read direction. Try to work with BAM files instead of SAM files; they take far less space.

(reply written 14 months ago by WouterDeCoster)
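
Merging FASTQ files per read direction means concatenating all R1 files and all R2 files in the same lane order, so the i-th record of R1 stays paired with the i-th record of R2. A small sketch using the question's `L1R1.fq`-style names; the function name `merge_lanes_by_direction` and the `merged` prefix are hypothetical.

```shell
# Concatenate lanes per read direction into <prefix>_R1.fq and <prefix>_R2.fq.
# Lanes are appended in the same order to both outputs, which keeps mates paired.
# Usage: merge_lanes_by_direction merged L1 L2   (expects L1R1.fq, L1R2.fq, ...)
merge_lanes_by_direction() {
  local out=$1 lane
  shift
  : > "${out}_R1.fq" || return   # truncate/create the outputs first
  : > "${out}_R2.fq" || return
  for lane in "$@"; do
    cat "${lane}R1.fq" >> "${out}_R1.fq" || return
    cat "${lane}R2.fq" >> "${out}_R2.fq" || return
  done
}
```

The trade-off is that the lanes are then aligned as one pair of files and share a single read group, so per-lane information is lost.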

I added two different RGs to the two BAM files, which correspond to one sample, and then merged them into one BAM file (using Picard). But the merged BAM file (70 G) is smaller than the sum of the two original BAM files (bam1 = 48 G, bam2 = 49 G). Why? Is everything right?

(reply written 14 months ago by reza)

That's possible. BAM is compressed, and merging BAMs with similar sequences can lead to better compression, especially if they are sorted.

(reply written 14 months ago by WouterDeCoster)
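
File size is not a reliable check here, since BAM compression depends on record order. A more direct check, sketched below with hypothetical helper names, compares record counts from `samtools view -c`: the merged file should contain exactly as many records as the two inputs together.

```shell
# Check that no records were lost when merging, whatever the file sizes.
# samtools view -c counts every record (including secondary/supplementary),
# so both sides of the comparison count the same things.

counts_add_up() {   # pure arithmetic, kept separate so it runs without samtools
  [ $(( $1 + $2 )) -eq "$3" ]
}

check_merged_bam() {
  local n1 n2 nm
  n1=$(samtools view -c "$1") || return
  n2=$(samtools view -c "$2") || return
  nm=$(samtools view -c "$3") || return
  if counts_add_up "$n1" "$n2" "$nm"; then
    echo "OK: $nm = $n1 + $n2 records"
  else
    echo "MISMATCH: $3 has $nm records, inputs have $n1 + $n2" >&2
    return 1
  fi
}
```

Usage: `check_merged_bam bam1.bam bam2.bam big.bam`. On files of this size each count takes a while, because every record has to be decompressed.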