Question: Workflow for variant calling with GATK on multiple paired-end read sets
8 weeks ago by
reza90
Iran
reza90 wrote:

I have four FASTQ files that correspond to two paired-end lanes (lane 1: L1R1.fastq, L1R2.fastq; lane 2: L2R1, L2R2). I mapped them to the reference with "bwa mem" using the commands below.

./bwa mem -M -t 4 ref.fasta L1R1.fq L1R2.fq > D1.sam

./bwa mem -M -t 4 ref.fasta L2R1.fq L2R2.fq > D2.sam

Now I have two SAM files, and my final goal is variant calling (SNPs, indels) and variant annotation for my non-model organism.

  1. I want to convert SAM to BAM (and sort) using Picard or samtools. Which of these programs do you recommend? Is sorting needed at this step?
  2. I want to define a read group for each of the two BAM files separately (using Picard) and then merge them into one BAM file (big.bam).
  3. Sort the big.bam file. Is sorting still needed if I already sorted the two BAM files (before merging) in step 1?
  4. Mark duplicates using Picard.
  5. Build the BAM index, then create realignment targets using GATK, and finally call variants.
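
The steps listed above can be sketched as a handful of shell functions. This is only a rough sketch under stated assumptions, not a validated pipeline: the path in PICARD_JAR, the sample and library names, the lane IDs used for RGID/RGPU, and every output file name are hypothetical placeholders. It uses Picard's classic KEY=value argument style and leaves out the GATK realignment and calling steps. Picard's MergeSamFiles writes coordinate-sorted output when asked to, so a separate sort of the merged file should not be needed if both inputs were already sorted.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-5 (up to the BAM index). All names below are assumptions.
PICARD_JAR="${PICARD_JAR:-picard.jar}"   # hypothetical path to the Picard jar
SAMPLE="${SAMPLE:-sample1}"              # hypothetical sample name (same for both lanes)
LIBRARY="${LIBRARY:-lib1}"               # hypothetical library name

# Step 1: convert SAM to a coordinate-sorted BAM in one pass.
sort_sam() {            # usage: sort_sam in.sam out.sorted.bam
    samtools sort -@ 4 -o "$2" "$1"
}

# Step 2a: give one lane its own read group; RGSM stays identical across lanes.
add_read_group() {      # usage: add_read_group in.bam out.bam lane_id
    java -jar "$PICARD_JAR" AddOrReplaceReadGroups \
        I="$1" O="$2" \
        RGID="$3" RGPU="$3" RGSM="$SAMPLE" RGLB="$LIBRARY" RGPL=ILLUMINA
}

# Steps 2b-3: merge the two sorted lanes into one coordinate-sorted BAM.
merge_lanes() {         # usage: merge_lanes lane1.bam lane2.bam merged.bam
    java -jar "$PICARD_JAR" MergeSamFiles \
        I="$1" I="$2" O="$3" SORT_ORDER=coordinate
}

# Steps 4-5: mark duplicates and write a BAM index next to the output.
mark_duplicates() {     # usage: mark_duplicates in.bam out.bam metrics.txt
    java -jar "$PICARD_JAR" MarkDuplicates \
        I="$1" O="$2" M="$3" CREATE_INDEX=true
}

# Example calls (not executed here):
#   sort_sam D1.sam D1.sorted.bam
#   sort_sam D2.sam D2.sorted.bam
#   add_read_group D1.sorted.bam D1.rg.bam lane1
#   add_read_group D2.sorted.bam D2.rg.bam lane2
#   merge_lanes D1.rg.bam D2.rg.bam big.bam
#   mark_duplicates big.bam big.dedup.bam big.metrics.txt
```
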

Is this workflow a standard and correct way to reach my goal? Are the read group definition and the BAM merging done at the right steps? I read the workflow at https://gencore.bio.nyu.edu/variant-calling-pipeline/, but it works with a single BAM file and does not need a read group to be defined.

Additional question: D1.sam (147 GB) and D2.sam (145 GB) are big files, and merging them will create a very large file that I think will be hard to handle. Can GATK and Picard handle it on a computer with 32 GB of RAM?

modified 8 weeks ago by Goutham Atla • written 8 weeks ago by reza90

Alternatively, you can merge the FASTQ files per read direction (all R1 files together and all R2 files together) before mapping. Try to work with BAM files instead of SAM files; they take far less space.

written 7 weeks ago by WouterDeCoster
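
One way to read the suggestion above, as a minimal sketch: concatenate the lanes per read direction, then pipe bwa mem straight into samtools sort so that no intermediate SAM file is written. The function names, the read group string, and the output file names are assumptions for illustration. Note that concatenating lanes before mapping gives up per-lane read groups, so this only fits if one read group per sample is acceptable.

```shell
#!/usr/bin/env bash
# Sketch only: function names, read group values and file names are assumptions.

# Concatenate FASTQ files for one read direction.
# The lanes must be listed in the same order for R1 and R2 so mates stay paired.
concat_fastq() {        # usage: concat_fastq out.fq lane1.fq lane2.fq
    out="$1"; shift
    cat "$@" > "$out"
}

# Map paired-end reads and write a coordinate-sorted BAM without a SAM on disk.
# The -R string is a hypothetical read group; bwa expands the literal \t itself.
map_to_sorted_bam() {   # usage: map_to_sorted_bam ref.fasta R1.fq R2.fq out.bam
    bwa mem -M -t 4 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' "$1" "$2" "$3" \
        | samtools sort -@ 4 -o "$4" -
}

# Example calls (not executed here):
#   concat_fastq R1.fq L1R1.fq L2R1.fq
#   concat_fastq R2.fq L1R2.fq L2R2.fq
#   map_to_sorted_bam ref.fasta R1.fq R2.fq sample1.sorted.bam
```
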

I added two different read groups (RG) to the two BAM files corresponding to one sample and then merged them into one BAM file (using Picard), but the merged BAM file (70 GB) is smaller than the sum of the two original BAM files (bam1 = 48 GB and bam2 = 49 GB). Why? Is everything all right?

written 7 weeks ago by reza90

That's possible. BAM is compressed, and merging BAMs with similar sequences can lead to better compression, especially if they are sorted.

written 7 weeks ago by WouterDeCoster
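
Since file size alone says little about a compressed format, a more direct check is to compare record counts. The sketch below assumes samtools is installed and uses hypothetical function and file names; it counts alignment records (not unique reads) with samtools view -c and checks that the merged file holds exactly the sum of the two lanes.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names are assumptions.

# Count alignment records in a BAM file (includes secondary/supplementary records).
count_records() {       # usage: count_records file.bam
    samtools view -c "$1"
}

# Succeed only if the merged BAM has as many records as both lanes combined.
check_merge() {         # usage: check_merge merged.bam lane1.bam lane2.bam
    merged=$(count_records "$1")
    sum=$(( $(count_records "$2") + $(count_records "$3") ))
    if [ "$merged" -eq "$sum" ]; then
        echo "OK: $merged alignments in merged file"
    else
        echo "MISMATCH: merged=$merged, lanes sum=$sum"
        return 1
    fi
}

# Example call (not executed here):
#   check_merge big.bam D1.rg.bam D2.rg.bam
```
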