Question: workflow for variant calling using GATK with multiple paired-end read files
reza80 (Iran) wrote, 29 days ago:

I have four FASTQ files that correspond to two paired-end lanes (lane 1: L1R1.fastq, L1R2.fastq; lane 2: L2R1, L2R2). I mapped them to the reference with "bwa mem" using the commands below.

./bwa mem -M -t 4 ref.fasta L1R1.fq L1R2.fq > D1.sam

./bwa mem -M -t 4 ref.fasta L2R1.fq L2R2.fq > D2.sam
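
For reference, the same alignment can also attach a read group with the -R option of bwa mem and pipe the output into samtools sort, so that a sorted BAM is written and no SAM file is kept. This is only a sketch that prints the commands by default (DRY_RUN=1); the read-group values sample1, lib1 and ILLUMINA and the output names are placeholders, not values from this post.

```shell
#!/bin/sh
# Sketch: align one lane with a read group and write a sorted BAM directly.
# Placeholder values: sample1, lib1, ILLUMINA and the *.sorted.bam output names.
DRY_RUN="${DRY_RUN:-1}"

align_lane() {
    # $1 = lane id (used as read-group ID), $2 = R1 FASTQ, $3 = R2 FASTQ
    rg="@RG\\tID:$1\\tSM:sample1\\tLB:lib1\\tPL:ILLUMINA"
    cmd="./bwa mem -M -t 4 -R '$rg' ref.fasta $2 $3 | samtools sort -o $1.sorted.bam -"
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it
        printf '%s\n' "$cmd"
    else
        sh -c "$cmd"
    fi
}

align_lane L1 L1R1.fq L1R2.fq
align_lane L2 L2R1.fq L2R2.fq
```

Run with DRY_RUN=0 to execute the commands; this assumes samtools is on the PATH.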

Now I have two SAM files, and my final goal is variant calling (SNPs, indels) and variant annotation for my non-model organism.

  1. I want to convert SAM to BAM (and sort) using Picard or samtools. Which of these programs do you recommend? Is sorting needed at this step?
  2. I want to define a read group for each of the two BAM files separately (using Picard) and then merge them into one BAM file (big.bam).
  3. Sort the big.bam file. Is sorting needed if I already sorted the two BAM files (before merging) in step 1?
  4. Mark duplicates using Picard.
  5. Build the BAM index, then create realignment targets using GATK, and finally call variants.
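
A sketch of how these five steps could look on the command line, using Picard and GATK 3.x syntax (RealignerTargetCreator and IndelRealigner are GATK 3 tools). It only prints the commands unless DRY_RUN=0. Assumptions not taken from this post: picard.jar and GenomeAnalysisTK.jar sit in the current directory, the read-group values are sample1, lib1 and ILLUMINA, and all output file names are made up. GATK also expects ref.fasta.fai and ref.dict next to the reference.

```shell
#!/bin/sh
# Sketch of steps 1-5 with Picard and GATK 3.x. Prints the commands unless DRY_RUN=0.
# Assumed, not from the post: jar locations, read-group values, output file names.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it
    if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

workflow() {
    for lane in D1 D2; do
        # Step 1: SAM -> coordinate-sorted BAM
        run java -jar picard.jar SortSam "I=$lane.sam" "O=$lane.sorted.bam" SORT_ORDER=coordinate
        # Step 2: one read group per lane, same sample name for both
        run java -jar picard.jar AddOrReplaceReadGroups "I=$lane.sorted.bam" "O=$lane.rg.bam" \
            "RGID=$lane" "RGPU=$lane" RGSM=sample1 RGLB=lib1 RGPL=ILLUMINA
    done
    # Steps 2-3: merge the sorted inputs; the output is written coordinate-sorted
    run java -jar picard.jar MergeSamFiles I=D1.rg.bam I=D2.rg.bam O=big.bam SORT_ORDER=coordinate
    # Step 4: mark duplicates and index the result
    run java -jar picard.jar MarkDuplicates I=big.bam O=big.dedup.bam M=dup_metrics.txt CREATE_INDEX=true
    # Step 5: realignment targets, realignment, variant calling (GATK 3.x syntax)
    run java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fasta -I big.dedup.bam -o targets.intervals
    run java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref.fasta -I big.dedup.bam \
        -targetIntervals targets.intervals -o big.realigned.bam
    run java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I big.realigned.bam -o raw_variants.vcf
}

workflow
```

Because MergeSamFiles is given coordinate-sorted inputs and SORT_ORDER=coordinate, a separate sort of big.bam should not be required in this variant. A real run would also want to stop at the first failing command.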

Is this workflow a standard and correct way to reach my goal? Are the read-group definition and the BAM merging done at the right steps? I read the workflow at https://gencore.bio.nyu.edu/variant-calling-pipeline/, but it worked with one BAM file, without needing to define a read group.

Additional question: D1.sam (147 G) and D2.sam (145 G) are big files, and merging them will create a very large file that I think will be hard to handle. Can GATK and Picard handle it on a computer with 32 G of RAM?
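
On the memory side, Picard and GATK are Java programs, so the usual knob is the Java heap size (-Xmx) rather than the size of the input; the sorting and duplicate-marking tools write temporary files, so free disk space in the temporary directory also matters. A minimal sketch of setting both, assuming a rule of thumb of about three quarters of physical RAM for the heap (my assumption, not a documented limit) and ./tmp as a scratch directory:

```shell
#!/bin/sh
# Sketch: cap the Java heap below physical RAM and point Picard at a scratch dir.
# The 3/4 rule of thumb, ./tmp and the file names are assumptions for illustration.
heap_gb() {
    # $1 = physical RAM in GB; leave roughly a quarter for the OS and native memory
    echo $(( $1 * 3 / 4 ))
}

HEAP="$(heap_gb 32)"
# Print the commands that would be run with this heap size
echo "java -Xmx${HEAP}g -jar picard.jar MarkDuplicates I=big.bam O=big.dedup.bam M=dup_metrics.txt TMP_DIR=./tmp"
echo "java -Xmx${HEAP}g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I big.dedup.bam -o raw_variants.vcf"
```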

Tags: snp, next-gen
modified 29 days ago by Goutham Atla (6.7k) • written 29 days ago by reza80

Alternatively, you can merge the FASTQ files per read direction. Try to work with BAM files instead of SAM files; they take far less space.
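
A minimal sketch of the per-direction merge, assuming uncompressed FASTQ and the file names from the question; merged_R1.fq and merged_R2.fq are made-up output names. The lanes have to be listed in the same order for both directions so that mates stay at matching positions. Note that reads merged this way end up in one read group, so the lane is no longer distinguishable downstream.

```shell
#!/bin/sh
# Sketch: concatenate the two lanes per read direction, keeping the same lane
# order for R1 and R2 so that mates stay at matching positions.
# merged_R1.fq / merged_R2.fq are placeholder output names.
merge_lanes() {
    # $1 = output file, remaining args = per-lane FASTQ files in lane order
    out="$1"; shift
    cat "$@" > "$out"
}

# Only runs when the lane files are present in the current directory.
if [ -f L1R1.fq ] && [ -f L2R1.fq ] && [ -f L1R2.fq ] && [ -f L2R2.fq ]; then
    merge_lanes merged_R1.fq L1R1.fq L2R1.fq
    merge_lanes merged_R2.fq L1R2.fq L2R2.fq
fi
```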

written 26 days ago by WouterDeCoster (12k)

I added two different RGs to the two BAM files corresponding to one sample and then merged them into one BAM file (using Picard), but the merged BAM file (70 G) is smaller than the sum of the two original BAM files (bam1 = 48 G and bam2 = 49 G). Why? Is everything all right?

written 24 days ago by reza80

That's possible. BAM is compressed, and merging BAMs with similar sequences can lead to better compression, especially if they are sorted.

written 24 days ago by WouterDeCoster (12k)
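
Since file size alone does not show whether a merge is complete, one quick sanity check is to compare record counts: samtools view -c prints the number of records in a file, and the merged file should contain as many as the two inputs combined. A small sketch, where bam1.bam, bam2.bam and merged.bam are placeholder names and the samtools part only runs if samtools and merged.bam are present:

```shell
#!/bin/sh
# Sketch: a merge that lost nothing has as many records as its inputs combined.
# bam1.bam, bam2.bam and merged.bam are placeholder file names.
same_total() {
    # $1, $2 = record counts of the inputs, $3 = record count of the merged file
    [ $(( $1 + $2 )) -eq "$3" ]
}

if command -v samtools >/dev/null 2>&1 && [ -f merged.bam ]; then
    n1=$(samtools view -c bam1.bam)
    n2=$(samtools view -c bam2.bam)
    nm=$(samtools view -c merged.bam)
    if same_total "$n1" "$n2" "$nm"; then
        echo "counts match"
    else
        echo "counts differ"
    fi
fi
```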