I have four FASTQ files from two paired-end lanes (lane 1: L1R1.fastq, L1R2.fastq; lane 2: L2R1, L2R2). I mapped them to the reference with "bwa mem" using the commands below.
./bwa mem -M -t 4 ref.fasta L1R1.fq L1R2.fq > D1.sam
./bwa mem -M -t 4 ref.fasta L2R1.fq L2R2.fq > D2.sam
Now I have two SAM files, and my final goal is variant calling (SNPs and indels) and variant annotation for my non-model organism. My planned workflow is:
- Convert SAM to BAM and sort, using Picard or samtools. Which of the two do you recommend? Is sorting needed at this step?
- Define a read group for each of the two BAM files separately (using Picard), then merge them into one BAM file (big.bam).
- Sort big.bam. Is this sorting still needed if I already sorted the two BAM files in step 1, before merging?
- Mark duplicates using Picard.
- Build the BAM index, create realignment targets using GATK, and finally call variants.
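To make the plan concrete, the steps above could be sketched as the dry-run script below. It only prints the commands unless DRY_RUN=0 is set. The intermediate file names (D1.sorted.bam, D1.rg.bam, big.dedup.bam, big.intervals), the read-group values (lane1, lib1, sample1, ILLUMINA) and the jar locations are placeholders I made up for illustration, not tested settings. The last step assumes GATK 3.x, since RealignerTargetCreator is not part of GATK 4, and it assumes ref.fasta already has its .fai index and sequence dictionary.

```shell
#!/bin/sh
# Dry-run sketch of the planned workflow: prints each command instead of
# running it, unless DRY_RUN=0 is set in the environment.
set -e

DRY_RUN="${DRY_RUN:-1}"

# Print the command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumed jar locations; adjust to the local installation.
picard() { run java -jar picard.jar "$@"; }
gatk3()  { run java -jar GenomeAnalysisTK.jar "$@"; }

pipeline() {
    for lane in 1 2; do
        # Step 1: SAM -> coordinate-sorted BAM, one file per lane.
        run samtools sort -@ 4 -o "D${lane}.sorted.bam" "D${lane}.sam"
        # Step 2a: one read group per lane, same sample name for both
        # (all RG values here are placeholders).
        picard AddOrReplaceReadGroups "I=D${lane}.sorted.bam" "O=D${lane}.rg.bam" \
            "RGID=lane${lane}" "RGPU=lane${lane}" RGLB=lib1 RGPL=ILLUMINA RGSM=sample1
    done
    # Step 2b/3: merge; with SORT_ORDER=coordinate the merged output is
    # written coordinate-sorted, so a separate sort of big.bam may be unnecessary.
    picard MergeSamFiles I=D1.rg.bam I=D2.rg.bam O=big.bam SORT_ORDER=coordinate
    # Step 4: mark duplicates across both lanes.
    picard MarkDuplicates I=big.bam O=big.dedup.bam M=big.dedup.metrics.txt
    # Step 5: index, then realignment targets (GATK 3.x only).
    picard BuildBamIndex I=big.dedup.bam
    gatk3 -T RealignerTargetCreator -R ref.fasta -I big.dedup.bam -o big.intervals
}

pipeline
```

Running it as is should only list the eight commands, which makes it easy to check file names and step order before committing to jobs on files of this size.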
Is this workflow a standard and correct way to reach my goal? Are defining the read groups and merging the BAM files done at the right steps? I read the workflow at https://gencore.bio.nyu.edu/variant-calling-pipeline/, but it works with a single BAM file and does not need a read group to be defined.
Additional question: D1.sam (147 GB) and D2.sam (145 GB) are big files, and merging them will create a very large file that I think will be hard to handle. Can GATK and Picard handle it on a computer with 32 GB of RAM?