I have a question and I hope you can help me understand this better. Please note: I am a non-technical person working hard to understand life science (out of my own interest).

I have received FASTQ files from a friend, which come from a re-sequencing experiment. To minimize the likelihood of systematic bias in sampling, two paired-end libraries with an insert size of 500 bp were prepared for each sample and then subjected to whole-genome sequencing, with four lanes per library, resulting in at least 30-fold haploid coverage for each sample. Raw image files were processed by the Illumina pipeline for base calling with default parameters, and the sequences of each individual were generated as 100-bp paired-end reads.

So I have around 3-4 BAM files for each sample (which I am assuming are technical replicates) that I am analysing with the GATK pipeline. So far I have aligned each file independently with BWA-MEM. I am now at a step where I am unsure: should I merge all BAM files from each sample before running the next step (Collect Alignment & Insert Size Metrics), or should I process each BAM file separately and call variants at the end?
Process walk-through:
1. Mapping to the reference using BWA-MEM --> Output: SAM file
2. Sorting the SAM file by coordinate and converting it to BAM --> Output: BAM file
3. Collect Alignment & Insert Size Metrics
4. Mark Duplicates
5. Build BAM Index
6. Create Realignment Targets
7. Realign Indels
8. Call Variants
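To make steps 1 and 2 and a possible merge point concrete, here is a rough Python sketch that only builds and prints the command lines; it does not run any tool. The sample, library, lane and file names are hypothetical placeholders, and merging right after sorting is just one candidate position in the walk-through, not a confirmed best practice. The idea it illustrates is giving each lane-level file its own read group (same `SM`, different `ID`) so the files remain distinguishable if they are merged later.

```python
import shlex


def read_group(sample, library, lane):
    # One read group per library/lane file. All files of a sample share SM,
    # so downstream tools can treat them as one sample after a merge.
    # bwa expects the two-character sequence backslash + t, hence "\\t".
    rg_id = f"{sample}.{library}.{lane}"
    return (
        f"@RG\\tID:{rg_id}\\tSM:{sample}\\tLB:{library}"
        f"\\tPL:ILLUMINA\\tPU:{rg_id}"
    )


def align_cmd(ref, fq1, fq2, rg):
    # Step 1: bwa mem writes SAM to stdout, so the output must be redirected.
    return ["bwa", "mem", "-R", rg, ref, fq1, fq2]


def sort_cmd(sam, bam):
    # Step 2: coordinate-sort and write BAM.
    return ["samtools", "sort", "-o", bam, sam]


def merge_cmd(out_bam, lane_bams):
    # Candidate merge step: combine the sorted lane-level BAMs of one sample.
    return ["samtools", "merge", out_bam, *lane_bams]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    sample, ref = "S1", "ref.fa"
    lanes = [("libA", "L1"), ("libA", "L2"), ("libB", "L1")]
    sorted_bams = []
    for lib, lane in lanes:
        stem = f"{sample}_{lib}_{lane}"
        rg = read_group(sample, lib, lane)
        align = align_cmd(ref, f"{stem}_R1.fastq", f"{stem}_R2.fastq", rg)
        print(shlex.join(align) + f" > {stem}.sam")
        print(shlex.join(sort_cmd(f"{stem}.sam", f"{stem}.sorted.bam")))
        sorted_bams.append(f"{stem}.sorted.bam")
    print(shlex.join(merge_cmd(f"{sample}.merged.bam", sorted_bams)))
```

Printing the commands instead of executing them makes it easy to check the read-group strings by eye before anything is run.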
If I need to merge these BAM files, what is the best practice, and at which step should they be merged? It would be great if you could share the commands as well. Sorry for the long post.
Thanks a lot, and a very happy new year!
David Emir