Hello Friends,
I have around 108 FASTQ files (paired-end), with 3 technical replicates for each sample, and I am really confused about how to analyse them. I am following this procedure:
1. Concatenate the technical replicates of each sample into two FASTQ files, i.e. one for the forward reads and another for the reverse reads (samp1_1.fq.gz and samp1_2.fq.gz ... samp108_1.fq.gz and samp108_2.fq.gz).
2. Alignment: map to the reference genome: bwa mem -M ref input_1 input_2 > aligned_reads.sam
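The concatenation in step 1 could look roughly like the sketch below. The replicate file names (`_rep1`, `_rep2`, `_rep3`) and the sample count `N` are assumptions for illustration only, since the real file names are not given here. The script only prints the `cat` commands so they can be checked before anything is run; gzip files can be joined with plain `cat` without decompressing them.

```shell
#!/bin/sh
# Sketch only: prints the concatenation commands for review; nothing is run.
# Assumption (not from the post): replicate files are named
# <sample>_rep1_<read>.fq.gz, <sample>_rep2_<read>.fq.gz, <sample>_rep3_<read>.fq.gz.

N="${N:-108}"   # number of samples, taken from the samp1..samp108 naming

# Print the command that joins the three replicates of one sample for one
# read direction (1 = forward, 2 = reverse). The replicate order is the same
# for both directions, so read pairs stay in matching order.
concat_cmd() {
    printf 'cat %s_rep1_%s.fq.gz %s_rep2_%s.fq.gz %s_rep3_%s.fq.gz > %s_%s.fq.gz\n' \
        "$1" "$2" "$1" "$2" "$1" "$2" "$1" "$2"
}

for i in $(seq 1 "$N"); do
    concat_cmd "samp$i" 1
    concat_cmd "samp$i" 2
done
```

Redirecting the output to a file gives a plain list of commands that can be inspected and then run with `sh`.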
I am having trouble with this step. Should I map all 108 samples individually, collect aligned_reads1.sam to aligned_reads108.sam, merge them, and then sort the merged SAM file by coordinate and convert it to BAM? Or at what point should I merge these files?
I may be wrong, but right now I am genuinely confused. If anyone has a script that can run all these samples in one go, it would be a great help. I may sound stupid, but trust me, I am clueless here.
Thanks a lot, David Emir
If he is going to use GATK, he needs to add the @RG (read group) line. He can add it while aligning, or add it after aligning, but either way he needs to extract the read group information from the FASTQ file or get it from the service provider.
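As a rough sketch of adding the @RG while aligning: each sample can be mapped on its own with `bwa mem -R`, and the output piped into `samtools sort` to get a coordinate-sorted BAM per sample without an intermediate SAM file. The read group values below are assumptions, not something known from the thread: the sample name is reused as ID, SM and LB, the platform is set to ILLUMINA, and the reference is called `ref.fa` (already indexed with `bwa index`). The real values should come from the FASTQ headers or the service provider. The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Sketch only: prints one alignment command per sample for review; nothing is run.
# Assumptions (not from the thread): samples are named samp1..samp108, the
# reference is ref.fa, the platform is ILLUMINA, and the sample name doubles
# as read group ID and library.

REF="${REF:-ref.fa}"
PLATFORM="${PLATFORM:-ILLUMINA}"
N="${N:-108}"

# Build the @RG line; bwa mem -R takes literal "\t" between the fields.
rg_header() {
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s' "$1" "$1" "$1" "$PLATFORM"
}

# Print the pipeline for one sample: align with the read group attached,
# then coordinate-sort straight to BAM.
align_cmd() {
    printf "bwa mem -M -R '%s' %s %s_1.fq.gz %s_2.fq.gz | samtools sort -o %s.sorted.bam -\n" \
        "$(rg_header "$1")" "$REF" "$1" "$1" "$1"
}

for i in $(seq 1 "$N"); do
    align_cmd "samp$i"
done
```

Saving the output to a file makes it easy to check a few lines before running the whole list. Because every BAM carries its own read group, the samples stay identifiable and the SAM files do not have to be merged before sorting.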