I used BWA to align SOLiD mate-pair reads (60,60) with the parameters -n 8 (total mismatches), -l 25 (seed length) and -k 2 (mismatches in the seed). I am getting a good mapping rate of around 65%.
BWA outputs all reads regardless of whether they are mapped, unmapped, mapped in pairs, or carry other bitwise flags. To deal with this I converted my SAM file to a BAM file. As I am not interested in inversions or other unusual variants, I had to filter the alignments so that the file can be used for high-confidence SNP and indel calling. I then used:
samtools view -b -f 67 -f 31 -f 179 -f 115 old.bam > new.bam
My understanding of the flags: 67 and 31 (paired, mapped and properly paired); 179 and 115 (paired, mapped, properly mapped and both mates mapped reverse complementary, on the same strand).
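For reference, the bits set in each of these flag values can be checked with a short script. This is only a sketch: the bit meanings follow the FLAG field of the SAM specification, the helper names decode_flag and passes_required are made up for illustration, and passes_required mimics the documented behaviour of a single -f value (keep a read only if every bit of the value is set). It does not model how samtools treats several -f options on one command line.

```python
# Bit meanings taken from the FLAG field of the SAM specification.
SAM_FLAG_BITS = {
    1: "paired",
    2: "proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "secondary alignment",
    512: "fails QC",
    1024: "duplicate",
    2048: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def passes_required(flag, required):
    """Mimic a single -f test: every bit in `required` must be set in `flag`."""
    return (flag & required) == required


if __name__ == "__main__":
    # The four values used in the filtering command above.
    for value in (67, 31, 179, 115):
        print(value, decode_flag(value))
```

Decoding the values this way makes it easy to compare what each number actually requires against the description given above, for example whether a value includes the "read unmapped" or "mate unmapped" bits.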
Once I had new.bam, I sorted it, removed the duplicates using samtools, and then used mpileup to call SNPs and indels.
Below are my Yes or No questions:
1) This is my first time doing an NGS analysis. Am I doing things correctly? Is the order of the steps I am performing correct?
2) As I only want to use high-confidence reads, I have filtered out all the unmapped and not properly paired reads. Do you think the bitwise flags I have used are correct?
Although I tried to remove the duplicates from my mate-pair BAM data using samtools, I can still see a lot of mate-pair reads mapped to the same position as other mate-pair reads. Some people have suggested using Picard. I used the 3'-end trimming option in BWA, so reads that were duplicates before trimming may no longer look like duplicates afterwards, because the length of some reads changed. Can anyone tell me how to resolve this issue?