Hi all,
I am reading the m54113_160913_184949.subreads.bam
BAM file available here using samtools and trying to figure out a set of subreads in the BAM file that correspond to one particular reference sequence using their QNAMES. When I search for a unique subreads with the same id in QNAME field e.g., 10420342 is the id we are searching for in the BAM file and printing the QNAME and FLAG
sorted_c0.bam
is the sorted BAM file of m54113_160913_184949.subreads.bam BAM file
user@xyz:~/compbio$ samtools view -h sorted_c0.bam | awk '{print $1" "$2}' | grep "10420342"
m54113_160914_092411/10420342/679_3845 4
m54113_160914_092411/10420342/3891_9746 4
m54113_160914_092411/10420342/9791_15590 4
m54113_160914_092411/10420342/15637_21533 4
m54113_160914_092411/10420342/21578_27410 4
m54113_160914_092411/10420342/27456_33664 4
m54113_160914_092411/10420342/33712_37327 4
user@xyz:~/compbio$ samtools view -h sorted_c0.bam | awk '{print $1" "$2}' | grep "4325913"
m54113_160914_092411/4325913/0_23608 4
m54113_160914_092411/4325913/23655_31537 4
Here, if we observe the output, all the flags are set to 4. According to our understanding, we believed there would be reverse complemented strands also present in the BAM file. However, we only saw the flag set to 4 for all subreads. In this pdf, we see that the flag 16 0x10 SEQ being reverse complemented. We don't see any other flag other than 4.
Further, if there are reverse complements, there should be an even number of subreads corresponding to an give QNAME ID which is not the case as we observe an odd number of subreads as shown in the above output (7 subreads in the first case).
We aim at generating the consensus for each bunch of subreads corresponding to a specific unique ID. Could someone verify if the PACBIO data contains reverse complimented strands. If yes, how to identify those?
Thanks!