Hi all, I have generated a BED file from a BAM file using the bamtobed command from BEDTools. I am not sure how to interpret each column. There is also a -bed12 option, which generates 12-column output. I would appreciate it if anyone could help me understand what the columns are. Thank you.
From the manual, http://bedtools.readthedocs.org/en/latest/content/tools/bamtobed.html
Default behavior
By default, each alignment in the BAM file is converted to a 6 column BED. The BED “name” field is comprised of the QNAME (read name) field in the BAM alignment. If mate information is available, the mate (e.g., “/1” or “/2”) field will be appended to the name.
$ bedtools bamtobed -i reads.bam | head -3
chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 37 -
chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 37 +
chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 37 -
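To make the column meanings explicit: in the default 6-column output the fields are chrom, start (0-based), end (exclusive), name (the read name), score (MAPQ by default) and strand. A minimal sketch that prints each record with those labels is below; the helper name label_bed6 is made up for illustration and is not part of BEDTools.

```shell
# label_bed6 is a hypothetical helper name, not a BEDTools command.
label_bed6() {
    # Reads bamtobed's tab-separated 6-column output on stdin
    # and prints one labelled record per line.
    awk -F'\t' '{
        printf "chrom=%s start=%s end=%s name=%s mapq=%s strand=%s\n", $1, $2, $3, $4, $5, $6
    }'
}
```

It could be used as: bedtools bamtobed -i reads.bam | head -3 | label_bed6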
If you are still confused, read about the SAM format https://samtools.github.io/hts-specs/SAMv1.pdf
@Sukhdeep Singh Thanks for the information. What I don't understand is the 5th column. Some of the numbers are zero, and I am not sure what that means. Is it the read coverage at that particular reference position?
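Hey, that's the MAPQ (mapping quality) score, not coverage. It has a range of [0, 2^8 - 1], i.e. 0 to 255. A value of 0 usually means the aligner could not place the read uniquely (for example, it maps equally well to several locations), although the exact meaning depends on the aligner. Unmapped reads also typically carry a MAPQ of 0 in the BAM file. In the SAM specification it is 255, not 0, that indicates the mapping quality is unavailable. Count how many reads in the file are like this: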
awk '$5=="0"' file | wc -l
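The command above gives a raw count. If you want the percentage as well, here is a minimal sketch; the function name mapq_zero_percent and the file name reads.bed are placeholders of mine, and it assumes the tab-separated 6-column output that bamtobed writes by default.

```shell
# mapq_zero_percent is a hypothetical helper; pass it a bamtobed BED file.
mapq_zero_percent() {
    awk -F'\t' '
        $5 == 0 { zero++ }      # column 5 holds MAPQ in default bamtobed output
        END {
            if (NR == 0) { print "no records"; exit 1 }
            printf "%d of %d records (%.2f%%) have MAPQ 0\n", zero, NR, 100 * zero / NR
        }
    ' "$1"
}
```

It could be used as: mapq_zero_percent reads.bed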
If you want more filtering options, check this post:
How To Filter Mapped Reads With Samtools