Question: In SAM Format, Clarify The Meaning Of The "0" Flag.
9.2 years ago, tflutre wrote:

Hello, I have 40,637 short sequences (probes) in a fastq file named "seq.fq".

  • First, I mapped them against a reference genome (hg19) using BWA ("bwa aln ...").
  • Then, I converted the alignments from suffix-array coordinates into chromosomal coordinates ("bwa samse ...") and wrote the results to a SAM file named "seq_aln.sam".
  • Finally, I counted the number of occurrences for each flag:

    $ grep -v "^@" seq_aln.sam | awk -F"\t" 'BEGIN{print "flag\toccurrences"} {a[$2]++} END{for(i in a)print i"\t"a[i]}'

    flag    occurrences
    4       3083
    0       19039
    16      18515
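
For readers who prefer Python, a rough equivalent of this tally could look like the sketch below. The function name `count_flags` and the example records are made up for illustration; they are not part of BWA or samtools.

```python
from collections import Counter


def count_flags(sam_lines):
    """Tally the FLAG column (2nd tab-separated field) of SAM alignment lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        counts[flag] += 1
    return counts


# Made-up records for illustration (11 mandatory SAM fields each).
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "probe1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "probe2\t16\tchr1\t200\t37\t4M\t*\t0\t0\tACGT\tII@I",
    "probe3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

for flag, n in sorted(count_flags(example).items()):
    print(flag, n, sep="\t")
```

On a real file, something like `with open("seq_aln.sam") as fh: count_flags(fh)` should work the same way. Testing for `@` only at the start of the line keeps reads whose quality string happens to contain `@`, as in the second made-up record.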

According to this page, the "4" flag means that the short sequence doesn't map to the reference genome, and the "16" flag means that it maps to the reverse strand of the reference genome.

But, what does the "0" flag mean? According to this forum page, it means "the read is not paired and mapped, forward strand", which is unclear to me... Does it mean "it is not paired but it maps on forward strand"? Or "it is neither paired nor maps on forward strand"? Or "it is neither paired nor maps on any strand"?

In the end, does all this mean that I can work with only 18,515 short sequences out of 40,637?

Thanks for your help!

9.2 years ago, Jts wrote:

When the flag field is 0, it means none of the bitwise flags specified in the SAM spec (on page 4) are set. That means that your reads with flag 0 are unpaired (because the first flag, 0x1, is not set), successfully mapped to the reference (because 0x4 is not set) and mapped to the forward strand (because 0x10 is not set).

Summarizing your data, the reads with flag 4 are unmapped, the reads with flag 0 are mapped to the forward strand and the reads with flag 16 are mapped to the reverse strand.
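
The bit tests described above can be sketched in a few lines of Python. This is only an illustration: the helper name `describe_flag` is made up, the bit values 0x1, 0x4 and 0x10 are the ones named in this answer, and the counts are those reported in the question.

```python
# Bit values named in the answer above (from the SAM specification).
PAIRED = 0x1     # read is paired in sequencing
UNMAPPED = 0x4   # read itself is unmapped
REVERSE = 0x10   # SEQ is stored reverse complemented


def describe_flag(flag):
    """Return (paired, mapped, orientation) for a SAM FLAG value.

    Hypothetical helper for illustration; orientation is None when unmapped.
    """
    paired = bool(flag & PAIRED)
    mapped = not (flag & UNMAPPED)
    orientation = None
    if mapped:
        orientation = "reverse" if flag & REVERSE else "forward"
    return paired, mapped, orientation


# Counts reported in the question.
counts = {4: 3083, 0: 19039, 16: 18515}

for flag, n in counts.items():
    print(flag, describe_flag(flag), n)

mapped_total = sum(n for flag, n in counts.items() if describe_flag(flag)[1])
print(mapped_total, "of", sum(counts.values()), "reads are mapped")
```

With the counts from the question, this reports 37554 mapped reads out of 40637, so the flag 0 and flag 16 reads are both usable, not only the 18,515 with flag 16.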

Thanks a lot, your answer should be added to the description of the SAM man page...

Reply written 9.2 years ago by tflutre
I agree with tflutre; this should be added to the SAM documentation. I work with influenza virus, a virus with an RNA genome in the negative-sense orientation. I expected my cDNA library to contain reads mapping to the reference sequence in both sense and antisense orientation, and I was getting flags of 0 and 16. The reads flagged as 0 certainly mapped somewhere, but it would be nice to have some good documentation explaining this. Thanks!

Reply written 5.9 years ago by daphne.cooper

@Jts, can you comment on Devon Ryan's comment in the answer to my question? I now realize I agree with him that FLAG 0 only means the read was not reverse complemented, not necessarily that it maps to the forward (+) strand.

A flag field of 0 may or may not mean that a read is actually mapped to the + strand. All it means is that it wasn't reverse complemented. Whether that means it should be assigned to one strand or the other or none at all is dependent on other factors and isn't encoded in BAM files. As an example, most common single-end RNAseq experiments use a stranded protocol wherein alignments with a flag of 0 come from the "-" strand.

Link is here: SAM flag and select reads that map uniquely
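
To make the quoted point concrete, here is a minimal sketch of how the strand of origin could be derived from the flag plus knowledge of the library protocol. The function name `strand_of_origin` and the protocol labels are hypothetical; as the quote says, this information is not stored in the SAM/BAM file and has to come from the experimental design.

```python
UNMAPPED = 0x4   # read is unmapped
REVERSE = 0x10   # SEQ is stored reverse complemented


def strand_of_origin(flag, protocol):
    """Guess the strand a read's source molecule came from.

    `protocol` is a hypothetical label, not a SAM/BAM field:
      "unstranded"      -> orientation says nothing about the strand of origin
      "same-strand"     -> read orientation matches the strand of origin
      "opposite-strand" -> read orientation is opposite to the strand of origin
    """
    if flag & UNMAPPED or protocol == "unstranded":
        return None
    aligned = "-" if flag & REVERSE else "+"
    if protocol == "same-strand":
        return aligned
    if protocol == "opposite-strand":
        return "+" if aligned == "-" else "-"
    raise ValueError(f"unknown protocol: {protocol!r}")


for flag in (0, 16, 4):
    for protocol in ("unstranded", "same-strand", "opposite-strand"):
        print(flag, protocol, strand_of_origin(flag, protocol))
```

Under the "opposite-strand" assumption, an alignment with flag 0 is assigned to the "-" strand, which matches the stranded single-end RNA-seq case mentioned in the quote.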

Reply written 3.9 years ago (modified 3.9 years ago) by epigene

@Jts has not been seen on Biostars for 4+ years, just so you know ....

Reply written 3.9 years ago by genomax

@genomax2, thanks for the note...

Reply written 3.9 years ago by epigene