Question: Sorting samtools output by flag or grepping its tags
0
gravatar for BPors
3.2 years ago by
BPors40
BPors40 wrote:

Hi!

I am getting confused with couple samtools outputs. I have run bwa-mem with -M flag to retrieve chimeric reads with 256 flag, and converted the sam file to bam file. Problem is, in the original bam file, if I want to count the lines with 256 flag, and if I want to retrieve "SA" tags, the numbers do not add up. Should not they be equal if they are both to show chimeric reads?

Example:

samtools view -h myfile.bam | grep 'SA:Z' > myfileSA_filtered.bam 

samtools view -f 256 -c myfile.bam 
1987783

and

samtools view -c myfileSA_filtered.bam
300281

I expect these numbers to be the same but they are quite far away from each other. I would appreciate your help!

chimeric samtools grep flag • 1.8k views
ADD COMMENTlink modified 2.3 years ago by Biostar ♦♦ 20 • written 3.2 years ago by BPors40
1

I think the 256 flag only means that it is not a primary alignment. This can then be a chimeric one that has been tagged as 256 due to the -M option, but also a multimapping read. If reads multimap, then one is randomly chosen to be primary, all others get flagged as secondary. That means SA:Z is only a subset of 256, and therefore, the number for the grep is smaller than for counting 256.

ADD REPLYlink written 3.2 years ago by ATpoint40k

Yes, followed the same thought, but then I cut the QNAME part, sorted and counted the unique ones in 256flagged bam file. Then I counted the unique QNAMEs in myfileSA_filtered.bam, and the numbers were still different

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by BPors40

first: how do know there is no 'SA:Z' substring in the quality string ?

ADD REPLYlink written 3.2 years ago by Pierre Lindenbaum131k

Because I did not only use SA:Z, it also includes the accession number of the genome, therefore making it specific to its chimerics

ADD REPLYlink written 3.2 years ago by BPors40

second: 256 if for 'secondary alignment' = an alternative alignment for the current read. I think you want 2048 'supplementary alignement'

ADD REPLYlink written 3.2 years ago by Pierre Lindenbaum131k

well, I wanted to retrieve the chimeric reads, so I thought to grab the ones which have another place to be. I have used bwa-mem -M -a to get all alignments, but since I have used -M, as a flag, I used 256 instead of 2048. Would it be a wrong choice?

ADD REPLYlink written 3.2 years ago by BPors40
1

256 would be used for a read mapping a duplicated gene: a read could have a two possible position (primary + not-primary=256 alignement)

a first part of the read is mapped on a region while the other part is clipped. There is a second alignment (2048) for this second part, mapping another region of the genome.

ADD REPLYlink written 3.2 years ago by Pierre Lindenbaum131k
1
gravatar for jkbonfield
3.2 years ago by
jkbonfield400
jkbonfield400 wrote:

SA:Z: tags appear on the primary reads as well as supplementary ones, therefore the count is higher. Filtering by flags is the best approach. (Hint "samtools flags" give you a handy chart of the numbers.)

ADD COMMENTlink written 3.2 years ago by jkbonfield400
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 775 users visited in the last hour