Samtools Select Alignments Bitwise Flag Value Of 0 Or 16 Only, Erase Contigs, Add Chr
3
1
Entering edit mode
10.2 years ago
dfernan ▴ 710

Hi all,

I was wondering what would be the fastest way to do something like the task described below in one line using samtools preferably:

Samtools one line instruction to do:

INPUT: infile.bam
The infile:

  • Contains other flags than 0 or 16 and I only want 0 or 16 in the output
  • Contains irrelevant chromosomes for my analysis such as GL000192.1, MT, etc., and I want to filter them out
  • Contains the chromosomes as numbers, i.e., 1,2,3,...,22,X,Y and I want them as chr1, chr2, chr3,...,chr22,chrX,chrY

OUTPUT outfile.bam with no irrelevant chromosomes, only 0 or 16 as flags, and chromosome as chr.

Suggestions are welcome, thanks guys!

samtools filter • 4.6k views
ADD COMMENT
1
Entering edit mode

I think my answer here would point you towards a solution. Let me know if you're unable to get it.

ADD REPLY
1
Entering edit mode
10.2 years ago

You can get the single mapped reads with

samtools view -F5

With the -L option, you can only get those chromosomes that match a .bed file, so you can put thsoe chromosomes in a bed file.

but samtools will not change the names of the chromosomes.

You could also try piping samtools view into awk or something, and it can filter, and change the names.

ADD COMMENT
0
Entering edit mode

hi, thanks. What does the -F 5 does? that does not work for selecting ONLY 0 or 16 flags... I tried it out and it didn't work, it also selects flags 512, 528. The bed chromosomes file sounds like a great suggestion.

ADD REPLY
2
Entering edit mode

What are you trying to accomplish with the flag filtering? I don't the filtering you'd like to have.

If you haven't seen it before, this page is handy for figuring out flag combinations:

http://picard.sourceforge.net/explain-flags.html

ADD REPLY
0
Entering edit mode

I am trying to eliminate duplicates, and reads that did not pass the vendor's quality, do you think keeping only flags 0 or 16 is appropriate? 0 stands for forward strand mapped, and 16 for reverse strand mapped so keeping 0 and 16 only should keep the single end forward and reverse strand mapped reads. Thanks!

ADD REPLY
1
Entering edit mode
10.2 years ago

Okay, then use -F517. Or -F1797, if your .bam file might have those flags.

ADD COMMENT
0
Entering edit mode

Sorry, just tried and none of them work: -F 517 selects ONLY 0 and 512, not 16; and -F 1797 selects 0, 16 but also 1024 and 1040. Not sure what is the logic but seems we are almos there but still more tweaking to do....

ADD REPLY
1
Entering edit mode

I'm confused because -F 1797 should kill reads with flags 1024 (duplicate) or 1040 (duplicate on reverse strand). Are you using the latest version of samtools? Capital F for the command line flag, right (just checking)?

To understand the logic I urge you to look at http://picard.sourceforge.net/explain-flags.html.

ADD REPLY
0
Entering edit mode

mystery solved it's working now, you are both right, sorry for the confusion. The post below is still correct but this way is faster. Thanks so much guys, you saved my sequencing processing day ;-)

ADD REPLY
0
Entering edit mode
10.2 years ago
dfernan ▴ 710

I think I find the solution but it does not use the flag commands in samtools (thanks to other post here).

samtools view -h -L hg19.bed dnd-brd4.bam | awk 'substr($0,1,1) == "@" || $2 == 0 || $2 == 16 {print}' | samtools view -bS - > F016-dnd-brd4.bam.

Thanks for all the help provided.

On a side note, It'd be nice to have a solution with the flag command since it's faster, but this tweak certainly works albeit much slower than using -F flags with samtools.

ADD COMMENT

Login before adding your answer.

Traffic: 868 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6