Samtools Select Alignments Bitwise Flag Value Of 0 Or 16 Only, Erase Contigs, Add Chr
3
1
Entering edit mode
10.2 years ago
dfernan ▴ 710

Hi all,

I was wondering what would be the fastest way to do something like the task described below in one line using samtools preferably:

Samtools one line instruction to do:

INPUT: infile.bam
The infile:

• Contains other flags than 0 or 16 and I only want 0 or 16 in the output
• Contains irrelevant chromosomes for my analysis such as GL000192.1, MT, etc., and I want to filter them out
• Contains the chromosomes as numbers, i.e., 1,2,3,...,22,X,Y and I want them as chr1, chr2, chr3,...,chr22,chrX,chrY

OUTPUT outfile.bam with no irrelevant chromosomes, only 0 or 16 as flags, and chromosome as chr.

Suggestions are welcome, thanks guys!

samtools filter • 4.6k views
1
Entering edit mode

I think my answer here would point you towards a solution. Let me know if you're unable to get it.

1
Entering edit mode
10.2 years ago

You can get the single mapped reads with

samtools view -F5


With the -L option, you can only get those chromosomes that match a .bed file, so you can put thsoe chromosomes in a bed file.

but samtools will not change the names of the chromosomes.

You could also try piping samtools view into awk or something, and it can filter, and change the names.

0
Entering edit mode

hi, thanks. What does the -F 5 does? that does not work for selecting ONLY 0 or 16 flags... I tried it out and it didn't work, it also selects flags 512, 528. The bed chromosomes file sounds like a great suggestion.

2
Entering edit mode

What are you trying to accomplish with the flag filtering? I don't the filtering you'd like to have.

If you haven't seen it before, this page is handy for figuring out flag combinations:

http://picard.sourceforge.net/explain-flags.html

0
Entering edit mode

I am trying to eliminate duplicates, and reads that did not pass the vendor's quality, do you think keeping only flags 0 or 16 is appropriate? 0 stands for forward strand mapped, and 16 for reverse strand mapped so keeping 0 and 16 only should keep the single end forward and reverse strand mapped reads. Thanks!

1
Entering edit mode
10.2 years ago

Okay, then use -F517. Or -F1797, if your .bam file might have those flags.

0
Entering edit mode

Sorry, just tried and none of them work: -F 517 selects ONLY 0 and 512, not 16; and -F 1797 selects 0, 16 but also 1024 and 1040. Not sure what is the logic but seems we are almos there but still more tweaking to do....

1
Entering edit mode

I'm confused because -F 1797 should kill reads with flags 1024 (duplicate) or 1040 (duplicate on reverse strand). Are you using the latest version of samtools? Capital F for the command line flag, right (just checking)?

To understand the logic I urge you to look at http://picard.sourceforge.net/explain-flags.html.

0
Entering edit mode

mystery solved it's working now, you are both right, sorry for the confusion. The post below is still correct but this way is faster. Thanks so much guys, you saved my sequencing processing day ;-)

0
Entering edit mode
10.2 years ago
dfernan ▴ 710

I think I find the solution but it does not use the flag commands in samtools (thanks to other post here).

samtools view -h -L hg19.bed dnd-brd4.bam | awk 'substr($0,1,1) == "@" ||$2 == 0 || \$2 == 16 {print}' | samtools view -bS - > F016-dnd-brd4.bam.

Thanks for all the help provided.

On a side note, It'd be nice to have a solution with the flag command since it's faster, but this tweak certainly works albeit much slower than using -F flags with samtools.