Choosing sam flags to extract all reads mapped to 5' -> 3' OR 3' -> 5' of reference sequence?
3
0
Entering edit mode
4.7 years ago
johnnytam100 ▴ 100

I am trying to extract all the reads which : 1) mapped to 5' -> 3' OR 3' -> 5' of reference sequence AND 2) paired but one of the reads is unmapped.

To meet requirement 2), I read from here or here that the sam flags are 73, 133, 89, 121, 165, 181, 101, 117, 153, 185, 69 and 137.

However, to meet requirement 1), out of the flags stated above, what is the simple method to determine whether the reads are mapped to 5' -> 3' OR 3' -> 5' of the reference sequence?

I have also read from slide 32 of here but was quite difficult when I start to consider combinations.

Thanks a lot!!!

sam bwa flag mapping • 2.3k views
0
Entering edit mode

Do you have any language you must use? Or it doesn't matter?

0
Entering edit mode

I work in a linux system so maybe it doesn't matter.

1
Entering edit mode
4.7 years ago

I don't think you can do this using a one-pass samtools.

using samjdk:

 java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag()));' input.bam

0
Entering edit mode

How could I divide the reads into 5' -> 3' and 3' -> 5' direction using the script? Thank you so much!

1
Entering edit mode

BTW, thank you so much for sharing the slides, maybe that is one of the few explanations of the sam flags that visualize how the reads look like.

0
Entering edit mode

it's not clear if you want to divide the sam flags . Your question looks like ( (cond1 || cond2) && cond3)'.

0
Entering edit mode

Sorry for making it confusing. I am extracting 2 groups of reads : Group 1 : paired but one of the reads is unmapped AND mapped 5'->3' of reference. Group 2 : paired but one of the reads is unmapped AND mapped 3'->5' of reference.

1
Entering edit mode

negative strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' input.bam


positive strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && !record.getReadNegativeStrandFlag()));' input.bam

0
Entering edit mode

Great! You are of great help!!! After I compiled samjdk according to the download and compile section here and tried to run the command of negative strand, the following error appeared: Error: Invalid or corrupt jarfile. Any idea on how to fix this? Thanks again.

0
Entering edit mode

0
Entering edit mode

Sorry for keeping you waiting. Here are the information :

Installation command :

git clone "https://github.com/lindenb/jvarkit.git"
cd jvarkit
make samjdk


Command to extract negative strand:

java -jar /data/tam/script/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/samjs/SamJdk.java


Java version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

0
Entering edit mode
4.7 years ago
glihm ▴ 640

I strongly recommend you to read the documentation of SAMTOOLS and associated HTSlib. They are very powerful tools and you must understand them to go further in the SAM formatted files processing.

Solution 1: You have some time and you can make a list of all the flags you are interested in (You can use the tools you mentioned and also Picard Tool to explain SAM flags). So, using samtools view -F YOURLISTOFFLAGS input.bam you'll be able to extract the reads you want for.

Solution 2: You can use HTSlib in C (or in other languages like Python with Pysam or Java with samjdk mentioned by Pierre Lindenbaum). Is so, you have to focus your work on the BITWISE operators. They allow you a quicker and better way to acces all the flags you want to extract your reads of interest.

At a glance for the solution 2, if you don't care about the strandness (you want 5'->3' and 3'->5'), you don't have to test the strandness bit (0x10). So, the work is now simple, you just have to test if the read is paired (test on the bit 0x1) and if one of the two read is unmapped (bit 0x4 OR 0x8). You've done. :)

0
Entering edit mode

Actually I care about strandness, maybe I haven't stated it clearly enough in the question. :) So do you mean reads with strandness bit (0x10) is one direction, and without strandness bit (0x10) is another direction?

0
Entering edit mode

Correct, if 0x10 is set (1), read is on reverse strand. Else (0), read is on forward strand. ;)

0
Entering edit mode
4.7 years ago
GenoMax 117k

Using splitsam.sh from BBMap suite.

splitsam.sh mapped.sam forward.sam reverse.sam
reformat.sh in=forward.sam out=plus.fq
reformat.sh in=reverse.sam out=minus.fq rcomp

0
Entering edit mode

Thank you very much! Actually I only have one .sam per library mapped, do I have to divide them into separate .sam before using splitsam.sh?

0
Entering edit mode

Splitsam is doing the split for you into two separate sam files. You can then convert them back to fastq reads by the reformat command above.

0
Entering edit mode

So Splitsam will output forward.sam and reverse.sam. How about the plus.sam and minus.sam`? Are they also input files? Thank you!

0
Entering edit mode

Post edited above.