Question: Choosing sam flags to extract all reads mapped to 5' -> 3' OR 3' -> 5' of reference sequence?
0
gravatar for johnnytam100
8 months ago by
johnnytam10040
johnnytam10040 wrote:

I am trying to extract all the reads which : 1) mapped to 5' -> 3' OR 3' -> 5' of reference sequence AND 2) paired but one of the reads is unmapped.

To meet requirement 2), I read from here or here that the sam flags are 73, 133, 89, 121, 165, 181, 101, 117, 153, 185, 69 and 137.

However, to meet requirement 1), out of the flags stated above, what is the simple method to determine whether the reads are mapped to 5' -> 3' OR 3' -> 5' of the reference sequence?

I have also read from slide 32 of here but was quite difficult when I start to consider combinations.

Thanks a lot!!!

mapping bwa sam flag • 487 views
ADD COMMENTlink modified 8 months ago by genomax49k • written 8 months ago by johnnytam10040

Do you have any language you must use? Or it doesn't matter?

ADD REPLYlink written 8 months ago by glihm560

I work in a linux system so maybe it doesn't matter.

ADD REPLYlink written 8 months ago by johnnytam10040
1
gravatar for Pierre Lindenbaum
8 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum108k wrote:

I don't think you can do this using a one-pass samtools.

using samjdk:

 java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag()));' input.bam
ADD COMMENTlink written 8 months ago by Pierre Lindenbaum108k

How could I divide the reads into 5' -> 3' and 3' -> 5' direction using the script? Thank you so much!

ADD REPLYlink modified 8 months ago • written 8 months ago by johnnytam10040
1

BTW, thank you so much for sharing the slides, maybe that is one of the few explanations of the sam flags that visualize how the reads look like.

ADD REPLYlink written 8 months ago by johnnytam10040

it's not clear if you want to divide the sam flags . Your question looks like `( (cond1 || cond2) && cond3)'.

ADD REPLYlink written 8 months ago by Pierre Lindenbaum108k

Sorry for making it confusing. I am extracting 2 groups of reads : Group 1 : paired but one of the reads is unmapped AND mapped 5'->3' of reference. Group 2 : paired but one of the reads is unmapped AND mapped 3'->5' of reference.

ADD REPLYlink written 8 months ago by johnnytam10040
1

negative strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' input.bam

positive strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && !record.getReadNegativeStrandFlag()));' input.bam
ADD REPLYlink written 8 months ago by Pierre Lindenbaum108k

Great! You are of great help!!! After I compiled samjdk according to the download and compile section here and tried to run the command of negative strand, the following error appeared: Error: Invalid or corrupt jarfile. Any idea on how to fix this? Thanks again.

ADD REPLYlink written 8 months ago by johnnytam10040

what was your exact command line please, and your java version.

ADD REPLYlink written 8 months ago by Pierre Lindenbaum108k

Sorry for keeping you waiting. Here are the information :

Installation command :

git clone "https://github.com/lindenb/jvarkit.git"
cd jvarkit
make samjdk

Command to extract negative strand:

java -jar /data/tam/script/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/samjs/SamJdk.java
-e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && () && record.getMateNegativeStrandFlag()) || (() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' /data/tam/genome/mapping/20171001_3TTN_TTNR1_SPadesTTNcontigs_mapped_TTNTTNRoriginalreads_defaultmismatch/libextract-defaultmismatch_TTN_footprint_only_3TTN_TTNR1_read_list_aln.sorted.bam

Java version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
ADD REPLYlink modified 8 months ago • written 8 months ago by johnnytam10040
0
gravatar for glihm
8 months ago by
glihm560
France
glihm560 wrote:

I strongly recommend you to read the documentation of SAMTOOLS and associated HTSlib. They are very powerful tools and you must understand them to go further in the SAM formatted files processing.

Solution 1: You have some time and you can make a list of all the flags you are interested in (You can use the tools you mentioned and also Picard Tool to explain SAM flags). So, using samtools view -F YOURLISTOFFLAGS input.bam you'll be able to extract the reads you want for.

Solution 2: You can use HTSlib in C (or in other languages like Python with Pysam or Java with samjdk mentioned by Pierre Lindenbaum). Is so, you have to focus your work on the BITWISE operators. They allow you a quicker and better way to acces all the flags you want to extract your reads of interest.

At a glance for the solution 2, if you don't care about the strandness (you want 5'->3' and 3'->5'), you don't have to test the strandness bit (0x10). So, the work is now simple, you just have to test if the read is paired (test on the bit 0x1) and if one of the two read is unmapped (bit 0x4 OR 0x8). You've done. :)

ADD COMMENTlink written 8 months ago by glihm560

Actually I care about strandness, maybe I haven't stated it clearly enough in the question. :) So do you mean reads with strandness bit (0x10) is one direction, and without strandness bit (0x10) is another direction?

ADD REPLYlink written 8 months ago by johnnytam10040

Correct, if 0x10 is set (1), read is on reverse strand. Else (0), read is on forward strand. ;)

ADD REPLYlink modified 8 months ago • written 8 months ago by glihm560
0
gravatar for genomax
8 months ago by
genomax49k
United States
genomax49k wrote:

Using splitsam.sh from BBMap suite.

splitsam.sh mapped.sam forward.sam reverse.sam
reformat.sh in=forward.sam out=plus.fq
reformat.sh in=reverse.sam out=minus.fq rcomp
ADD COMMENTlink modified 8 months ago • written 8 months ago by genomax49k

Thank you very much! Actually I only have one .sam per library mapped, do I have to divide them into separate .sam before using splitsam.sh?

ADD REPLYlink written 8 months ago by johnnytam10040

Splitsam is doing the split for you into two separate sam files. You can then convert them back to fastq reads by the reformat command above.

ADD REPLYlink written 8 months ago by genomax49k

So Splitsam will output forward.sam and reverse.sam. How about the plus.sam and minus.sam? Are they also input files? Thank you!

ADD REPLYlink written 8 months ago by johnnytam10040

Post edited above.

ADD REPLYlink written 8 months ago by genomax49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 850 users visited in the last hour