Question: Choosing sam flags to extract all reads mapped to 5' -> 3' OR 3' -> 5' of reference sequence?
0
gravatar for johnnytam100
5 weeks ago by
johnnytam10020
johnnytam10020 wrote:

I am trying to extract all the reads which : 1) mapped to 5' -> 3' OR 3' -> 5' of reference sequence AND 2) paired but one of the reads is unmapped.

To meet requirement 2), I read from here or here that the sam flags are 73, 133, 89, 121, 165, 181, 101, 117, 153, 185, 69 and 137.

However, to meet requirement 1), out of the flags stated above, what is the simple method to determine whether the reads are mapped to 5' -> 3' OR 3' -> 5' of the reference sequence?

I have also read from slide 32 of here but was quite difficult when I start to consider combinations.

Thanks a lot!!!

mapping bwa sam flag • 262 views
ADD COMMENTlink modified 5 weeks ago by genomax37k • written 5 weeks ago by johnnytam10020

Do you have any language you must use? Or it doesn't matter?

ADD REPLYlink written 5 weeks ago by glihm530

I work in a linux system so maybe it doesn't matter.

ADD REPLYlink written 5 weeks ago by johnnytam10020
1
gravatar for Pierre Lindenbaum
5 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum101k wrote:

I don't think you can do this using a one-pass samtools.

using samjdk:

 java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag()));' input.bam
ADD COMMENTlink written 5 weeks ago by Pierre Lindenbaum101k

How could I divide the reads into 5' -> 3' and 3' -> 5' direction using the script? Thank you so much!

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by johnnytam10020
1

BTW, thank you so much for sharing the slides, maybe that is one of the few explanations of the sam flags that visualize how the reads look like.

ADD REPLYlink written 5 weeks ago by johnnytam10020

it's not clear if you want to divide the sam flags . Your question looks like `( (cond1 || cond2) && cond3)'.

ADD REPLYlink written 5 weeks ago by Pierre Lindenbaum101k

Sorry for making it confusing. I am extracting 2 groups of reads : Group 1 : paired but one of the reads is unmapped AND mapped 5'->3' of reference. Group 2 : paired but one of the reads is unmapped AND mapped 3'->5' of reference.

ADD REPLYlink written 5 weeks ago by johnnytam10020
1

negative strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' input.bam

positive strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && !record.getReadNegativeStrandFlag()));' input.bam
ADD REPLYlink written 5 weeks ago by Pierre Lindenbaum101k

Great! You are of great help!!! After I compiled samjdk according to the download and compile section here and tried to run the command of negative strand, the following error appeared: Error: Invalid or corrupt jarfile. Any idea on how to fix this? Thanks again.

ADD REPLYlink written 5 weeks ago by johnnytam10020

what was your exact command line please, and your java version.

ADD REPLYlink written 4 weeks ago by Pierre Lindenbaum101k

Sorry for keeping you waiting. Here are the information :

Installation command :

git clone "https://github.com/lindenb/jvarkit.git"
cd jvarkit
make samjdk

Command to extract negative strand:

java -jar /data/tam/script/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/samjs/SamJdk.java
-e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && () && record.getMateNegativeStrandFlag()) || (() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' /data/tam/genome/mapping/20171001_3TTN_TTNR1_SPadesTTNcontigs_mapped_TTNTTNRoriginalreads_defaultmismatch/libextract-defaultmismatch_TTN_footprint_only_3TTN_TTNR1_read_list_aln.sorted.bam

Java version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by johnnytam10020
0
gravatar for glihm
5 weeks ago by
glihm530
France
glihm530 wrote:

I strongly recommend you to read the documentation of SAMTOOLS and associated HTSlib. They are very powerful tools and you must understand them to go further in the SAM formatted files processing.

Solution 1: You have some time and you can make a list of all the flags you are interested in (You can use the tools you mentioned and also Picard Tool to explain SAM flags). So, using samtools view -F YOURLISTOFFLAGS input.bam you'll be able to extract the reads you want for.

Solution 2: You can use HTSlib in C (or in other languages like Python with Pysam or Java with samjdk mentioned by Pierre Lindenbaum). Is so, you have to focus your work on the BITWISE operators. They allow you a quicker and better way to acces all the flags you want to extract your reads of interest.

At a glance for the solution 2, if you don't care about the strandness (you want 5'->3' and 3'->5'), you don't have to test the strandness bit (0x10). So, the work is now simple, you just have to test if the read is paired (test on the bit 0x1) and if one of the two read is unmapped (bit 0x4 OR 0x8). You've done. :)

ADD COMMENTlink written 5 weeks ago by glihm530

Actually I care about strandness, maybe I haven't stated it clearly enough in the question. :) So do you mean reads with strandness bit (0x10) is one direction, and without strandness bit (0x10) is another direction?

ADD REPLYlink written 5 weeks ago by johnnytam10020

Correct, if 0x10 is set (1), read is on reverse strand. Else (0), read is on forward strand. ;)

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by glihm530
0
gravatar for genomax
5 weeks ago by
genomax37k
United States
genomax37k wrote:

Using splitsam.sh from BBMap suite.

splitsam.sh mapped.sam forward.sam reverse.sam
reformat.sh in=forward.sam out=plus.fq
reformat.sh in=reverse.sam out=minus.fq rcomp
ADD COMMENTlink modified 4 weeks ago • written 5 weeks ago by genomax37k

Thank you very much! Actually I only have one .sam per library mapped, do I have to divide them into separate .sam before using splitsam.sh?

ADD REPLYlink written 5 weeks ago by johnnytam10020

Splitsam is doing the split for you into two separate sam files. You can then convert them back to fastq reads by the reformat command above.

ADD REPLYlink written 4 weeks ago by genomax37k

So Splitsam will output forward.sam and reverse.sam. How about the plus.sam and minus.sam? Are they also input files? Thank you!

ADD REPLYlink written 4 weeks ago by johnnytam10020

Post edited above.

ADD REPLYlink written 4 weeks ago by genomax37k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 601 users visited in the last hour