Is there a way to extract the reads with low mapping score from Bam files? Samtools will remove reads with low mapping score if we specify the threshold for option -q. However, what I want is to save those reads to another separate file instead of discarding them.
You have a few options:
- Code up something simple with pysam
- Just use awk (e.g.,
samtools view foo.bam | awk '{if($5<10) {print $0}}' > foo.low_qual.sam
)
Thanks very much for your information.
Could I ask you a further question? In the foo.low_qual.sam file, there won't be headers from the original foo.bam file. If so, could I still use samtools to convert the foo.low_qual.sam file into foo.low_qual.bam in order to save space? Also, could I use samtools to generate pileup file from foo.low_qual.bam if there is no problem in converting .sam to .bam?
Thank you so much!
That was just an abbreviated example. You could include headers too:
samtools view -h foo.bam | awk '{if(NF<5) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -
I should note that that's approximate, I haven't ensured that it lacks typos.
The NF<5 criterium does not work to extract the headers. We know that each line of the Sam file header starts with @ character, can I use this information to extract the headers? If so, could you write down the command? I googled but haven't found useful information.
Odd, that should normally suffice. You can also just:
samtools view -h foo.bam | awk '{if($1 ~ /^@/) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -
using my tool samjs : https://github.com/lindenb/jvarkit/wiki/SamJS
java -jar dist/samjs.jar -e 'record.getMappingQuality() > 20' \ -X discarded.bam -o accepted.bam in.bam
Thank you, Pierre.
I would like to try your tool if the IT people in my school is willing to install your javakit for me.
Could I directly download your tool samjs.jar and run the command line you wrote? I checked that I could run java in my cluster. I actually did what you provide in the Download and Install, however, I couldn't run ant. I tried to find samjs.jar, but failed to find it.
Thanks,
What if I cannot use ant on the server? Could I use SamJavascript.java directly? I obtained SamJavascript.java from the github you provided in the link of compilation.
install ant: http://ant.apache.org/bindownload.cgi
I have installed ant and done the following steps:
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ git submodule init
$ git submodule update
$ cd htsjdk
$ ant clean
$ ant
$ cd ..
However, I couldn't find the samjs.jar using the command line "find . -name "samjs.jar"". Could you tell me where I could get it?
and then you have to `and samjs` https://github.com/lindenb/jvarkit/wiki/SamJS#compilation . The jar should be created in the 'dist' folder.
Eventually, I get the samjs.jar on my Mac. I'm going to move it to the server so that I can use it there.
Thanks so much for your information!
The IT people in my school is unwilling to install ant on the server. Do you know other ways to compile and install your tool samjs?
By the way, is there other way to extract reads with low mapping scores in the Bam file? I don't want to convert the Bam file to Sam file first and then convert it back. I want to deal with the Bam files directly. Do you think that Bamtools would work?