Extracting reads with low mapping score from Bam file
2
0
Entering edit mode
8.2 years ago

Is there a way to extract the reads with low mapping score from Bam files? Samtools will remove reads with low mapping score if we specify the threshold for option -q. However, what I want is to save those reads to another separate file instead of discarding them.

next-gen • 5.7k views
2
Entering edit mode
8.2 years ago

You have a few options:

• Code up something simple with pysam
• Just use awk (e.g., samtools view foo.bam | awk '{if($5<10) {print$0}}' > foo.low_qual.sam)
0
Entering edit mode

Thanks very much for your information.

Could I ask you a further question? In the foo.low_qual.sam file, there won't be headers from the original foo.bam file. If so, could I still use samtools to convert the foo.low_qual.sam file into foo.low_qual.bam in order to save space? Also, could I use samtools to generate pileup file from foo.low_qual.bam if there is no problem in converting .sam to .bam?

Thank you so much!

0
Entering edit mode

That was just an abbreviated example. You could include headers too:

samtools view -h foo.bam | awk '{if(NF<5) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam - I should note that that's approximate, I haven't ensured that it lacks typos. ADD REPLY 0 Entering edit mode The NF<5 criterium does not work to extract the headers. We know that each line of the Sam file header starts with @ character, can I use this information to extract the headers? If so, could you write down the command? I googled but haven't found useful information. ADD REPLY 0 Entering edit mode Odd, that should normally suffice. You can also just: samtools view -h foo.bam | awk '{if($1 ~ /^@/) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam - ADD REPLY 0 Entering edit mode shorter: awk '($0 ~ /^$@/ || int($5)<10)'

0
Entering edit mode

That actually works! Previously, I forgot to add the option -h.

Thanks a lot!

0
Entering edit mode

Hi Devon,

Could you explain what NF<5 means? Also, I think there is no difference using the two command line you provided. Is this correct? Obviously, I'd like to adopt the faster one.

0
Entering edit mode

NF is "the number of fields". Normally headers only have a small number of fields per-line.

1
Entering edit mode
8.2 years ago

using my tool samjs : https://github.com/lindenb/jvarkit/wiki/SamJS

java -jar dist/samjs.jar -e 'record.getMappingQuality() > 20' \
-X discarded.bam -o accepted.bam in.bam
0
Entering edit mode

Thank you, Pierre.

I would like to try your tool if the IT people in my school is willing to install your javakit for me.

0
Entering edit mode

Could I directly download your tool samjs.jar and run the command line you wrote? I checked that I could run java in my cluster. I actually did what you provide in the Download and Install, however, I couldn't run ant. I tried to find samjs.jar, but failed to find it.

Thanks,

0
Entering edit mode

What if I cannot use ant on the server? Could I use SamJavascript.java directly? I obtained SamJavascript.java from the github you provided in the link of compilation.

0
Entering edit mode
0
Entering edit mode

I have installed ant and done the following steps:

git clone "https://github.com/lindenb/jvarkit.git"
cd jvarkit
git submodule init
git submodule update
cd htsjdk
ant clean
ant
cd ..


However, I couldn't find the samjs.jar using the command find . -name "samjs.jar". Could you tell me where I could get it?

0
Entering edit mode

and then you have to and samjs https://github.com/lindenb/jvarkit/wiki/SamJS#compilation . The jar should be created in the 'dist' folder.

0
Entering edit mode

Eventually, I get the samjs.jar on my Mac. I'm going to move it to the server so that I can use it there.

Thanks so much for your information!

0
Entering edit mode

The IT people in my school is unwilling to install ant on the server. Do you know other ways to compile and install your tool samjs?

By the way, is there other way to extract reads with low mapping scores in the Bam file? I don't want to convert the Bam file to Sam file first and then convert it back. I want to deal with the Bam files directly. Do you think that Bamtools would work?

0
Entering edit mode
The IT people in my school is unwilling to install ant on the server: Do you know other ways to compile and install your tool samjs: compile it at elsewhere. But you have to know java before moving the files to your school (and that's not my job, sorry).