Question: Extracting reads with low mapping score from Bam file
0
gravatar for devinliao0918
4.7 years ago by
United States
devinliao091820 wrote:

Is there a way to extract the reads with low mapping score from Bam files? Samtools will remove reads with low mapping score if we specify the threshold for option -q. However, what I want is to save those reads to another separate file instead of discarding them.

next-gen • 3.1k views
ADD COMMENTlink modified 4.7 years ago by Pierre Lindenbaum118k • written 4.7 years ago by devinliao091820
1
gravatar for Devon Ryan
4.7 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

You have a few options:

  • Code up something simple with pysam
  • Just use awk (e.g., samtools view foo.bam | awk '{if($5<10) {print $0}}' > foo.low_qual.sam)
ADD COMMENTlink written 4.7 years ago by Devon Ryan88k

Thanks very much for your information.

Could I ask you a further question? In the foo.low_qual.sam file, there won't be headers from the original foo.bam file. If so, could I still use samtools to convert the foo.low_qual.sam file into foo.low_qual.bam in order to save space? Also, could I use samtools to generate pileup file from foo.low_qual.bam if there is no problem in converting .sam to .bam?

 

Thank you so much! 

ADD REPLYlink written 4.7 years ago by devinliao091820

That was just an abbreviated example. You could include headers too:

samtools view -h foo.bam | awk '{if(NF<5) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -

I should note that that's approximate, I haven't ensured that it lacks typos.

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by Devon Ryan88k

The NF<5 criterium does not work to extract the headers. We know that each line of the Sam file header starts with @ character, can I use this information to extract the headers? If so, could you write down the command? I googled but haven't found useful information.

ADD REPLYlink written 4.7 years ago by devinliao091820

Odd, that should normally suffice. You can also just:

samtools view -h foo.bam | awk '{if($1 ~ /^@/) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -
ADD REPLYlink written 4.7 years ago by Devon Ryan88k

shorter: awk '($0 ~ /^$@/ || int($5)<10)'

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by Pierre Lindenbaum118k

That actually works! Previously, I forgot to add the option -h.

Thanks a lot!

ADD REPLYlink written 4.7 years ago by devinliao091820

Hi Devon,

Could you explain what NF<5 means? Also, I think there is no difference using the two command line you provided. Is this correct? Obviously, I'd like to adopt the faster one.

ADD REPLYlink written 4.7 years ago by devinliao091820

NF is "the number of fields". Normally headers only have a small number of fields per-line.

ADD REPLYlink written 4.7 years ago by Devon Ryan88k
1
gravatar for Pierre Lindenbaum
4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

using my tool samjs : https://github.com/lindenb/jvarkit/wiki/SamJS

java -jar dist/samjs.jar -e 'record.getMappingQuality() > 20' \
   -X discarded.bam -o accepted.bam in.bam
ADD COMMENTlink written 4.7 years ago by Pierre Lindenbaum118k

Thank you, Pierre.

I would like to try your tool if the IT people in my school is willing to install your javakit for me.

ADD REPLYlink written 4.7 years ago by devinliao091820

Could I directly download your tool samjs.jar and run the command line you wrote? I checked that I could run java in my cluster. I actually did what you provide in the Download and Install, however, I couldn't run ant. I tried to find samjs.jar, but failed to find it.

Thanks,

ADD REPLYlink written 4.7 years ago by devinliao091820

What if I cannot use ant on the server? Could I use SamJavascript.java directly? I obtained SamJavascript.java from the github you provided in the link of compilation.  

 

ADD REPLYlink written 4.7 years ago by devinliao091820

install ant: http://ant.apache.org/bindownload.cgi

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum118k

I have installed ant and done the following steps:

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ git submodule init
$ git submodule update
$ cd htsjdk
$ ant clean
$ ant
$ cd ..
 

However, I couldn't find the samjs.jar using the command line "find . -name "samjs.jar"". Could you tell me where I could get it?

ADD REPLYlink written 4.7 years ago by devinliao091820

and then you have to `and samjs` https://github.com/lindenb/jvarkit/wiki/SamJS#compilation . The jar should be created in the 'dist' folder.

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum118k

Eventually, I get the samjs.jar on my Mac. I'm going to move it to the server so that I can use it there.

Thanks so much for your information!

ADD REPLYlink written 4.7 years ago by devinliao091820

The IT people in my school is unwilling to install ant on the server. Do you know other ways to compile and install your tool samjs?

By the way, is there other way to extract reads with low mapping scores in the Bam file? I don't want to convert the Bam file to Sam file first and then convert it back. I want to deal with the Bam files directly. Do you think that Bamtools would work?

ADD REPLYlink written 4.6 years ago by devinliao091820
The IT people in my school is unwilling to install ant on the server: Do you know other ways to compile and install your tool samjs: compile it at elsewhere. But you have to know java before moving the files to your school (and that's not my job, sorry).
ADD REPLYlink written 4.6 years ago by Pierre Lindenbaum118k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2136 users visited in the last hour