Question: How to efficiently remove a list of reads according to their ID and mapped position from BAM file?
0
gravatar for MarVi
16 months ago by
MarVi10
MarVi10 wrote:

Hello there,

I have a little question, I know that this is very similar to this question already answered: How to efficiently remove a list of reads from BAM file? But, somehow is a slight different. In fact, I have used Picard before to remove the unwanted reads from my bamfile. Now, I want to remove reads not only taking into account their ids, but also their mapped position (HI:I:).

I have some multi-mapped reads and I want to remove only those according to their position. I can't use Picard, because this is going to remove all the multi-mapped reads even those that I want to keep.

So for exemple, I have:

SRR4293695.162674600    272 chr1    10538   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:2  AS:i:27 nM:i:1
SRR4293695.162674600    16  chr1    181068  0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:1  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

I would like to remove from the bam file the first two alignments in chromosome 1 with HI:i:1 and HI:i:2.

SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

Does anyone have any suggestion? I could use python to do it by myself and then convert sam to bam file, but maybe there is something more optimized to try.

Thanks in advance!

bam rna-seq samtools alignment • 422 views
ADD COMMENTlink modified 16 months ago by Pierre Lindenbaum124k • written 16 months ago by MarVi10

Hello MarVi,

take a look at bamutils filter.

fin swimmer

ADD REPLYlink modified 16 months ago • written 16 months ago by finswimmer12k

but also their mapped position (NH:I:).

I have some multi-mapped reads and I want to remove only those according to their position.

i don't understand. Provide an example please.

ADD REPLYlink written 16 months ago by Pierre Lindenbaum124k

Hello, thank you very much for your responses.

Sure, here an example :

SRR4293695.162674600    272 chr1    10538   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:2  AS:i:27 nM:i:1
SRR4293695.162674600    16  chr1    181068  0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:1  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

I would like remove the first two alignments that are in the first chromosome. At the end in my bam file I would like to have :

SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

Thank you very much for your suggestions or ideas. I am going to check bamutils filter.

MarVi

ADD REPLYlink modified 16 months ago • written 16 months ago by MarVi10
0
gravatar for Pierre Lindenbaum
16 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

using samdjdk in pair mode http://lindenb.github.io/jvarkit/SamJdk.html

 java -jar picard.jar SortSam I=coord.bam O=query.bam SO=queryname

java -jar  dist/samjdk.jar --pair -e 'return records.stream().filter(R->{Integer i=R.getIntegerAttribute("HI"); return i!=null && i>2;}).collect(Collectors.toList());' query.bam
ADD COMMENTlink written 16 months ago by Pierre Lindenbaum124k

Hello Pierre! Thanks for your answer. Nevertheless, is not the answer that I am looking for.

In fact, I don't want to keep only those reads that have HI:I > 2. I have a list with reads names associated with their alignment (chr:start) and HI:I. According to this list I want to remove those reads from my bamfile.

Above I just wrote an example. I feel sorry that it was not clear.

MarVi

ADD REPLYlink written 16 months ago by MarVi10

Then That is still not clear to me. Why do you remove the two first one and not the other ? Furthermore, why are you doing this: in your example you're removing the primary alignment and all the MAQP==0 ....

ADD REPLYlink written 16 months ago by Pierre Lindenbaum124k

My example is not a real example and not always are the two first. Removal is based in their positions (chr:start-end).

ADD REPLYlink written 16 months ago by MarVi10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1700 users visited in the last hour