Question: Counting soft clipped bases and reads
0
gravatar for finswimmer
2.6 years ago by
finswimmer14k
Germany
finswimmer14k wrote:

Hello,

I'm looking for a way to count the number of soft clipped bases and reads in my bam file and calculate their fraction. What's the best way to do it?

Thanks a lot.

fin swimmer

bam • 1.8k views
ADD COMMENTlink modified 2.0 years ago by Biostar ♦♦ 20 • written 2.6 years ago by finswimmer14k
3
gravatar for Pierre Lindenbaum
2.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

using bioalcidaejdk : http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

and the following code

final long counts[]=new long[4];
stream().forEach(R->{
    counts[0]++;
    if(R.getReadUnmappedFlag()) return;
    Cigar cigar = R.getCigar();
    if(cigar==null || cigar.isEmpty()) return;
    counts[1]++;
    counts[2] += cigar.getReadLength();
    counts[3] += cigar.
        getCigarElements().
        stream().
        filter(C->C.getOperator().isClipping()).
        mapToInt(C->C.getLength()).
        sum();
    });

println("NUM-READS:"+counts[0]);
println("NUM-MAPPED-READS:"+counts[1]);
println("SUM-MAPPED-READS-LENGTH:"+counts[2]);
println("SUM-CLIPPING:"+counts[3]);

usage/example:

$ java -jar dist/bioalcidaejdk.jar --nocode -f biostar.code src/test/resources/S1.bam 

NUM-READS:1998
NUM-MAPPED-READS:1998
SUM-MAPPED-READS-LENGTH:139860
SUM-CLIPPING:123
ADD COMMENTlink written 2.6 years ago by Pierre Lindenbaum131k

Hello Pierre,

"SUM-CLIPPING" is the sum of clipped bases? If so, how can I modify the code that I get the sum of clipped reads in addition?

fin swimmer

ADD REPLYlink written 2.6 years ago by finswimmer14k

that I get the sum of clipped reads in addition?

final long counts[]=new long[5];
(...)
if(cigar.isClipped()) counts[4]++;
(...)
println("NUM-READS-CLIPPED:"+counts[4]);
ADD REPLYlink written 2.6 years ago by Pierre Lindenbaum131k
1

Works very well.

Thanks a lot.

ADD REPLYlink written 2.6 years ago by finswimmer14k
3
gravatar for d-cameron
2.6 years ago by
d-cameron2.3k
Australia
d-cameron2.3k wrote:

The GRIDSS package contains a picard-like tool gridss.CollectCigarMetrics which will calculate and summarise the the frequency of each CIGAR element length and operation. Your fraction can then be trivially calculated using read count, and number of reads with 0 soft clip CIGAR elements.

ADD COMMENTlink modified 2.6 years ago by h.mon31k • written 2.6 years ago by d-cameron2.3k

Nice package. Is there a way to run just CollectCigarMetrics instead of the whole CallVariants pipeline?

fin swimmer

ADD REPLYlink written 2.6 years ago by finswimmer14k
1

I've designed it so every step can be run as a stand-alone program. Just run java -cp gridss.jar gridss.analysis.CollectCigarMetrics. example_pipeline.sh has an example of using CollectGridssMetrics but CollectCigarMetrics works the same way.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by d-cameron2.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1145 users visited in the last hour