Question: Counting soft clipped bases and reads
0
gravatar for finswimmer
23 months ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hello,

I'm looking for a way to count the number of soft clipped bases and reads in my bam file and calculate their fraction. What's the best way to do it?

Thanks a lot.

fin swimmer

bam • 1.4k views
ADD COMMENTlink modified 16 months ago by Biostar ♦♦ 20 • written 23 months ago by finswimmer13k
3
gravatar for Pierre Lindenbaum
23 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum127k wrote:

using bioalcidaejdk : http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

and the following code

final long counts[]=new long[4];
stream().forEach(R->{
    counts[0]++;
    if(R.getReadUnmappedFlag()) return;
    Cigar cigar = R.getCigar();
    if(cigar==null || cigar.isEmpty()) return;
    counts[1]++;
    counts[2] += cigar.getReadLength();
    counts[3] += cigar.
        getCigarElements().
        stream().
        filter(C->C.getOperator().isClipping()).
        mapToInt(C->C.getLength()).
        sum();
    });

println("NUM-READS:"+counts[0]);
println("NUM-MAPPED-READS:"+counts[1]);
println("SUM-MAPPED-READS-LENGTH:"+counts[2]);
println("SUM-CLIPPING:"+counts[3]);

usage/example:

$ java -jar dist/bioalcidaejdk.jar --nocode -f biostar.code src/test/resources/S1.bam 

NUM-READS:1998
NUM-MAPPED-READS:1998
SUM-MAPPED-READS-LENGTH:139860
SUM-CLIPPING:123
ADD COMMENTlink written 23 months ago by Pierre Lindenbaum127k

Hello Pierre,

"SUM-CLIPPING" is the sum of clipped bases? If so, how can I modify the code that I get the sum of clipped reads in addition?

fin swimmer

ADD REPLYlink written 23 months ago by finswimmer13k

that I get the sum of clipped reads in addition?

final long counts[]=new long[5];
(...)
if(cigar.isClipped()) counts[4]++;
(...)
println("NUM-READS-CLIPPED:"+counts[4]);
ADD REPLYlink written 23 months ago by Pierre Lindenbaum127k
1

Works very well.

Thanks a lot.

ADD REPLYlink written 23 months ago by finswimmer13k
3
gravatar for d-cameron
23 months ago by
d-cameron2.1k
Australia
d-cameron2.1k wrote:

The GRIDSS package contains a picard-like tool gridss.CollectCigarMetrics which will calculate and summarise the the frequency of each CIGAR element length and operation. Your fraction can then be trivially calculated using read count, and number of reads with 0 soft clip CIGAR elements.

ADD COMMENTlink modified 23 months ago by h.mon29k • written 23 months ago by d-cameron2.1k

Nice package. Is there a way to run just CollectCigarMetrics instead of the whole CallVariants pipeline?

fin swimmer

ADD REPLYlink written 23 months ago by finswimmer13k
1

I've designed it so every step can be run as a stand-alone program. Just run java -cp gridss.jar gridss.analysis.CollectCigarMetrics. example_pipeline.sh has an example of using CollectGridssMetrics but CollectCigarMetrics works the same way.

ADD REPLYlink modified 23 months ago • written 23 months ago by d-cameron2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2010 users visited in the last hour