Tool to summarise stats on supplementary reads in bam
1
0
Entering edit mode
5 weeks ago
rob234king ▴ 610

Is there a tool that will give detailed information on mapped reads beyond basic numbers? I'm looking for breakdown of distances of supplementary read mappings of long reads versus primary. Like how many within 200 bp, 500bp, 1000bp 3000bp etc

Tool • 358 views
ADD COMMENT
3
Entering edit mode
5 weeks ago

using bioalcidaejdk http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html and the SA:Z tag:

final Counter<String> contigs = new Counter<>();
final Counter<String> distances = new Counter<>();

stream().
    filter(R->!(R.isSecondaryOrSupplementary() || R.getReadUnmappedFlag())).
    forEach(R->{
    for(SAMRecord R2: htsjdk.samtools.SAMUtils.getOtherCanonicalAlignments(R)) {
        if(R2.contigsMatch(R)) {
            int distance = Math.abs((int)(R2.getStart() - R.getStart())/1000)*1000;
            distances.incr(String.valueOf(distance));
            }
        else
            {
            contigs.incr(R.getContig()+"->"+R2.getContig());
            }
        }
    });
println("#contigs");
for(String c:contigs.keySet()) {
    println(c+"\t"+contigs.count(c));
}
println("#distances");
for(String d:distances.keySet()) {
    println(d+"\t"+distances.count(d));
}

invoke:

$  java -jar dist/bioalcidaejdk.jar -f input.code src/test/resources/retrocopy01.bwa.bam

#contigs
chr4->chr5  1
chr4->chr6  1
chr4->chr13 4
chr4->chr10 1
chr4->chr7  1
chr4->chrX  2
chr4->chr11 1
chr4->chrY  1
chr4->chr20 1
chr4->chr18 1
chr4->chr1  1
chr4->chr2  1

#distances
0   22
6000    13
5000    11
12000   13
1000    25
10000   8
2000    33
ADD COMMENT
1
Entering edit mode

Thanks. it works, I feel like I should be able to unpick and work it via awk but why fight something that does the job

ADD REPLY
0
Entering edit mode

Just checking in the distances output 0 = reads from 0-1000 or just 0 1000 = reads from 1000-2000 or 1-1000?

ADD REPLY
1
Entering edit mode

0 = reads from 0-1000

ADD REPLY
0
Entering edit mode

thanks, how easy is it using the same tool to count how many alignments a read has, so from 1-10 alignments and how many alignments reads have? e.g. 1 alignment only - 1000 reads, 2 alignments - 2000 reads. I think that is different to what I have calculated here but applicable to understanding the context of those numbers.

ADD REPLY

Login before adding your answer.

Traffic: 1460 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6