HTSJDK help, how to randomaccess a position and count read counts in a bam file?
1
0
Entering edit mode
7.6 years ago
ylaguka • 0

I'm new to htsjdk and bam file so I would really appreciate your help to get started on this. My first goal is simple. I want to use a .bai to random access a position in .bam given chromosome and position. Then I want to count the number of reads for ref allele, and the number of reads for alt allele that have MapQuality above certain value at that position.

I am using HTSJDK, but I found it difficult to find exactly what I need from its long documentation.

So far I've only been able to open the file, thanks to Pierre Lindenbaum's previous post:

SamReaderFactory srf=SamReaderFactory.make();
srf.validationStringency(ValidationStringency.LENIENT);
SAMFileReader samReader=new SAMFileReader(vsFile, new File(bai));

//I know SAMFileReader is deprecated, but I dont know what's the new equivalent

AbstractBAMFileIndex index=(AbstractBAMFileIndex) samReader.getIndex();
SAMRecordIterator iter=samReader.iterator();

Even that's not really what I want, since I need to be able to random access it.

So yea, didn't go very far, any help is much appreciated!!!

htsjdk java next-gen genome alignment • 3.0k views
ADD COMMENT
0
Entering edit mode
7.6 years ago

Open the SAMReader from the factory:

SamReader samReader= SamReaderFactory.makeDefault().open(new File("/my/bam.bam"));

to iterate, look at the javadoc: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SamReader.html

SAMRecordIterator SamReader.query(java.lang.String sequence,
                        int start,
                        int end,
                        boolean contained)



SAMRecordIterator SamReader.queryOverlapping(java.lang.String sequence,
                                   int start,
                                   int end)
ADD COMMENT
0
Entering edit mode

Thank you Mr. Lindenbaum for your response.

But in my understanding, iterator doesn't use a bai file to access exactly where the position is, instead it loops till it finds the position matching the interval. Is there a way to use a .bai file to random access a bam file location?

ADD REPLY
1
Entering edit mode

instead it loops till it finds the position matching the interval.

you're wrong.

"Iterate over records that overlap the given interval. Only valid to call this if hasIndex() == true. "

ADD REPLY
0
Entering edit mode

I see, but I'm still confused, how it finds the interval without the input of a bai file. Would you help me make sense of it?

ADD REPLY
0
Entering edit mode

I see! It does go find the index file if it's named by convention. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 3861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6