Hi folks,
I am working with BAMs containing aligned reads for individual samples (RNAseq data). Some unforeseen issues with the data mean that I need to build BAI indices for compatibility with the software DegNorm. However, I am dealing with an organism that has large chromosomes exceeding the 512Mb limit for the BAI index.
Accordingly, I am looking for a way to split the chromosomes into intervals (e.g., chr1:1-700,000,000 is split into chr1a:1-350,000,000; chr1b:1-350,000,000...) within each BAM file. Similarly, I need to split the chromosomes within the corresponding reference genome's GTF file. Unfortunately, most of the questions online deal with splitting BAMs into separate files by chromosome name, or else renaming chromosomes within BAMs or GTFs. I would appreciate any help.
I suspect the easiest answer is to split the reference genome and then realign the samples, but I am hopeful there is another solution that does not involve re-doing earlier data processing.
Cheers, M
an alternative approach might be to try to make DegNorm work with "CSI" BAM indexes instead of "BAI". CSI indexes support data exceeding 512Mb. it might be as easy as adjusting this line of code, which just checks for bai, you could remove the check completely for example so that it does not try to create a bai index https://github.com/jipingw/DegNorm/blob/2e3e9c950849869289b8651f8c735aba4774cff5/R/coverage_cal_functions.R#L32-L36 then you can index you BAM files with "samtools index -c" outside of the R script
This is a good idea and one I'll try tomorrow. It's not clear to me where the index is actually used in this function, so it may not even be necessary to make CSI indices at all after commenting out the code you've highlighted.
Edit: An index is definitely required. Will update if a CSI index works with DegNorm.
Edit2: DegNorm eventually calls on the function ScanBamParam from Rsamtools. This function does an additional check of whether end coordinates abide a maximum value of 536870912. This results in an error given my large chromosomes, and I cannot seem to open the code to remove this check. Unless anyone has some more insight into this, I am a bit uneasy about further modifying DegNorm's code and the code of other packages it depends on. But, cheers cmdcolin for the idea!