Hi,
I'm looking for a good way to calculate the percentage of reads on specific chromosomes (MT; 1-23, X, Y; and unplaced scaffolds).
I know I can get the read counts per chromosome with samtools idxstats, but I have a lot of BAM files and would like to automate the calculation. The problem is that I'm struggling with basic batch text manipulation, so I would appreciate any help you can give me (specific or general).
Edit: I forgot to mention that my BAM files are all indexed and contain only uniquely mapped reads.
Do you have some familiarity with R or Python?
Yes, I always work with R
So, can you update your question with where, specifically, you are getting stuck with processing many files with samtools idxstats? Ideally, post any code you have tried.
I will try to do it with R now. I just thought there might be a short command in bash :)
You can certainly loop over your files in bash with something like (untested):
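A minimal sketch of such a loop, assuming the indexed BAM files sit in the current directory and `samtools` is on the PATH. The helper name `idxstats_percent` and the `.chrom_percent.tsv` output suffix are made up for illustration. It relies on `samtools idxstats` printing tab-separated columns (reference name, length, mapped reads, unmapped reads) and skips the final `*` line, so percentages are relative to mapped reads only:

```shell
#!/usr/bin/env bash
# Sketch (untested on real data): per-reference read percentages for each BAM.

# Hypothetical helper: reads idxstats output on stdin and prints
# reference name, mapped read count, and percentage of all mapped reads.
idxstats_percent() {
    awk -F'\t' '
        $1 != "*" { name[++n] = $1; count[n] = $3; total += $3 }  # skip the "*" line
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\t%.2f\n", name[i], count[i],
                       (total ? 100 * count[i] / total : 0)
        }'
}

for bam in *.bam; do
    [ -e "$bam" ] || continue   # nothing to do if no BAM files match
    samtools idxstats "$bam" | idxstats_percent > "${bam%.bam}.chrom_percent.tsv"
done
```

Each output file then has one row per reference sequence, which can be read into R to sum the rows into groups such as MT, the main chromosomes, and unplaced scaffolds; the grouping depends on how the references are named in your genome build.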
I suspect you will still want to "analyze" the data, so R will likely be involved at some point.