Good morning,
I have mapped transcriptome reads to the genome and have bam file now.
I know, how to calculate meandepth for "chromosomes", which are named in my gtf file as NODEs.
samtools coverage hisat2map.bam -o hisat2map_coverage.txt
It produces file with meandepths for chromosomes (NODEs).
#rname ... meandepth ...
NODE_1 ... 9 ...
NODE_2 ... 30 ...
But each NODE has a lot of CDSs, which is specified in gtf:
NODE_1 AUGUSTUS gene 200 300 . - . jg12345
NODE_1 AUGUSTUS transcript . . . - . transcript_id "jg12345.t1"; gene_id "jg12345"
NODE_1 AUGUSTUS stop_codon 200 202 . - 0 transcript_id "jg12345.t1"; gene_id "jg12345";
NODE_1 AUGUSTUS CDS 200 230 0.77 - 2 transcript_id "jg12345.t1"; gene_id "jg12345";
NODE_1 AUGUSTUS CDS 240 260 0.14 - 0 transcript_id "jg12345.t1";
So,
Question: How to calculate meandepths for CDS?
Input: bam and gtf
Output: file with meandepths for CDSs:
#name ... meandepth ...
NODE_1_CDS1 ... 10 ...
NODE_1_CDS2 ... 50 ...
What I tried: cut the CDS coordinates from gtf, make a bed file out of it like
NODE_1_CDS1 200 230
NODE_1_CDS2 240 260
and give as input samtools depth after the -b flag:
samtools depth hisat2map.bam -b annotation_onlyCDS.bed -o hisat2map_depth_onlyCDS.txt
Next, I was going to average the depths by CDSs. But this command gives an empty file. I think the reason is that in the gtf file (and, accordingly, in my bed) the numbering goes through the entire genome, and in the bam file it is separate for each NODE.
Could you help me, please?
Thank you very much in advance!
Best regards, Poecile