Question

How to get the read density for the first 1000 nt of each transcript

7.7 years ago

Sara

Hi,

I have two RNA-seq files, one per sample, and I aligned them to the reference genome, so I have BAM files. I would like to get the read density over the first 1000 nucleotides of every transcript and then average those values, so that I end up with one value per sample (the average read density over the first 1000 nt of all transcripts). So far, in Python, I have built a dictionary with one representative transcript per gene (it holds the gene name and the transcript name). Do you know how I can get the read density of the first 1000 nt for each transcript? Then I can take the average.

Thanks

Tags: sequence, next-gen, RNA-Seq

Updated 5.8 years ago by Devon Ryan • written 7.7 years ago by Sara
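A minimal sketch of the calculation the question describes, assuming "read density" means the mean per-base read depth and that the per-base depths of each representative transcript are already available as a list ordered 5' to 3' (obtaining them from the BAM file is outside the sketch). The dictionary `coverage_by_transcript` and the helper names are hypothetical.

```python
from statistics import mean

WINDOW = 1000  # number of 5' nucleotides to summarize per transcript


def window_density(depths, window=WINDOW):
    """Mean per-base depth over the first `window` nt of one transcript.

    `depths` holds per-base read depths ordered 5' to 3' along the transcript.
    Transcripts shorter than `window` are summarized over their full length.
    """
    head = depths[:window]
    if not head:
        raise ValueError("empty coverage list")
    return sum(head) / len(head)


def sample_density(coverage_by_transcript, window=WINDOW):
    """Average of the per-transcript 5' densities: one value per sample."""
    return mean(window_density(d, window) for d in coverage_by_transcript.values())


if __name__ == "__main__":
    # Toy, made-up depths for two representative transcripts of one sample.
    sample1 = {"tx1": [2] * 1500, "tx2": [4] * 800}
    print(sample_density(sample1))  # 3.0
```

Running `sample_density` once per sample's dictionary gives the one value per sample asked for.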

Why don't you create a BED file with the first 1000 bp of each transcript and get the coverage with bedtools or some other tool? You can even get the coverage at each base with the genomecov tool in bedtools. If you want to do it in Python, the deepTools and HTSeq packages provide libraries for this.

7.7 years ago by GouthamAtla
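The BED file suggested above can be built from exon coordinates. A minimal sketch, assuming each transcript is given as a chromosome, a strand and a list of exon (start, end) pairs in 0-based, half-open coordinates. It walks the exons in transcription order so that introns are skipped, which means one transcript can yield several BED lines sharing the same name. The function names are hypothetical.

```python
def first_n_intervals(exons, strand, n=1000):
    """Genomic (start, end) pieces covering the first `n` transcribed nt.

    exons: (start, end) pairs, 0-based half-open; strand: "+" or "-".
    Transcripts shorter than `n` return all of their exons.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    ordered = sorted(exons)
    if strand == "-":
        # On the minus strand transcription starts at the highest coordinate.
        ordered.reverse()
    remaining = n
    pieces = []
    for start, end in ordered:
        if remaining <= 0:
            break
        take = min(end - start, remaining)
        if strand == "+":
            pieces.append((start, start + take))
        else:
            pieces.append((end - take, end))
        remaining -= take
    return sorted(pieces)  # BED files are conventionally sorted by start


def bed_lines(chrom, tx_id, strand, pieces):
    """Format pieces as BED6 lines, with the score column set to 0."""
    return [f"{chrom}\t{s}\t{e}\t{tx_id}\t0\t{strand}" for s, e in pieces]


if __name__ == "__main__":
    # Hypothetical plus-strand transcript with two exons on chr1.
    exons = [(1000, 2000), (100, 600)]
    for line in bed_lines("chr1", "tx1", "+", first_n_intervals(exons, "+")):
        print(line)
```

Because pieces of one transcript share the name column, their coverage has to be combined by name afterwards. If you only want the first 1000 bp of genome sequence downstream of the TSS (introns included), a single interval per transcript is simpler.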

Then I would get the read density from the end of all transcripts and average those values. Ultimately, I am interested in the ratio of the averages from the end and the beginning of each transcript.

7.7 years ago by Sara
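A minimal sketch of the end-versus-beginning ratio described above, under the same assumption as before: per-base depths ordered 5' to 3' for each transcript, held in a hypothetical `coverage_by_transcript` dictionary. It averages the 3' window densities and the 5' window densities across transcripts and returns end / beginning. Skipping transcripts shorter than two windows, so the windows never overlap, is a design choice of this sketch, not something from the thread.

```python
from statistics import mean


def end_to_start_ratio(coverage_by_transcript, window=1000):
    """Average 3' window density divided by average 5' window density.

    coverage_by_transcript maps transcript ID -> per-base depths ordered 5' to 3'.
    Transcripts shorter than 2 * window are skipped so the windows never overlap.
    """
    start_densities, end_densities = [], []
    for depths in coverage_by_transcript.values():
        if len(depths) < 2 * window:
            continue
        start_densities.append(sum(depths[:window]) / window)
        end_densities.append(sum(depths[-window:]) / window)
    if not start_densities:
        raise ValueError("no transcript is long enough for two non-overlapping windows")
    start_avg = mean(start_densities)
    if start_avg == 0:
        raise ZeroDivisionError("average 5' density is zero")
    return mean(end_densities) / start_avg


if __name__ == "__main__":
    # Toy, made-up depths; "short" is skipped because it is under 2 * window.
    toy = {
        "tx1": [4] * 1000 + [2] * 1000,
        "tx2": [6] * 1000 + [3] * 1500,
        "short": [1] * 500,
    }
    print(end_to_start_ratio(toy))  # 0.5
```

Computing the ratio per transcript and then averaging those ratios would give a different number; which one you want depends on the question.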

The original question does not mention anything about "End" or "ratios".

7.7 years ago by GouthamAtla

Answer · 2018-06-17

About 22 months too late, but using deepTools:

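Use bamCoverage to get a bigWig file from each BAM file.
Use computeMatrix reference-point -a 1000 on the bigWig files (pass the bigWigs with -S and your transcript annotation, as BED or GTF, with -R; adding -b 0 drops the 500 bp upstream of the reference point that reference-point mode includes by default).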
Use plotHeatmap --outFileNameMatrix with the output from above.

The last step writes a text file with the density in every bin of every region for each sample. You can then average that appropriately. You can change the bin sizes throughout (e.g., make it 1000 in step 2 with --binSize), but since you're averaging the heck out of everything anyway, it's unlikely to make much of a difference.
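A minimal sketch of the final averaging step, assuming the text file from the last step has header lines starting with `#`, one line per region with tab-separated numeric values (missing values written as `nan`), and the samples' bins laid side by side in blocks of equal width. Check your file's actual layout before relying on this. The function name and the file name in the usage line are hypothetical.

```python
import io
import math


def per_sample_means(handle, n_samples):
    """Mean of all non-nan values for each sample in a region x bin matrix.

    Assumes '#' header lines, one tab-separated numeric row per region, and
    each sample's bins occupying an equal-width block of columns.
    """
    sums = [0.0] * n_samples
    counts = [0] * n_samples
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        values = [float(v) for v in line.rstrip("\n").split("\t")]
        if len(values) % n_samples:
            raise ValueError("row length is not a multiple of n_samples")
        width = len(values) // n_samples
        for i in range(n_samples):
            for v in values[i * width:(i + 1) * width]:
                if not math.isnan(v):  # skip bins without data
                    sums[i] += v
                    counts[i] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Toy matrix: 2 regions x (2 samples x 2 bins); not real deepTools output.
    toy = io.StringIO("#header\n1\t3\t10\t20\n5\tnan\t30\t40\n")
    print(per_sample_means(toy, n_samples=2))  # [3.0, 25.0]
```

For a real file you would call something like `per_sample_means(open("matrix.tab"), n_samples=2)`. This is a pooled mean over every bin of every region; it equals the mean of per-region means only when all regions have the same number of non-nan bins.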