Count small RNA reads by length and gene ids
1
0
Entering edit mode
6.1 years ago
SpreeFU • 0

Dear all,

I have SAM files after mapping small RNA reads to genome. I want to know how many reads of small RNAs mapped to individual genes and what are the lengths of mapped reads to each genes. If it's too complicated to get all different length of reads, can I know for individual genes how many 24nt sRNAs mapped?

Or first get the mapped 24 nt reads, then see how many of each gene has?

Thanks!!

RNA-Seq • 1.7k views
1
Entering edit mode
6.1 years ago
Medhat 9.5k

you can use bed file for the interested genes coordinate and then use intersect command from bedtools to count number of reads mapped to genes and you can specify the -wo option

-wo Write the amount of overlap between intersecting features

bedtools intersect -a your_genes.bed -b your_align.bam -wo > results.bed


or use htseq-count

"htseq-count [options] <sam_file> <gff_file>"


or bedtools multicov which will give you number of aligned reads for each feature

bedtools multicov –bams your_align.bam -bed your_genes.bed

0
Entering edit mode

Hi Medhat,

Thanks for your solution! When I use the commands below, it was killed.

bedtools intersect -a GENE_body_sorted.bed -b Col-genome-rep1.bam -wo > Col-genome-rep1.bed
Killed


Here is my bed file for genes:

chr1    3631    5899    AT1G01010   .   +
chr1    5928    8737    AT1G01020   .   -
chr1    11649   13714   AT1G01030   .   -
chr1    23146   31227   AT1G01040   .   +
chr1    31170   33153   AT1G01050   .   -
chr1    33379   37871   AT1G01060   .   -
chr1    38752   40944   AT1G01070   .   -
chr1    44677   44787   AT1G01073   .   +
chr1    45296   47019   AT1G01080   .   -
chr1    47485   49286   AT1G01090   .   -


I used htseq-count to count numbers of mapped reads of each gene. However, I do not know which option I can use to get the lengths of each mapped reads. For example, there are a group of small RNAs mapped to gene AT1G01010. Among those sRNAs, I want to know how many of them are 21 nt sRNAs, 22 nt sRNAs, 23 nt sRNAs ... etc.

0
Entering edit mode

htseq-count I do not know option that will give you length.

Killed it means that there is no enough memory; how much memory you have?

a work around could be find here https://github.com/arq5x/bedtools2/issues/232

0
Entering edit mode

It's only 8 GB RAM, so seems it can not be done....

0
Entering edit mode

did you tried this work around in the link ?

0
Entering edit mode

I tried bedtools intersect -a foo.sorted.bed -b foo.sorted.bed -sorted -wa -wb

Here is error again

bedtools intersect -a Col-genome-rep1.sorted.bam -b GENE_body_sorted.bed -sorted -wa -wb
Error: Invalid record in file Col-genome-rep1.sorted.bam. Record is
D00118:271:CA0YNANXX:7:1101:10000:11035 16  3   3366347 255 22M *0  0   AGTGCTTTGTCTACATTTGGGA  GGGGGGGGGGGGGGGGGCCCCC  XA:i:1  MD:Z:1T20   NM:i:1


Bedtools Intersect Error in this link, says 'The end site is before the start site.' seems getting more complicated, I'm not sure if this is what I want now...