RNA Seq length distribution plot
7.3 years ago
HK ▴ 40

Hey,

I am making plots for my paper and would really like to make a plot like the one below. I have the FASTQ files, the BAM files and the expression (counts) file for all samples. I can't figure out how to make such a plot to show the different RNA species that I found in my samples, with the frequency peaks as shown here. This is a 3D plot, so maybe R can be used. But what is the input? How should I extract such data from the files I have?

[Figure: image not available]

The caption of the figure:

Length distribution and annotation of small RNAs in human serum. A: length distribution of reads obtained by deep sequencing of small RNAs extracted from human serum. Shown here are only those reads that map to the hg19 (GRCh37) human genome, with length distribution plotted against abundance of sequencing reads. Sequencing reads combined from 5 different human serum samples were mapped to the human genome with Bowtie according to the end-to-end k-difference policy with 2 mismatches, either allowing (blue bars) or disallowing (red bars) multiple reportable alignments for each read. B: sequencing reads from the 5 individual human serum samples shown as a pool in A. Bars with different colors denote the source of the sequenced human serum small RNA from the 5 individual samples. Distributions in the 5 individual samples are similar. C: length distribution of annotated reads obtained by deep sequencing of small RNAs extracted from the 5 human serum samples. Length distribution is plotted against abundance of the reads annotated as miRNAs, YRNAs, tRNAs, rRNAs, or other sRNAs (snRNAs and snoRNAs). D: a pie chart showing the percentage of reads from the 5 pooled samples mapping to the indicated specific types of small RNAs. E: frequencies of YRNA types represented in the aligned reads. YRNAs are classified in Ensembl as RNY1, RNY3, RNY4, RNY5, pseudogenes originating from the 4 human YRNAs (RNY1P, RNY3P, RNY4P, RNY5P), and a group of predicted YRNAs from the Rfam database (http://rfam.sanger.ac.uk)

distribution-plot RNA-seq • 3.9k views

Please never use 3D plots or pie charts. Use classic bar plots instead: http://genomicsclass.github.io/book/pages/plots_to_avoid.html

7.3 years ago

To get the length distribution of the reads that map to tRNAs (untested):

samtools view -L tRNAs.bed my_bam.bam |  # extract the reads that overlap tRNAs
awk '{print length($10)}' |              # get their length (column 10 is the read sequence)
sort -n | uniq -c                        # count how many reads have each length

From this, a simple histogram can be more effective than complicated 3D plots.

PS: tRNAs.bed is a BED file with the genomic coordinates of the tRNAs.
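As for "what is the input?": a plotting tool such as R mostly needs a small table of class, read length and count. Here is a minimal sketch of how the pipeline above could be repeated for several RNA classes. The function name length_table, the BED file names and the output file name are all made up for illustration, and it assumes headerless SAM records on stdin (which is what samtools view prints by default).

```shell
# Hypothetical helper: reads headerless SAM records on stdin and prints
# "label<TAB>length<TAB>count", one row per observed read length.
length_table() {
    label=$1
    awk -F'\t' -v label="$label" '
        { counts[length($10)]++ }   # column 10 is the read sequence
        END { for (len in counts) printf "%s\t%d\t%d\n", label, len, counts[len] }
    ' | sort -t "$(printf '\t')" -k2,2n   # order rows by read length
}

# Example usage; the BED and output file names are placeholders:
# for class in miRNA tRNA YRNA; do
#     samtools view -L "$class.bed" my_bam.bam | length_table "$class"
# done > length_by_class.tsv
```

The resulting three-column file can be read into R and drawn as grouped or stacked bars, one colour per class.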


Doesn't the sort in the above answer sort the entire row rather than the length values?

I finally ended up with this: cat < input.file.SAM | cut -f10 | awk '{print length}' | sort | uniq -c

I'm not sure whether it could be done more simply, but this works.


No, the sort applies only to the lengths of column 10, because that is all the preceding awk command prints. Your command should work, but you can simplify it by dropping the unnecessary cat and cut (awk can select the column directly, as in my example):

awk '{print length($10)}' input.file.SAM | sort -n | uniq -c

Clean and simple :)
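One caveat when running this on a SAM file rather than on samtools view output: if the file still contains header lines (they start with @), those get counted too, usually as length 0. A small sketch that skips them, along with records whose sequence is stored as "*", could look like this. The function name sam_read_lengths is just a placeholder.

```shell
# Hypothetical helper: prints "count length" pairs for the SAM file given as $1,
# ignoring header lines and records without a stored sequence.
sam_read_lengths() {
    awk -F'\t' '!/^@/ && $10 != "*" { print length($10) }' "$1" | sort -n | uniq -c
}

# Example usage:
# sam_read_lengths input.file.SAM
```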
