Tools to check the length of isoforms in reference transcripts
21 months ago
shinyjj ▴ 50

Hi biostars,

I want to generate a histogram of transcript lengths for the RefSeq Transcripts FASTA file listed here (https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml#:~:text=gff3-,RefSeq%20Transcripts,-Fasta).

Can anyone suggest a tool that can generate a histogram of isoform lengths from this file? Ideally, the x-axis would be the isoform length and the y-axis would be the number of isoforms of that length.

transcript histogram isoform • 1.5k views

Use bioawk to get the transcript lengths, then pass its output to a simple hist() in R.

Thank you! I am unfamiliar with bioawk. What command line should I use to generate the output, and what kind of output does bioawk produce?

Are you familiar with awk? Bioawk is awk customized to work with common bioinformatics formats. For example, (if memory serves me right) the preset "fastx" uses @ and > as record separators instead of the usual newline. Once you understand the underlying concepts, you can use awk's functions and variables to get what you want.

See the manual: https://github.com/lh3/bioawk

Experiment with it: generate a two-column output with transcript name and transcript length (although you'd only need the second column for the histogram). In R, run ?hist to see how to plot a histogram. It is trivial: it simply needs a vector of numbers.
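
For anyone who wants to see the shape of that two-column output without installing bioawk, here is a minimal sketch in plain Python. It is a stand-in for the bioawk approach, not bioawk itself. The function name `fasta_lengths` and the toy records are made up for illustration, and it assumes an ordinary FASTA file where each header line starts with `>`.

```python
import io

def fasta_lengths(handle):
    """Yield (name, length) for each record in an open FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            # Sequence may be wrapped over several lines; sum them up.
            length += len(line.strip())
    if name is not None:
        yield name, length

# Tiny in-memory example; the IDs and sequences are invented.
example = io.StringIO(">tx1 demo transcript\nACGT\nAC\n>tx2\nGGG\n")
for name, length in fasta_lengths(example):
    print(f"{name}\t{length}")
```

For a real file you would pass `open(path)` instead of the `io.StringIO` object, or `gzip.open(path, "rt")` if the download is gzipped.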

Maybe the solution suggested in "How to generate sequence length distribution from Fasta file" could work? Once you have the lengths, you could plot them in R, Python, or your language of choice.

Thanks everyone! I now have a file with the transcript name on the left and its length on the right. It contains 177816 transcripts. What would be a good tool to plot this in R?

Just read the file into R (read.table...) and plot the length column using hist(), as Ram suggested. It may be good to try it a bit yourself first, see this. If you get into trouble, feel free to come back and ask.
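
If you would rather do the same thing in Python than in R, the idea is identical: read the second column into a list of numbers, then count how many fall into each bin. Below is a rough sketch using only the standard library. The names `read_lengths`, `bin_counts` and `BIN_WIDTH` are hypothetical, the four rows are invented, and it assumes a whitespace-delimited file with no header line.

```python
import io
from collections import Counter

def read_lengths(handle):
    """Return the second column (length) of a whitespace-delimited table."""
    lengths = []
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or incomplete lines
            lengths.append(int(fields[1]))
    return lengths

def bin_counts(lengths, bin_width):
    """Count values per fixed-width bin; keys are the bin start positions."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))

BIN_WIDTH = 1000  # assumed bin size; pick whatever suits your data

# Invented rows standing in for the real name/length file.
table = io.StringIO("tx1\t250\ntx2\t1200\ntx3\t1800\ntx4\t2100\n")
lengths = read_lengths(table)

# Print a crude text histogram: one '#' per transcript in the bin.
for start, count in bin_counts(lengths, BIN_WIDTH).items():
    print(f"{start:>6}-{start + BIN_WIDTH - 1:<6} {'#' * count} ({count})")
```

In R, the route suggested above would be roughly `hist(read.table("lengths.txt")$V2)`, where `lengths.txt` is a placeholder file name and `V2` is the default name read.table gives the second column when there is no header.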

I got the result I wanted. I am pretty new to R. Thanks Ram and iraun :)
