Genomic statistics from gtf
4.3 years ago

Hello,

I'm new to bioinformatics, and for our exam we have to do a project that computes statistics from GTF files, such as:

• Total number of genes
• Average number of alternative transcripts per gene
• Average number of introns/exons per gene
• Average length of CDS, 5' UTR and 3' UTR

Could you send me some links, or really anything I could use to learn about this? I would really appreciate it.

Thank you in advance,

Best regards, Nora

gene gtf • 2.1k views

What have you tried?


I can hardly imagine you've been given this assignment without any background, no?


Why not "Calculate genomic statistics from gtf" as title? In fact, if you search for genomic statistics from gtf or genomic statistics from gtf site:biostars.org you will find plenty of answers.


Oh, I didn't expect answers this fast. Thanks a lot. Yes, we did take a course; it covered FASTA, x2, hash variables, scalars, rand, etc., but nothing about GTF. I searched, but the same things are written everywhere, and I still have no idea what I should do or where I should start. I downloaded the annotation file for the project from GENCODE. It is not sorted, and I tried to sort it, but everything I find is for GFF or GFF3. I don't know whether it would be acceptable to do the project in GFF3 and then convert it to GTF. So I need some material to study. I don't want to ask the professor, because I don't want it to affect my grade. Yes, I searched for the statistics, but I have found nothing so far. If you have a book, PDF, link, or anything else I can start from, that would be awesome. Thanks a lot again for your response.


Please use the ADD COMMENT button when addressing comments / answers. And please, you are dealing with many non-native English speakers (like myself) who have a hard time understanding abbreviations and slang, so write in a more formal manner.


Let's start with your question, which is about how to work with GTF data.

Instead of trying to work with a full dataset from Gencode, I'd suggest stepping back and first reading a little about the GTF format, for example at this link:

http://mblab.wustl.edu/GTF22.html

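That link includes some very short example snippets, with explanations of the attributes of GTF files, including features.

Features are what make each line of a GTF file important: each line describes one feature, with the feature type given in the third column, alongside its coordinates and attributes.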
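
To make that concrete, here is a minimal Python sketch (my own illustration, not taken from the linked page or from any library) that splits one GTF line into its nine tab-separated columns and turns the attribute column into a dictionary. The names GtfRecord and parse_gtf_line and the example line are hypothetical. It assumes attribute values contain no semicolons, and if an attribute key is repeated only its last value is kept.

```python
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    """One GTF line: the nine columns, with attributes as a dict."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")

    attributes = {}
    # Attributes look like: key "value"; key "value";
    # Simplification: assumes no semicolon inside a value.
    for item in cols[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return GtfRecord(
        seqname=cols[0],
        source=cols[1],
        feature=cols[2],      # feature type, e.g. exon or CDS
        start=int(cols[3]),   # 1-based, inclusive
        end=int(cols[4]),     # 1-based, inclusive
        score=cols[5],
        strand=cols[6],
        frame=cols[7],
        attributes=attributes,
    )


# Hypothetical example line, made up for illustration.
example = 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(example)
print(record.feature, record.start, record.end, record.attributes)
```

Once every line is a record like this, the counting exercises mostly come down to grouping records by gene_id and transcript_id.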

Once you read about features, it can then become a little easier to think about how to do counting exercises, such as counting genes, average transcripts per gene, etc.
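
As a rough sketch of what such counting could look like in plain Python (an illustration under assumptions, not a reference solution): the function below groups lines by gene_id and transcript_id and reports the number of genes, the average number of transcripts per gene, the average number of exons per transcript, and the average summed CDS length per transcript. The names gtf_summary and demo_line and the toy annotation are made up. Counting exons per transcript rather than per gene is my simplification; for a transcript with n exons the intron count is n - 1. UTR lengths are left out because how UTRs are labelled differs between GTF flavours.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_summary(lines):
    """Return simple averages computed from an iterable of GTF lines."""
    transcripts_by_gene = defaultdict(set)    # gene_id -> {transcript_id, ...}
    exons_by_transcript = defaultdict(int)    # transcript_id -> exon count
    cds_len_by_transcript = defaultdict(int)  # transcript_id -> summed CDS length

    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if not gene_id:
            continue
        transcripts = transcripts_by_gene[gene_id]  # registers the gene
        if not transcript_id:
            continue  # e.g. a gene-level line
        transcripts.add(transcript_id)
        if feature == "exon":
            exons_by_transcript[transcript_id] += 1
        elif feature == "CDS":
            # GTF coordinates are 1-based and inclusive.
            cds_len_by_transcript[transcript_id] += end - start + 1

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    return {
        "genes": len(transcripts_by_gene),
        "avg_transcripts_per_gene": mean(len(t) for t in transcripts_by_gene.values()),
        "avg_exons_per_transcript": mean(exons_by_transcript.values()),
        "avg_cds_length": mean(cds_len_by_transcript.values()),
    }


def demo_line(feature, start, end, attrs):
    """Build one toy GTF line (hypothetical data)."""
    return "\t".join(["chr1", "demo", feature, str(start), str(end), ".", "+", ".", attrs])


G1_T1 = 'gene_id "G1"; transcript_id "T1";'
G1_T2 = 'gene_id "G1"; transcript_id "T2";'
G2_T3 = 'gene_id "G2"; transcript_id "T3";'

example_lines = [
    "# hypothetical toy annotation",
    demo_line("gene", 100, 400, 'gene_id "G1";'),
    demo_line("exon", 100, 200, G1_T1),
    demo_line("exon", 300, 400, G1_T1),
    demo_line("CDS", 101, 200, G1_T1),
    demo_line("CDS", 301, 350, G1_T1),
    demo_line("exon", 100, 250, G1_T2),
    demo_line("exon", 1000, 1100, G2_T3),
    demo_line("exon", 1200, 1300, G2_T3),
    demo_line("exon", 1400, 1500, G2_T3),
    demo_line("CDS", 1001, 1090, G2_T3),
]

print(gtf_summary(example_lines))
```

Because the function only needs an iterable of lines, an open file handle can be passed in place of the toy list, so the file does not have to be sorted or loaded into memory first.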

When you're at that point, read the links found by searching the keywords in h.mon's comment. There you will find links to tools that help with reading in GTF-formatted files and doing those counting exercises on the data within.


You might also check out the GFF3 format, which is an important extension of the GFF2/GTF format. There are parsers for both formats in many languages: BioPerl, for example, and Python has packages as well, such as gffutils.
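
One practical difference between the two formats is the ninth (attribute) column: GTF writes key "value"; pairs, while GFF3 writes key=value pairs separated by semicolons and links features through ID and Parent. A small sketch of reading both styles in plain Python follows. The function name parse_attributes and the sample strings are hypothetical, and it deliberately ignores the URL-style escaping that GFF3 allows in values. A real parser library handles these details for you.

```python
def parse_attributes(column, fmt):
    """Parse a column-9 attribute string; fmt is 'gtf' or 'gff3'."""
    if fmt not in ("gtf", "gff3"):
        raise ValueError("fmt must be 'gtf' or 'gff3'")

    attrs = {}
    for item in column.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        if fmt == "gff3":
            # GFF3 style: key=value
            key, _, value = item.partition("=")
        else:
            # GTF style: key "value"
            key, _, value = item.partition(" ")
            value = value.strip().strip('"')
        attrs[key] = value
    return attrs


# Hypothetical attribute strings describing the same transcript.
print(parse_attributes('gene_id "G1"; transcript_id "T1";', "gtf"))
print(parse_attributes("ID=T1;Parent=G1", "gff3"))
```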


My apologies for disrupting the flow; this was supposed to be a comment on Alex's earlier comment rather than a separate answer.


Oh, sorry, I didn't realize that. (I'm a non-native speaker too, my bad.)

Thanks for the link; I have already studied it. I have read PerlMonks, perldoc, and every tutorial link I found, but nothing has helped so far.

I need something to start from and get familiar with, so that I can then do the project.

When you all started working with GTF, where did you start, and what did you study from?

My apologies again; it's a bad habit of mine, but a fast one. :p ;) Thank you so much for your response. You are all so responsive. :)

Edit: I couldn't add a comment because it gave me some errors, so I submitted this as an answer and it worked.


You should be able to edit your comment and make changes there. Please see posts under http://biostars.org/t/how-to for step by step guidelines.

Also, h.mon requested that your comments be formal, which means professional as well. I'd recommend avoiding emojis such as :p and ;), as they're not strictly professional, and it's better not to give off a playful vibe on online scientific forums.