Question: How To Calculate Number Of Bases Found In Introns/CDSs/Intergenic Space
8.6 years ago by
Panos1.7k
Geneva, Switzerland
Panos1.7k wrote:

I have a GenBank (or even gff) file and want to count the number of bases found in introns/CDSs/intergenic space.

Does anyone know if there's any script already out there? I'll start writing mine (most probably using BioPerl). I'm just being lazy and also don't want to re-invent the wheel :D

cds intron • 2.2k views
8.5 years ago by
Istvan Albert ♦♦ 85k
University Park, USA
Istvan Albert ♦♦ 85k wrote:

One thing you could do is get (or generate) a BED or GFF file that lists each exon coordinate. Then use the mergeBed tool in bedtools (on position-sorted input) to collapse overlapping exons into longer, non-overlapping intervals. Finally, all you need to do is add up the lengths of the merged intervals.
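For anyone who would rather not depend on bedtools, the merge-then-sum idea can be sketched in a few lines of Python. This is only an illustration, not what mergeBed does internally: the function name `merged_length` is made up here, and it assumes all intervals are on the same sequence and use 1-based, inclusive coordinates as in GFF (for BED, which is 0-based and half-open, the length would be end minus start without the +1).

```python
def merged_length(intervals):
    """Total bases covered by (start, end) intervals on ONE sequence.

    Assumes 1-based, inclusive coordinates (GFF style).
    Overlapping intervals are counted only once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # No overlap with the current block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current block if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Hypothetical exon coordinates, two of which overlap.
exons = [(100, 200), (150, 250), (400, 450)]
print(merged_length(exons))
```

For a real annotation you would group the intervals by sequence name first and call the function once per chromosome or contig.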

ADD COMMENTlink modified 8.5 years ago • written 8.5 years ago by Istvan Albert ♦♦ 85k
8.6 years ago by
Rm8.0k
Danville, PA
Rm8.0k wrote:

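Using a GFF or GTF file, for example "gencode.v7.annotation_goodContig.gtf":

To print the number of bases in gene regions (the feature type is in column 3; coordinates are 1-based and inclusive, so each length is end - start + 1):

awk -F'\t' '$3 == "gene" { len += $5 - $4 + 1 } END { print len }' gencode.v7.annotation_goodContig.gtf

Similarly, you can get the totals for other feature types (e.g. exon or CDS) by changing the value tested in column 3.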
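If you want the totals for every feature type in one pass, something like the following Python sketch might help. The function name `feature_totals` and the sample lines are invented for illustration. It simply sums end - start + 1 per value of column 3, so overlapping features (for example exons shared by several transcripts) are counted more than once.

```python
from collections import defaultdict


def feature_totals(lines):
    """Sum feature lengths per feature type (column 3) of GFF/GTF lines.

    Coordinates are taken as 1-based and inclusive.
    Overlapping features are NOT merged, so they are counted repeatedly.
    """
    totals = defaultdict(int)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        totals[fields[2]] += int(fields[4]) - int(fields[3]) + 1
    return dict(totals)


# Made-up example lines; for a real file, pass an open file handle instead.
sample = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t101\t400\t.\t+\t.\tID=g1",
    "chr1\tsrc\tCDS\t101\t200\t.\t+\t0\tParent=t1",
    "chr1\tsrc\tCDS\t301\t400\t.\t+\t0\tParent=t1",
]
print(feature_totals(sample))
```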


My problem is that intron coordinates are only implied in my gff file (they're the regions inside a gene that are not CDSs). The same is also true for intergenic space. And last, there might be multiple splice variants per gene. I only want one of them...

ADD REPLYlink written 8.6 years ago by Panos1.7k
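A rough sketch of how the implied regions could be derived, under the definitions given in the comment above: "intron" here means gene bases not covered by a CDS (so UTRs end up in that bucket too), and intergenic space is whatever lies outside all gene spans. The names `covered` and `partition_bases`, the `genome_length` parameter and the coordinates are all hypothetical. It assumes a single sequence, 1-based inclusive coordinates, CDS segments that lie inside the gene spans, and that you have already picked one transcript per gene.

```python
def covered(intervals):
    """Bases covered by 1-based, inclusive (start, end) intervals,
    counting overlapping stretches only once."""
    total, cur_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, cur_end + 1)  # skip the part already counted
        if end >= start:
            total += end - start + 1
            cur_end = end
    return total


def partition_bases(genome_length, genes, cds):
    """Split one sequence into CDS / intron / intergenic base counts.

    genes: (start, end) spans of the genes.
    cds:   CDS segments of the ONE chosen transcript per gene
           (assumed to lie within the gene spans).
    """
    genic = covered(genes)
    coding = covered(cds)
    return {
        "CDS": coding,
        "intron": genic - coding,            # genic but not coding
        "intergenic": genome_length - genic,  # outside every gene
    }


# Hypothetical 1000 bp sequence with two genes.
genes = [(101, 400), (601, 800)]
cds = [(101, 200), (301, 400), (601, 700)]
print(partition_bases(1000, genes, cds))
```

Since only a rough estimate of the proportions is needed, dividing each count by the sequence length gives the fraction of the genome in each category.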

You have to make your requirements more explicit. For example, which splice variant do you want? Overall, I would say that there is probably no tool that does exactly what you want.

ADD REPLYlink written 8.6 years ago by Istvan Albert ♦♦ 85k

Albert, any variant would be good because I only want to have a rough estimate of the portion of genome found in introns/CDSs/intergenic space. Anyway, I've started writing my own script... I just wanted to make sure that I'm not re-inventing the wheel. Thank you all guys for your time!

ADD REPLYlink written 8.5 years ago by Panos1.7k