How to get the total genic and intergenic length of a chromosome?
6.0 years ago
Zee_S

Hello Biostars community!

Could you suggest a method to compute the total genic and intergenic length of a chromosome?

Thank you very much for your help!


Do you have annotations available for this genome?


Hello genomax,

Yes, I have the coordinates of the annotated genes in a BED file, and I also have a sizes.genome file with the chromosome sizes.


I assume you are asking because you are not familiar with a scripting language? If you are, you should be able to add up the lengths of your gene intervals from the BED start/end columns and keep track of the regions in between.
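A minimal sketch of that idea in Python, assuming BED-style half-open intervals (0-based start, exclusive end) on a single chromosome whose length is known, and assuming every interval lies inside the chromosome. The function name `genic_intergenic` and the chromosome length are hypothetical. Overlapping genes are merged first so shared bases are counted once; everything left over is intergenic.

```python
def genic_intergenic(intervals, chrom_len):
    """Return (genic_bp, intergenic_bp) for half-open [start, end)
    intervals on a single chromosome of length chrom_len."""
    genic = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap starts here: add the finished merged block, open a new one
            if cur_end is not None:
                genic += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or book-ended interval: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        genic += cur_end - cur_start
    # Everything not covered by a merged block is intergenic
    return genic, chrom_len - genic


# Two intervals from the post plus a hypothetical overlapping one
genes = [(1666, 2818), (4096, 5114), (2000, 3000)]
print(genic_intergenic(genes, 10_000))  # (2352, 7648); length is hypothetical
```

Book-ended intervals such as (0, 10) and (10, 20) are merged too, which mirrors the default behaviour of `bedtools merge`.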

There may be a way to do this using BEDOPS, awk, or a combination of both. Can you post the first 10-15 lines of your BED file (head -15 your.bed) to give people an idea of what you have?


Here is a 10-line snapshot of my genes.bed. Thank you for your help!

gene    1666    2818
gene    4096    5114
gene    21496   28507
gene    40486   46470
gene    49036   54240
gene    73329   91655
gene    99448   122165
gene    133623  138575
gene    141258  149665
gene    151974  157575

I did the following, but I am not sure it's correct:

I merged overlapping genes with bedtools merge, then created 100 bp bins of the merged intervals using bedtools makewindows, and summed up the 100 bp bins. I took this as the total genic length. Is this correct? When I instead sum the lengths (end minus start) of the merged intervals without binning, I get a different value.
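One likely explanation, assuming the bins were counted and multiplied by 100: the last window of each interval is cut off at the interval's end, so it is usually shorter than 100 bp, and counting it as a full bin inflates the total. The Python sketch below only mimics that windowing (`make_windows` is a hypothetical helper, not bedtools itself) on the first two intervals from the post, which do not overlap:

```python
def make_windows(start, end, size=100):
    """Split a half-open interval [start, end) into consecutive windows of
    `size` bp; the last window is truncated at `end`."""
    return [(s, min(s + size, end)) for s in range(start, end, size)]


merged = [(1666, 2818), (4096, 5114)]  # first two intervals from the post

windows = [w for s, e in merged for w in make_windows(s, e)]
total_from_widths = sum(e - s for s, e in merged)          # no binning
total_from_window_widths = sum(e - s for s, e in windows)  # bins, true widths
total_from_window_count = 100 * len(windows)               # bins counted as 100 bp
print(total_from_widths, total_from_window_widths, total_from_window_count)
# 2170 2170 2300
```

Summing the actual widths of the windows gives back the same total as the merged intervals, so the binning step adds nothing when the goal is a total length.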

To get the intergenic length, I used bedtools complement, made 100 bp bins of the complement intervals, and summed up these bins to get the "intergenic length". Is this correct?
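For comparison, a Python sketch of what a complement computes on one chromosome, assuming the intervals are already merged and sorted and that the chromosome length comes from the genome file. `complement` is a hypothetical helper, not the bedtools implementation, and the chromosome length is made up. Summing the gaps directly gives the intergenic length without any binning:

```python
def complement(merged, chrom_len):
    """Return the gaps between sorted, non-overlapping half-open intervals,
    including the stretch before the first and after the last interval."""
    gaps = []
    prev_end = 0
    for start, end in merged:
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = max(prev_end, end)
    if prev_end < chrom_len:
        gaps.append((prev_end, chrom_len))
    return gaps


merged = [(1666, 2818), (4096, 5114)]
chrom_len = 10_000  # hypothetical chromosome length
gaps = complement(merged, chrom_len)
print(gaps)                          # [(0, 1666), (2818, 4096), (5114, 10000)]
print(sum(e - s for s, e in gaps))   # 7830
```

Genic plus intergenic should add up to the chromosome length (here 2170 + 7830 = 10000), which is a quick sanity check for either approach.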

Is there a better way to do it?

Many thanks!


What does "gene" correspond to here, exons?


Each gene interval runs from TSS to TES. The exon coordinates fall within these intervals and are in a separate GTF file that I didn't show here.


Does genic length mean excluding introns? For my analyses I need to take intron coordinates into account. In that case, do I have to extract the intron coordinates and use them in the intergenic calculation? Thanks for your help.


If you only want to include what gets translated, then you would need to exclude introns, UTRs, etc. Otherwise, TSS to TES is the full gene sequence.
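If introns need to be handled separately, one option is to treat everything inside the TSS-to-TES span that is not covered by any exon as intronic. A Python sketch for one gene, assuming half-open coordinates; the gene span is the third interval from the post, but the exon coordinates are hypothetical and chosen only to show overlapping exons being counted once:

```python
def merged_length(intervals):
    """Total bp covered by half-open [start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


gene = (21496, 28507)  # TSS to TES, third interval from the post
exons = [(21496, 22000), (23000, 23500), (23400, 24000), (28000, 28507)]  # hypothetical
gene_len = gene[1] - gene[0]
exonic = merged_length(exons)
intronic = gene_len - exonic  # inside the gene span but outside every exon
print(gene_len, exonic, intronic)  # 7011 2011 5000
```

This assumes all exons lie within the gene span; with real annotations, exons from different transcripts of the same gene overlap heavily, which is why they are merged before subtracting.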

I need to take intron coordinates into account

What does that mean?

I am not sure what you are doing above with the binning. Why not keep a single interval from TSS to TES as the gene?


Edit: Yeah, never mind.

6.0 years ago

Since you have a GTF file, you can do the following in R:

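library(GenomicRanges)
library(rtracklayer)
gtf = import("genes.gtf")  # or whatever your GTF file is called
foo = split(gtf, seqnames(gtf))  # one GRanges per chromosome
# reduce() merges overlapping features; ignore.strand = TRUE counts genes overlapping on opposite strands once
sapply(foo, function(x) sum(width(reduce(x, ignore.strand = TRUE))))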

You now have a named vector printed to the screen with the number of genic bases on each chromosome. You can merge the split() and sapply() lines into one if you'd like.

6.0 years ago

It looks like you have a .gtf file. That means you can extract the exon lines from the .gtf file and sum up the lengths of the exonic intervals.

You can generate a sorted .bed file of exon coordinates as follows. GTF starts are 1-based, so subtract 1 to get BED's 0-based starts:

awk -F'\t' -v OFS='\t' '$3 == "exon" {print $1, $4 - 1, $5}' your.gtf | sort -k1,1 -k2,2n > exons.bed

You can merge this exons.bed using bedtools:

bedtools merge -i exons.bed > exons.merged.bed

You can sum the lengths of the intervals in the merged bed file:

awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}' exons.merged.bed

This gives you the number of genic bases if you define genic as exonic only (with that definition, introns count as non-genic). To get the intergenic length, sum your chromosome lengths and subtract the genic number.

You can also do all of this in one line:

awk -F'\t' -v OFS='\t' '$3 == "exon" {print $1, $4 - 1, $5}' your.gtf | sort -k1,1 -k2,2n | bedtools merge -i stdin | awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'
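The subtraction step, sketched in Python per chromosome rather than genome-wide. This assumes a tab-separated, already merged exons.merged.bed and a two-column sizes.genome file (chromosome name, length); the function name `genic_vs_rest` and the inline sample lines are hypothetical. With an exon-only definition of genic, the second number includes introns as well as intergenic sequence.

```python
from collections import defaultdict


def genic_vs_rest(merged_bed_lines, genome_lines):
    """Return {chrom: (genic_bp, non_genic_bp)} from merged BED lines and
    'chrom<TAB>length' genome-file lines."""
    genic = defaultdict(int)
    for line in merged_bed_lines:
        if not line.strip():
            continue
        chrom, start, end = line.split("\t")[:3]
        genic[chrom] += int(end) - int(start)  # half-open BED length
    result = {}
    for line in genome_lines:
        if not line.strip():
            continue
        chrom, length = line.split("\t")[:2]
        # Chromosomes without any exons get 0 genic bases
        result[chrom] = (genic[chrom], int(length) - genic[chrom])
    return result


bed = ["chr1\t1666\t2818", "chr1\t4096\t5114", "chr2\t100\t600"]  # hypothetical
sizes = ["chr1\t10000", "chr2\t2000", "chr3\t500"]                 # hypothetical
print(genic_vs_rest(bed, sizes))
# {'chr1': (2170, 7830), 'chr2': (500, 1500), 'chr3': (0, 500)}
```

With real files, passing `open("exons.merged.bed")` and `open("sizes.genome")` in place of the lists should work, since `int()` ignores the trailing newline.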

Hello everyone,

You can't extract the exon lines directly from the .gtf file and work with them as if they were a BED file. BED files are 0-based with an exclusive end, while GTF files are 1-based with an inclusive end. First, we have to convert the GTF file into a BED file. We can do it with sortBed from bedtools and gff2bed from BEDOPS (BEDOPS also provides gtf2bed for GTF input):

# Convert from gff to bed
sortBed -i your.gtf | gff2bed > your.bed

Then extract the columns of interest from your new BED file.
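To see why the conversion matters: a GTF feature with start 1 and end 10 covers 10 bases, while the same bases in BED are written as start 0, end 10. A small Python sketch of the conversion, using hypothetical exon coordinates, showing the undercount you get when GTF columns are used as BED directly:

```python
def gtf_to_bed(start, end):
    """Convert a 1-based, closed GTF interval to a 0-based, half-open BED one."""
    return start - 1, end


gtf_exons = [(1, 10), (11, 20), (31, 40)]  # hypothetical GTF start/end pairs

naive = sum(end - start for start, end in gtf_exons)  # GTF columns used as BED
bed_exons = [gtf_to_bed(s, e) for s, e in gtf_exons]
correct = sum(end - start for start, end in bed_exons)
print(naive, correct)  # 27 30
```

Each interval loses one base in the naive version. Adjacency changes too: (1, 10) and (11, 20) are contiguous in GTF, and only after conversion, as (0, 10) and (10, 20), does `bedtools merge` see them as book-ended and join them.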

You can check an example in my repository: https://github.com/tmontserrat/proportion_exon_regions
