Intergenic Region
12.5 years ago
Ss ▴ 50

Dear All,

I am looking for a tool or script that can help me find the intergenic regions between genes.

I have sequences and coordinates for the gene cluster.

Any help or suggestions are welcome.

Thanks!


I assume you are not able to program, because this is a very easy problem. But if you want us to program it, we need to know at least the layout of the coordinates and the format of the file containing the sequence.


Fabian,

I am also looking for logic to extract intergenic sequences based on the coordinates of the genes, but I am stuck on the complications caused by overlapping genes. Could you please share logic that handles the case given below?

Gene Coordinates and Gene Details - Name and Strand

Start - Stop   GeneName   Strand
10 - 19        Gene_1     +
27 - 46        Gene_2     +
27 - 89        Gene_3     -
110 - 250      Gene_4     +
120 - 340      Gene_5     +
180 - 350      Gene_6     -
260 - 397      Gene_7     -
425 - 625      Gene_8     +
680 - 2        Gene_9     -

Ideally, this is the output I am expecting:

IGNo   Start - End   DistalGeneName   DistalGeneStrand   ProximalGeneName   ProximalGeneStrand
IG1    3 - 9         Gene_9           -                  Gene_1             +
IG2    20 - 26       Gene_1           +                  Gene_3             -
IG3    90 - 109      Gene_3           -                  Gene_4             +
IG4    398 - 424     Gene_7           -                  Gene_8             +
IG5    626 - 679     Gene_8           +                  Gene_9             -

Notes:
IG1: Comparison with the last start and stop positions to get the actual IG coordinates.
IG2: In case of genes with the same start coordinates, the longer gene would be the proximal gene.
IG4: Here is the difficulty: how to skip the intermediate overlapping genes.

In some cases there can be many overlaps, and I am having difficulty handling that in the logic.

If you can share code that resolves this, or explain logic that I can use, I would be very thankful.
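One possible logic for the case above, as a minimal sketch rather than a definitive solution: sort the genes by start (longer gene first when starts are equal), then sweep through them while remembering the gene that reaches furthest to the right so far. A gap is reported only when the next gene starts beyond that furthest stop, which is what skips the intermediate overlapping genes. The function name `intergenic_regions` and the tuple layout are made up for illustration, and treating a gene with start > stop as wrapping around the origin of a circular sequence is an assumption based on Gene_9.

```python
# Example input taken from the table in the post: (start, stop, name, strand).
GENES = [
    (10, 19, "Gene_1", "+"), (27, 46, "Gene_2", "+"), (27, 89, "Gene_3", "-"),
    (110, 250, "Gene_4", "+"), (120, 340, "Gene_5", "+"), (180, 350, "Gene_6", "-"),
    (260, 397, "Gene_7", "-"), (425, 625, "Gene_8", "+"), (680, 2, "Gene_9", "-"),
]


def intergenic_regions(genes):
    """Return gaps as (start, end, distal_name, distal_strand,
    proximal_name, proximal_strand), using 1-based inclusive coordinates.

    Assumption: a gene with start > stop wraps around the origin of a
    circular sequence, so it is split into a head piece and a tail piece.
    """
    pieces = []
    for start, stop, name, strand in genes:
        if start > stop:
            pieces.append((1, stop, name, strand))               # part after the origin
            pieces.append((start, float("inf"), name, strand))   # part before the origin
        else:
            pieces.append((start, stop, name, strand))
    if not pieces:
        return []

    # Sort by start; for equal starts the longer gene comes first.
    pieces.sort(key=lambda p: (p[0], -p[1]))

    gaps = []
    reach = pieces[0]  # gene whose stop is furthest to the right so far
    for piece in pieces[1:]:
        if piece[0] > reach[1] + 1:
            # Nothing covers the positions between the furthest stop and this start.
            gaps.append((reach[1] + 1, piece[0] - 1,
                         reach[2], reach[3], piece[2], piece[3]))
        if piece[1] > reach[1]:
            reach = piece
    return gaps


if __name__ == "__main__":
    for number, gap in enumerate(intergenic_regions(GENES), start=1):
        print(f"IG{number}", *gap)
```

For the nine genes listed above, this returns the five regions IG1 to IG5 from the expected output. If no gene wraps around the origin, the sketch treats the sequence as linear and does not report anything before the first gene or after the last one.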

12.5 years ago

Use Bedtools subtractBed (http://code.google.com/p/bedtools/wiki/Usage#subtractBed) with:

  • a BED file covering the whole chromosomes
  • another BED file containing your gene clusters.
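
To make the idea concrete, here is a rough sketch of what subtracting genes from whole chromosomes computes. It is not how bedtools is implemented, only an illustration under stated assumptions: the function name `subtract_genes` and its inputs are hypothetical, and coordinates follow the BED convention (0-based start, end-exclusive).

```python
from collections import defaultdict


def subtract_genes(chrom_sizes, genes):
    """Return the parts of each chromosome not covered by any gene.

    chrom_sizes: dict mapping chromosome name to its length.
    genes: iterable of (chrom, start, end) in BED coordinates
           (0-based start, end-exclusive).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        by_chrom[chrom].append((start, end))

    uncovered = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_chrom[chrom]):
            if start > cursor:
                uncovered.append((chrom, cursor, start))
            cursor = max(cursor, end)  # overlapping genes only extend the cover
        if cursor < size:
            uncovered.append((chrom, cursor, size))
    return uncovered


if __name__ == "__main__":
    sizes = {"chr1": 1000}
    clusters = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]
    for region in subtract_genes(sizes, clusters):
        print(*region, sep="\t")
```

Because the genes are sorted and the cursor only moves forward, overlapping or nested genes are handled without any special casing.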

Thanks for this. It saves coding time, even though the problem seems trivial.


Trivial? Efficiently intersecting intervals is one of the more difficult problems in bioinformatics. I suspect that you may be misjudging what actually needs to be done. bedtools to the rescue!


Yes, my bad. Reading the command alone, I thought that, given the coordinates of genes AND exons, it would give me the intron coordinates. I have a GFF annotation where this is the case (genes and exons, but no intron coordinates). I guess I overlooked that. This would require a data structure like interval trees to find overlapping regions (as in IRanges), I'd suppose?

