Extract coordinates of adjacent genes for each provided intergenic region from a GFF file
8.6 years ago
Sumit ▴ 20

I have GFF files and files containing intergenic regions for different bacterial genomes. For each genome, I want to extract the genes adjacent to those intergenic regions, along with their coordinates. Are there any scripts or tools available for this?

Example:

For Bacillus anthracis, I have two files:

  1. A GFF file
  2. A file containing the coordinates of intergenic regions:

intergenic_region.txt

185457-185562
320958-321064
1146951-1147049
1285399-1285500
3894344-3894451
4075706-4075815

I want to extract the coordinates of the adjacent genes for each of those intergenic regions from the GFF file.

I have to repeat this process for 300+ bacterial genomes.

gene genome sequence • 4.7k views

It can be done, but you'll have to do some pre-processing of the data first. Convert both your GFF and text file into BED format, then use bedtools closest to find the nearest gene and its coordinates.

8.6 years ago

Convert the GFF file to BED with BEDOPS gff2bed:

$ gff2bed < annotations.gff > annotations.bed

Convert the intergenic regions to BED, assuming they are all on chromosome chr1 and 0-indexed:

$ awk 'BEGIN { FS = "-"; } { print "chr1\t"$1"\t"$2; }' intergenic_region.txt > intergenic_region.bed

You may need to modify the name used for the chromosome, depending on the naming scheme used in the annotations file. You could browse through the first column of annotations.bed to see what the chromosome naming scheme looks like.
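The chromosome name can also be read from the GFF itself rather than typed by hand. The following is a minimal sketch, not a tested pipeline: `regions_to_bed` is a hypothetical helper name, and it assumes one replicon per genome and that the region coordinates are 1-based and inclusive (as in GFF). If they are already 0-indexed, as assumed above, drop the `- 1`.

```shell
# Hypothetical helper: build a BED file from "start-end" lines.
# Assumptions: a single replicon per genome, and 1-based inclusive coordinates
# (as in GFF), so the start is shifted by -1 for 0-based half-open BED.
regions_to_bed() {
    gff=$1
    regions=$2
    # The first non-comment line of the GFF supplies the sequence name (column 1).
    seqid=$(awk -F '\t' '!/^#/ && NF > 1 { print $1; exit }' "$gff")
    # Split each "start-end" line on "-" and print chrom, start, end.
    awk -F '-' -v chrom="$seqid" \
        'NF == 2 { printf "%s\t%d\t%d\n", chrom, $1 - 1, $2 }' "$regions"
}
```

Usage would look like `regions_to_bed annotations.gff intergenic_region.txt > intergenic_region.bed`. Genomes with plasmids or several contigs need the sequence name stored alongside each region instead.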

Once you have the annotations and intergenic regions in BED format, you can use BEDOPS closest-features to find the nearest annotations to the regions:

$ closest-features intergenic_region.bed annotations.bed > answer.bed
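To spot-check the result without either toolkit, the same left/right lookup can be sketched in plain awk. This is only an illustration of the idea, not how closest-features is implemented: `flanking_genes` is a hypothetical name, both inputs are assumed to be tab-separated BED, and genes that overlap a region are ignored on both sides.

```shell
# Hypothetical cross-check: for each region, report the nearest gene on each side.
# Assumes tab-separated BED inputs and a non-empty genes file.
flanking_genes() {
    regions=$1
    genes=$2
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                       # first file: load all genes
            n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next
        }
        {                                 # second file: one region per line
            rs = $2 + 0; re = $3 + 0
            left = "NA"; right = "NA"; lbest = -1; rbest = -1
            for (i = 1; i <= n; i++) {
                if (chr[i] != $1) continue
                # nearest gene ending at or before the region start
                if (e[i] <= rs && (lbest < 0 || e[i] > lbest)) { lbest = e[i]; left = s[i] "-" e[i] }
                # nearest gene starting at or after the region end
                if (s[i] >= re && (rbest < 0 || s[i] < rbest)) { rbest = s[i]; right = s[i] "-" e[i] }
            }
            print $1, $2, $3, left, right
        }' "$genes" "$regions"
}
```

Each output line holds the region followed by the left and right gene coordinates, with `NA` where no gene exists on that side. The scan is quadratic, which is fine for a handful of regions per bacterial genome but not a replacement for the sorted-input tools.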

Thank you Alex, it works.

8.6 years ago
Alternative ▴ 270

You have to be way more specific in your query.

  1. Your intergenic regions file does not have the chromosome information! This is essential, so add it.
  2. As James suggested, you need to have it in BED format.
  3. Use bedtools closest to get the closest gene. Both inputs need to be in BED format, but you can feed one of them from STDIN after selecting the necessary columns from your GFF file.

When you have prerequisites 1 and 2 above, send an example of both files (a couple of lines from each) and we can show you how to do it.
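In the meantime, the column selection in step 3 might look like the sketch below. `gff_genes_to_bed` is a hypothetical helper name, and it assumes the GFF uses `gene` in column 3 and puts an identifier first in column 9. Adjust both to match your annotation.

```shell
# Hypothetical helper: keep only "gene" features and print sorted BED6 lines.
# GFF is 1-based inclusive; BED is 0-based half-open, so the start is shifted by -1.
gff_genes_to_bed() {
    awk -F '\t' -v OFS='\t' '
        /^#/ { next }                  # skip header and comment lines
        $3 == "gene" {
            name = $9                  # column 9 holds attributes (ID=...;Name=...)
            sub(/;.*/, "", name)       # keep only the first attribute
            print $1, $4 - 1, $5, name, ".", $7
        }' "$1" | sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

With the genes written to a file (`gff_genes_to_bed annotations.gff > genes.bed`), `bedtools closest -a intergenic_region.bed -b genes.bed -d` reports the nearest gene for each region plus its distance. bedtools closest expects both inputs sorted by chromosome and start, which is why the helper sorts its output.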


Thank you for your suggestions... bedtools closest is working for my problem...
