Find site feature according to GFF3 file
1
1
Entering edit mode
9.6 years ago
biolab ★ 1.4k

Dear all,

I have a list of site information exemplified as below. Also I have a GFF3 file. Now I need to find where those sites locate, for example, my ideal output is shown at bottom. Could anyone help to give me suggestions or solutions? I much appreciated your helps. THANKS A LOT!

Site file:

Site No.   Chr  Start   End
1          1    3000    3100
2          1    3200    3280

GFF3 file:

Chr1 MSU_osa1r7 gene 2903 10817 . + . ID=LOC_Os01g01010;Name=LOC_Os01g01010
Chr1 MSU_osa1r7 mRNA 2903 10817 . + . ID=LOC_Os01g01010.1;Name=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 2903 3268 . + . ID=LOC_Os01g01010.1:exon_1
Chr1 MSU_osa1r7 intron 3269 3300 . + . ID=LOC_Os01g01010.1:intron_1

Ideal output:

Site No.  Chr  Start   End    Pos
1         1    3000    3100   ID=LOC_Os01g01010.1:exon_1
2         1    3200    3280   ID=LOC_Os01g01010.1:exon_1;ID=LOC_Os01g01010.1:intron_1
GFF3 • 1.9k views
ADD COMMENT
0
Entering edit mode

Does your site file contain any chromosome information?

ADD REPLY
0
Entering edit mode

Hi Pgibas, yes, I forget to add it.

ADD REPLY
0
Entering edit mode

A bit of coding with R and the GenomicRanges packages should make this doable.

ADD REPLY
4
Entering edit mode
9.6 years ago
PoGibas 5.1k

Modify your Site and GFF3 files using awk (change order of columns - make bed files).

Use bedtools intersect to get wanted output.

ADD COMMENT
0
Entering edit mode

Thanks a lot, Pgibas and Devon, I need a bit further guidance. Regarding to the bedtools method, I follow the below steps, unfortunately got an error : Error: unable to open file or unable to determine types for file f1.bed

Step1 - Prepare site file and gff3 file, both of which should be in the same format.

Site file: f1.bed

Chr1    3000    3100   +
Chr1    3200    3280   +

GFF3 file: f2.bed

Chr1 ID=LOC_Os01g01010;Name=LOC_Os01g01010  2903 10817  +
Chr1 ID=LOC_Os01g01010.1;Name=LOC_Os01g01010.1  2903 10817  +
Chr1 ID=LOC_Os01g01010.1:exon_1  2903 3268  +
Chr1 ID=LOC_Os01g01010.1:intron_1  3269 3300  +

Step2 -- Use bedtools intersect to find their overlapped region.

I use this command: bedtools intersect -a f1.bed -b f2.bed

ADD REPLY
0
Entering edit mode

f2.bed isn't a bed file (take a look at this explanation). Fields in the bed file should be: chrom start end.

Change your f2 file into something like this:

Chr1  2903 10817 + ID=LOC_Os01g01010;Name=LOC_Os01g01010
ADD REPLY
0
Entering edit mode

Hi Pgibas, Thank you very much.

ADD REPLY

Login before adding your answer.

Traffic: 2832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6