Intersect two GFF3 files with exactly overlapping coordinates
9.9 years ago
firoz.imtech ▴ 50

I want to find the coordinates in annotWS240.gff3 that overlap those in query.gff3, using the following command:

intersectBed -a query.gff3 -b annotWS240.gff3 -wb -wa >OUT_gff3

I want to extract the exact coordinate positions from annotWS240.gff3 that overlap query.gff3. However, I am getting some extra coordinates. For example, sequence AAB48626_GHR-10017@H6 has coordinates 1834890 1835393, but I am getting features with coordinates 1834883 1835439 from transcript C53H9.1.

(Please see the file below.)

  1. How can I get features with the exact coordinates, such as 1834890 1835393, from C53H9.1?
  2. Is there any tool available to extract the FASTA sequence using the current OUT_gff3 file? I want the mature transcript (without introns).

Thanks for your valuable suggestions.

OUT_gff3 file

I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    CDS    1835092    1835229    .    +    0    ID=CDS:C53H9.1;Parent=Transcript:C53H9.1
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    CDS    1835278    1835394    .    +    0    ID=CDS:C53H9.1;Parent=Transcript:C53H9.1
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    intron    1835045    1835091    .    +    .    Parent=Transcript:C53H9.1;Note=Confirmed_EST yk491b8.5 %3B Confirmed_cDNA U89308 %3B Confirmed_EST OSTF036F4_1 %3B Confirmed_EST OSTF036F4_1 %3B 
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    exon    1835092    1835229    .    +    .    Parent=Transcript:C53H9.1
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    intron    1835230    1835277    .    +    .    Parent=Transcript:C53H9.1;Note=Confirmed_EST yk491b8.5 %3B Confirmed_cDNA U89308 %3B Confirmed_EST OSTF036F4_1 %3B Confirmed_EST OSTF036F4_1 %3B 
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    exon    1835278    1835439    .    +    .    Parent=Transcript:C53H9.1
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    gene    1834883    1835439    .    +    .    ID=Gene:WBGene00004441;Name=WBGene00004441;locus=rpl-27;sequence_name=C53H9.1;biotype=protein_coding
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    mRNA    1834883    1835439    .    +    .    ID=Transcript:C53H9.1;Parent=Gene:WBGene00004441;Name=C53H9.1;wormpep=WP:CE19381;locus=rpl-27
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    exon    1834883    1835044    .    +    .    Parent=Transcript:C53H9.1
I    ePCR    AAB48626_GHR-10017@H6    1834890    1835393    .    +    .    I    WormBase    CDS    1834889    1835044    .    +    0    ID=CDS:C53H9.1;Parent=Transcript:C53H9.1;Name=C53H9.1;wormpep=WP:CE19381;locus=rpl-27
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    gene    8407802    8410804    .    -    .    ID=Gene:WBGene00003014;Name=WBGene00003014;locus=lin-28;sequence_name=F02E9.2;biotype=protein_coding
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    mRNA    8407802    8409976    .    -    .    ID=Transcript:F02E9.2b;Parent=Gene:WBGene00003014;Name=F02E9.2b;wormpep=WP:CE24880;locus=lin-28
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    mRNA    8407802    8410804    .    -    .    ID=Transcript:F02E9.2a;Parent=Gene:WBGene00003014;Name=F02E9.2a;wormpep=WP:CE24879;locus=lin-28
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    exon    8407802    8408496    .    -    .    Parent=Transcript:F02E9.2b
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    exon    8407802    8408496    .    -    .    Parent=Transcript:F02E9.2a
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    CDS    8408161    8408496    .    -    0    ID=CDS:F02E9.2a;Parent=Transcript:F02E9.2a;Name=F02E9.2a;wormpep=WP:CE24879;locus=lin-28
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    CDS    8409134    8409340    .    -    0    ID=CDS:F02E9.2a;Parent=Transcript:F02E9.2a
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    CDS    8410663    8410803    .    -    0    ID=CDS:F02E9.2a;Parent=Transcript:F02E9.2a
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    CDS    8408161    8408496    .    -    0    ID=CDS:F02E9.2b;Parent=Transcript:F02E9.2b;Name=F02E9.2b;wormpep=WP:CE24880;locus=lin-28
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    CDS    8409134    8409340    .    -    0    ID=CDS:F02E9.2b;Parent=Transcript:F02E9.2b
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    CDS    8409684    8409731    .    -    0    ID=CDS:F02E9.2b;Parent=Transcript:F02E9.2b
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    intron    8408497    8409133    .    -    .    Parent=Transcript:F02E9.2a;Note=Confirmed_EST yk1158b04.5 %3B Confirmed_cDNA U75912 %3B Confirmed_EST OSTR155H1_1 %3B Confirmed_EST OSTR155H1_1 %3B 
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    intron    8408497    8409133    .    -    .    Parent=Transcript:F02E9.2b;Note=Confirmed_EST yk1158b04.5 %3B Confirmed_cDNA U75912 %3B Confirmed_EST OSTR155H1_1 %3B Confirmed_EST OSTR155H1_1 %3B 
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    exon    8409134    8409340    .    -    .    Parent=Transcript:F02E9.2a
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    exon    8409134    8409340    .    -    .    Parent=Transcript:F02E9.2b
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    intron    8409341    8409683    .    -    .    Parent=Transcript:F02E9.2b;Note=Confirmed_EST yk117g6.5 %3B Confirmed_EST OSTR073F8_1 %3B 
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    intron    8409341    8410662    .    -    .    Parent=Transcript:F02E9.2a;Note=Confirmed_EST yk1030a12.5 %3B Confirmed_cDNA U75912 %3B Confirmed_EST OSTF155H1_1 %3B Confirmed_EST OSTF155H1_1 %3B 
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    exon    8409684    8409976    .    -    .    Parent=Transcript:F02E9.2b
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    five_prime_UTR    8409732    8409976    .    -    .    Parent=Transcript:F02E9.2b
I    ePCR    AAB49759_GHR-11053@F05    8408162    8410802    .    +    .    I    WormBase    exon    8410663    8410804    .    -    .    Parent=Transcript:F02E9.2a
ChIP-Seq next-gen RNA-Seq

What do you mean by "exact overlapping"? That a feature in B has to fit entirely within A?


Yes, I want the same coordinate boundaries for both A and B. Basically, I want to extract the mature mRNA sequence that falls within the coordinates of the A GFF3.


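You can set the minimum overlap proportion to one with -f 1.0. Note that -f is measured as a fraction of A, so this keeps only A features that are covered completely by a B feature.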
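To make the direction of the containment test concrete, here is a minimal sketch that emulates both cases with awk on two rows shaped like the -wa -wb output above. It assumes nine tab-separated GFF3 columns for the A record (the `ID=q1` attribute is made up), so the B type is column 12 and the B start and end are columns 13 and 14; adjust the column numbers if your file differs.

```shell
# Two stand-in rows for `intersectBed -wa -wb` output: A record (9 columns,
# hypothetical ID=q1 attribute) followed by the B record (9 columns).
pairs=$(printf '%s\n' \
  "$(printf 'I\tePCR\tq1\t1834890\t1835393\t.\t+\t.\tID=q1\tI\tWormBase\texon\t1835092\t1835229\t.\t+\t.\tParent=Transcript:C53H9.1')" \
  "$(printf 'I\tePCR\tq1\t1834890\t1835393\t.\t+\t.\tID=q1\tI\tWormBase\tmRNA\t1834883\t1835439\t.\t+\t.\tID=Transcript:C53H9.1')")

# B lies entirely inside A: B start >= A start and B end <= A end.
b_in_a=$(printf '%s\n' "$pairs" | awk -F'\t' '$13 >= $4 && $14 <= $5 { print $12 }')

# A lies entirely inside B: the situation that -f 1.0 selects.
a_in_b=$(printf '%s\n' "$pairs" | awk -F'\t' '$4 >= $13 && $5 <= $14 { print $12 }')

echo "B within A: $b_in_a"
echo "A within B: $a_in_b"
```

With these two rows, the internal exon passes the "B within A" test while the mRNA record passes the "A within B" test. Recent bedtools releases also offer -F, the same threshold measured as a fraction of B, which should correspond to the first test.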

9.9 years ago

Use an overlap threshold of 1.0 (100% overlap), as suggested by David, which means feature A lies completely within B. You can then use the getfasta utility of bedtools to retrieve the FASTA sequence for the input coordinates from the file (it works with BED/GFF/VCF).
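Since getfasta returns one sequence per interval, a mature transcript still has to be assembled by joining the exon sequences in order. The sketch below illustrates only that joining step on a made-up 12-base sequence with two made-up exons; it handles the plus strand only, and a minus-strand transcript would additionally need reverse-complementing.

```shell
# Toy "chromosome" and two exons, all values invented for illustration.
# GFF3 coordinates are 1-based and inclusive, which matches awk's substr().
chrom_seq="AAACCCGGGTTT"
exons=$(printf '7\t9\n1\t3\n')

# Sort exons by start, slice each one out of the sequence, and join the slices.
mature=$(printf '%s\n' "$exons" | sort -k1,1n | awk -F'\t' -v seq="$chrom_seq" '
  { out = out substr(seq, $1, $2 - $1 + 1) }
  END { print out }')

echo "$mature"
```

The intron (positions 4 to 6) drops out because only the exon intervals are sliced.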

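Subset the file to the coordinates of the B features you want, leaving out the intron rows, and then use the utility above. The B start and end are columns 13-14 when the A records carry all nine GFF3 columns; count the columns in your own output, since the listing in the question shows only eight fields for A.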
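A rough sketch of that subsetting step, which also trims each exon to the query boundaries as the question asks. It again assumes nine tab-separated columns for the A record (the `ID=q1` attribute is hypothetical) and uses two exon rows taken from the listing in the question.

```shell
# Two exon rows from the question, laid out as A record + B record.
pairs=$(printf '%s\n' \
  "$(printf 'I\tePCR\tq1\t1834890\t1835393\t.\t+\t.\tID=q1\tI\tWormBase\texon\t1834883\t1835044\t.\t+\t.\tParent=Transcript:C53H9.1')" \
  "$(printf 'I\tePCR\tq1\t1834890\t1835393\t.\t+\t.\tID=q1\tI\tWormBase\texon\t1835278\t1835439\t.\t+\t.\tParent=Transcript:C53H9.1')")

# Keep exon rows only and clip the B interval to the A interval.
clipped=$(printf '%s\n' "$pairs" | awk -F'\t' -v OFS='\t' '
  $12 == "exon" {
    s = ($13 > $4) ? $13 + 0 : $4 + 0   # later of the two starts
    e = ($14 < $5) ? $14 + 0 : $5 + 0   # earlier of the two ends
    if (s <= e) print $10, $11, $12, s, e, $15, $16, $17, $18
  }')

printf '%s\n' "$clipped"
```

The result is plain nine-column GFF3, so it could be saved and passed to something like `bedtools getfasta -fi genome.fa -bed clipped.gff3 -fo exons.fa`, where genome.fa and the output names are placeholders. Running the intersection with the annotation as -a and the query as -b, without -wa and -wb, should report similarly trimmed annotation intervals directly.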


Thanks for the suggestion.
