Question: Identify reads overlapping a junction in gff file
gravatar for jkhurana2311
4.3 years ago by
United States
jkhurana23110 wrote:

Dear all,

I have a gff annotation file with coordinates for an overlapping genomic feature, e.g.
chr1 1-100
chr1 100-250
chr1 250-300
chr1 300-500

and an alignment file (bam/bed) with reads mapping to that chromosome. I need to extract reads that are specifically overlapping the junctions of that genomic feature. So, in a bed file like this,

chr1 50-70
chr1 90-110
chr1 150-170
chr1 245-255

Only extract, 
chr1 90-110
chr1 245-255

Any existing tools that can help me do that. I know bedtools intersect will give me all the common overlap but I am only looking for reads mapping the junctions of concurrent genomic features. I am not well versed in bioinformatics so I can't write my own scripts but am able to run existing tools. 

Any ideas would be appreciated.


junction gff overlap • 1.8k views
ADD COMMENTlink modified 4.3 years ago by Alex Reynolds27k • written 4.3 years ago by jkhurana23110

search this site/the web for bedtools intersect

ADD REPLYlink written 4.3 years ago by Pierre Lindenbaum118k

Thanks for the idea, Istvan Albert. That actually makes a lot of sense. I can do that.

thanks again


ADD REPLYlink written 4.3 years ago by jkhurana23110
gravatar for Alex Reynolds
4.3 years ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

One way to do this is to make a pipeline that:

  1. Converts the annotations to BED via convert2bed -i gff
  2. Uses awk to get one base elements, if the previous stop and current start position values are equal
  3. Uses BEDOPS bedops --range to expand (for example) a 10-base window around the resulting one-base elements (adjusting this window to whatever you need)
  4. Maps the ranged elements to find overlapping reads via bedmap and convert2bed -i bam
$ convert2bed -i gff < annotations.gff \
    | awk ' \
        BEGIN { \
            previous_chr = "chr0"; \
            previous_stop = -1; \
        } \
        { \
            current_chr = $1; \
            current_start = $2; \
            current_stop = $3; \
            if ((current_start == previous_stop) && (current_chr == previous_chr)) { \
                print current_chr"\t"current_start"\t"(current_start + 1); \
            } \
           previous_chr = current_chr; \
           previous_stop = current_stop; \
        } \
    }' \
    | bedops --everything --range 10 - \
    | bedmap --echo --echo-map --skip-unmapped - <(convert2bed -i bam < reads.bam) \
    > answer.bed
ADD COMMENTlink modified 4.2 years ago • written 4.3 years ago by Alex Reynolds27k
gravatar for Istvan Albert
4.3 years ago by
Istvan Albert ♦♦ 79k
University Park, USA
Istvan Albert ♦♦ 79k wrote:

Often the way to solve these problems lies in building one ore more new dataset that when used with bedtools intersect produces what you need.

For example here create  file that contains  a marker in the form of a 1bp long interval for its start of each feature. Then in another file create markers for the 1bp interval of the end of each feature. Now intersect the two files and you'll end up with an file with intervals that are common. Use this file to intersect with the reads.

ADD COMMENTlink written 4.3 years ago by Istvan Albert ♦♦ 79k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 995 users visited in the last hour