Question: Generate BED12 file from genomic coordinates
Barry Digby (National University of Ireland, Galway) wrote, 21 days ago:

Hi all,

As the title suggests, is it possible to generate a bed12 file given a genomic range?

Here is an example:

Standard BED6 entry:

chr22   20564449    20566817    circrna 0   +

I would like to generate a bed12 file from the underlying sequence that would ultimately result in:

chr22   20564449    20566817    circrna 0   +   20564449    20564449    0,0,0   2   239,351 0,2017
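For reference, BED12 columns 10 to 12 (blockCount, blockSizes, blockStarts) come from the exon blocks, with blockStarts given relative to chromStart. Here is a minimal Python sketch. It assumes the two blocks implied by the desired output above, in 0-based, half-open coordinates (20564449-20564688 and 20566466-20566817). `make_bed12` is a hypothetical helper, and thickStart/thickEnd and itemRgb simply copy the values in the example:

```python
def make_bed12(chrom, start, end, name, score, strand, blocks):
    """Return a BED12 line for [start, end) built from (block_start, block_end) pairs.

    Coordinates are 0-based, half-open, as in BED. BED12 requires the first
    block to start at chromStart and the last block to end at chromEnd.
    """
    blocks = sorted(blocks)
    if blocks[0][0] != start or blocks[-1][1] != end:
        raise ValueError("blocks must span chromStart..chromEnd")
    sizes = [e - s for s, e in blocks]        # blockSizes
    offsets = [s - start for s, _ in blocks]  # blockStarts, relative to chromStart
    fields = [
        chrom, start, end, name, score, strand,
        start, start,  # thickStart, thickEnd (as in the example above)
        "0,0,0",       # itemRgb
        len(blocks),   # blockCount
        ",".join(map(str, sizes)),
        ",".join(map(str, offsets)),
    ]
    return "\t".join(map(str, fields))


line = make_bed12("chr22", 20564449, 20566817, "circrna", 0, "+",
                  [(20564449, 20564688), (20566466, 20566817)])
print(line)
```

This prints the same twelve tab-separated fields as the desired bed12 line above.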

[Screenshot: Red: genomic range, Blue: GTF, Black: desired bed12 output]

I have also considered the case where the genomic range is intronic. In that case the sequence is represented as a single exon block:

[Screenshot: intronic genomic range represented as a single block]
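Here is a simplified Python sketch of the block logic for a single transcript that covers both the exonic and the intronic case. The intersection approach and the helper name `blocks_for_range` are assumptions for illustration and are not taken from any existing tool. The exons are assumed to come from one transcript, so they do not overlap each other:

```python
def blocks_for_range(start, end, exons):
    """Intersect a genomic range with one transcript's exons (0-based, half-open).

    Falls back to a single block spanning the whole range when no exon overlaps
    (the intronic case) or when the exons do not reach both ends of the range,
    since BED12 blocks must start at chromStart and end at chromEnd.
    """
    clipped = [
        (max(s, start), min(e, end))
        for s, e in sorted(exons)
        if max(s, start) < min(e, end)  # keep non-empty overlaps only
    ]
    if not clipped or clipped[0][0] != start or clipped[-1][1] != end:
        return [(start, end)]
    return clipped


print(blocks_for_range(100, 200, [(300, 400)]))             # [(100, 200)]
print(blocks_for_range(100, 200, [(50, 120), (180, 250)]))  # [(100, 120), (180, 200)]
```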


The goal is to use bedtools getfasta with the -split flag to retrieve the concatenated exon sequences (or, for an intronic range, the intron sequence).
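A typical invocation would be `bedtools getfasta -fi genome.fa -bed circrna.bed12 -split`, where the file names are placeholders. With -split, the sequence reported for a BED12 record is the concatenation of its blocks. The Python sketch below mimics that behaviour for one record on an in-memory chromosome string. `split_sequence` is a hypothetical helper, and for simplicity it ignores strand (no reverse complement):

```python
def split_sequence(chrom_seq, bed12_fields):
    """Concatenate the block sequences of one BED12 record (0-based chromosome string)."""
    chrom_start = int(bed12_fields[1])
    # Columns 11 and 12: blockSizes and blockStarts (a trailing comma is allowed).
    sizes = [int(x) for x in bed12_fields[10].rstrip(",").split(",")]
    offsets = [int(x) for x in bed12_fields[11].rstrip(",").split(",")]
    return "".join(
        chrom_seq[chrom_start + off : chrom_start + off + size]
        for size, off in zip(sizes, offsets)
    )


toy_chrom = "AAAACCCCGGGGTTTT"
record = "chrT\t2\t14\ttoy\t0\t+\t2\t2\t0,0,0\t2\t3,4\t0,8".split("\t")
print(split_sequence(toy_chrom, record))  # AACGGTT
```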

Thanks,

Barry

Tags: ucsc, bed12, bedtools

genomax (20 days ago): Since you answered your own question, you may want to add that as an answer instead of including it in the original post.

Barry Digby (20 days ago): I intend to; I just need to test it at a larger scale tomorrow first.
Answer by Barry Digby, 20 days ago:

Try the script at the following location: https://github.com/BarryDigby/circrna/blob/master/bin/get_mature_seq.sh

You must supply a GTF file (GENCODE) as the first argument to the script.

The file unwanted_biotypes.txt must be in the same directory.
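For illustration only, here is a Python sketch that reads GENCODE GTF exons while skipping unwanted biotypes. It makes two assumptions: that unwanted_biotypes.txt lists one biotype per line, and that filtering uses GENCODE's `transcript_type` attribute. The actual script may filter differently. The helper names `read_biotypes` and `load_exons` are hypothetical:

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (unquoted values are skipped).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def read_biotypes(path):
    """Read one biotype per line (assumed file format), ignoring blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def load_exons(gtf_lines, unwanted_biotypes):
    """Group exons by (chrom, strand, transcript_id), skipping unwanted transcript types.

    GTF coordinates are 1-based and closed; they are converted to BED-style
    0-based, half-open intervals.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if attrs.get("transcript_type") in unwanted_biotypes:
            continue
        key = (cols[0], cols[6], attrs["transcript_id"])
        exons[key].append((int(cols[3]) - 1, int(cols[4])))
    return {key: sorted(blocks) for key, blocks in exons.items()}
```

Both helpers expect an open file or any iterable of GTF lines, so they can be tested without touching the real annotation.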

Logic:

|-- Check circRNA for overlapping features in GTF file (protein coding, pseudogenes, lncRNA)
|-- Does the circRNA overlap any features?
    |-- Yes: attempt to fit circRNA to exon boundaries
    |   |-- Does it fit exon boundaries?
    |       |-- Yes: make bed12 files
    |       |-- No: attempt to fit to 'best transcript' (the underlying transcript spanning the circRNA region with the most exon blocks)
    |           |-- Does the circRNA fall within 200 nt of the underlying transcript boundary?
    |               |-- Yes: use best transcript exon boundaries, make bed12
    |               |-- No: mark circRNA as EIciRNA, treat the entire region as 1 block
    |-- No: no overlapping features, circRNA treated as intronic
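Here is a hypothetical Python rendering of the decision tree above that makes the branches concrete. It is not the linked shell script and makes several assumptions. The 'best transcript' is taken to be the transcript with the most exons overlapping the circRNA. 'Within 200 nt' is read as both circRNA ends lying within 200 nt of an exon boundary of that transcript. The outer blocks are stretched to the circRNA ends. The name `classify_circrna` is also an assumption:

```python
def classify_circrna(start, end, transcripts, tolerance=200):
    """Return (label, blocks) for a circRNA spanning [start, end).

    transcripts maps transcript_id -> sorted, non-overlapping (exon_start, exon_end)
    pairs, 0-based half-open, already restricted to the circRNA's chromosome and strand.
    """
    # Exons of each transcript that overlap the circRNA.
    hits = {}
    for tid, exons in transcripts.items():
        overlapping = [(s, e) for s, e in exons if s < end and start < e]
        if overlapping:
            hits[tid] = overlapping
    if not hits:
        return "intronic", [(start, end)]

    # Exact fit: both ends sit on exon boundaries of one transcript.
    for tid in hits:
        exons = transcripts[tid]
        if start in {s for s, _ in exons} and end in {e for _, e in exons}:
            return "exonic", [(s, e) for s, e in exons if start <= s and e <= end]

    # Best transcript (assumed): the one with the most exons overlapping the circRNA.
    best = max(hits, key=lambda tid: len(hits[tid]))
    exons = transcripts[best]
    near_start = min(abs(s - start) for s, _ in exons)
    near_end = min(abs(e - end) for _, e in exons)
    if near_start <= tolerance and near_end <= tolerance:
        blocks = [[max(s, start), min(e, end)] for s, e in hits[best]]
        # Assumption: stretch the outer blocks so they span chromStart..chromEnd.
        blocks[0][0], blocks[-1][1] = start, end
        return "best_transcript", [tuple(b) for b in blocks]

    # Otherwise: exon-intron circRNA, whole region as one block.
    return "EIciRNA", [(start, end)]


T = {"T1": [(100, 200), (300, 400), (500, 600)]}
print(classify_circrna(100, 400, T))  # ('exonic', [(100, 200), (300, 400)])
print(classify_circrna(210, 290, T))  # ('intronic', [(210, 290)])
```

The `tolerance` default mirrors the 200 nt in the tree. The other branch details would need checking against the script itself.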

It exactly reproduces CircExplorer2 output, but that is to be expected, as CircExplorer2 is an annotation-based circRNA discovery tool. Output from de novo tools may not work as well with this script.
