Question: How can I count read that falls into exon-intron and exon-exon junction?
0
gravatar for biplab
11 months ago by
biplab80
University of California, Davis
biplab80 wrote:

I have gtf file as follows

VI  ensembl gene    53260   54696   .   -   .   gene_id "YFL039C"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding";
VI  ensembl transcript  53260   54696   .   -   .   gene_id "YFL039C"; transcript_id "YFL039C"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding";
VI  ensembl exon    54687   54696   .   -   .   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "1"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "YFL039C.1";
VI  ensembl CDS 54687   54696   .   -   0   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "1"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "YFL039C";
VI  ensembl start_codon 54694   54696   .   -   0   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "1"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding";
VI  ensembl exon    53260   54377   .   -   .   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "2"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "YFL039C.2";
VI  ensembl CDS 53260   54377   .   -   1   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "2"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "YFL039C";

How can I use this gtf file to make a bed file which I will be able to use for counting reads that falls into exon-exon and exon-intron junction?

sequencing rna-seq next-gen • 562 views
ADD COMMENTlink modified 11 months ago by Alex Reynolds28k • written 11 months ago by biplab80

Since it looks like your mapping to genomic sequences, how (and why?) would you count reads to fall into exon-exon? you will need some pretty long reads and/or very small introns to get into such a situation. Moreover any read that spans a exon-exon junction will be definition also span (2) exon-intron junctions.

I can understand the exon-intron part and then Alex Reynolds will work nicely

ADD REPLYlink written 11 months ago by lieven.sterck4.5k
3
gravatar for Alex Reynolds
11 months ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

Steps via awk and BEDOPS.

Build a BED file of exons:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz \
    | gunzip -c - \
    | awk '($3=="exon")' - \
    | gtf2bed - \
    | cut -f1-6 - \
    > gencode.v28.annotation.exons.c1t6.bed

Convert transcripts to merged exons:

$ awk -f transcripts2mergedExons.awk gencode.v28.annotation.exons.c1t6.bed > gencode.v28.annotation.mergedExons.c1t6.bed

Make an exon-intron list:

$ awk -f mergedExons2exonIntronList.awk gencode.v28.annotation.mergedExons.c1t6.bed > gencode.v28.annotation.exonsAndIntrons.c1t6.bed

Convert these to junctions:

$ awk -f exonIntronList2JunctionList.awk gencode.v28.annotation.exonsAndIntrons.c1t6.bed > gencode.v28.annotation.exonsIntronJunctions.c1t6.bed

Pad them, e.g. by 25 nt around the junction (adjust as needed):

$ bedops --everything --range 25 gencode.v28.annotation.exonsIntronJunctions.c1t6.bed > gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed

Map reads to the padded junctions:

$ bedmap --echo --count --delim '\t' gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed <(bam2bed < reads.bam) > answer.bed

The file answer.bed will contain the junction and the number of reads that map to — overlap with — the junction, by one or more bases.

Some links to Github Gists with listed awk scripts:

  • (transcripts2mergedExons.awk)
  • (mergedExons2exonIntronList.awk)
  • (exonIntronList2JunctionList.awk)
ADD COMMENTlink written 11 months ago by Alex Reynolds28k
1

Thank you so much. I will try this.

ADD REPLYlink written 9 months ago by biplab80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 928 users visited in the last hour