How can I count read that falls into exon-intron and exon-exon junction?
1
1
Entering edit mode
5.9 years ago
biplab ▴ 110

I have gtf file as follows

VI  ensembl gene    53260   54696   .   -   .   gene_id "YFL039C"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding";
VI  ensembl transcript  53260   54696   .   -   .   gene_id "YFL039C"; transcript_id "YFL039C"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding";
VI  ensembl exon    54687   54696   .   -   .   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "1"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "YFL039C.1";
VI  ensembl CDS 54687   54696   .   -   0   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "1"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "YFL039C";
VI  ensembl start_codon 54694   54696   .   -   0   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "1"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding";
VI  ensembl exon    53260   54377   .   -   .   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "2"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "YFL039C.2";
VI  ensembl CDS 53260   54377   .   -   1   gene_id "YFL039C"; transcript_id "YFL039C"; exon_number "2"; gene_name "ACT1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "ACT1"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "YFL039C";

How can I use this gtf file to make a bed file which I will be able to use for counting reads that falls into exon-exon and exon-intron junction?

RNA-Seq next-gen sequencing • 2.6k views
ADD COMMENT
0
Entering edit mode

Since it looks like your mapping to genomic sequences, how (and why?) would you count reads to fall into exon-exon? you will need some pretty long reads and/or very small introns to get into such a situation. Moreover any read that spans a exon-exon junction will be definition also span (2) exon-intron junctions.

I can understand the exon-intron part and then Alex Reynolds will work nicely

ADD REPLY
5
Entering edit mode
5.9 years ago

Steps via awk and BEDOPS.

Build a BED file of exons:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz \
    | gunzip -c - \
    | awk '($3=="exon")' - \
    | gtf2bed - \
    | cut -f1-6 - \
    > gencode.v28.annotation.exons.c1t6.bed

Convert transcripts to merged exons:

$ awk -f transcripts2mergedExons.awk gencode.v28.annotation.exons.c1t6.bed > gencode.v28.annotation.mergedExons.c1t6.bed

Make an exon-intron list:

$ awk -f mergedExons2exonIntronList.awk gencode.v28.annotation.mergedExons.c1t6.bed > gencode.v28.annotation.exonsAndIntrons.c1t6.bed

Convert these to junctions:

$ awk -f exonIntronList2JunctionList.awk gencode.v28.annotation.exonsAndIntrons.c1t6.bed > gencode.v28.annotation.exonsIntronJunctions.c1t6.bed

Pad them, e.g. by 25 nt around the junction (adjust as needed):

$ bedops --everything --range 25 gencode.v28.annotation.exonsIntronJunctions.c1t6.bed > gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed

Map reads to the padded junctions:

$ bedmap --echo --count --delim '\t' gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed <(bam2bed < reads.bam) > answer.bed

The file answer.bed will contain the junction and the number of reads that map to — overlap with — the junction, by one or more bases.

Some links to Github Gists with listed awk scripts:

  • (transcripts2mergedExons.awk)
  • (mergedExons2exonIntronList.awk)
  • (exonIntronList2JunctionList.awk)
ADD COMMENT
2
Entering edit mode

Thank you so much. I will try this.

ADD REPLY

Login before adding your answer.

Traffic: 2195 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6