Get the start and end sites of the exons upstream and downstream of a retained intron
2.8 years ago
Kai_Qi ▴ 130

Hi:

I have an output file from IRFinder that I am using to compare intron retention. It contains the chromosome, strand, intron start site, and intron end site.

Now I would like to get the start and end coordinates of the upstream and downstream exons. Is there a way to do this?

Thanks, Kai

RNAseq • 1.0k views

If you have coordinates you could take a GTF file, extract only the exons, and then use something like bedtools closest to get the closest exons.
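The exon-extraction step suggested above could be sketched roughly as follows. This is only an illustration, assuming a standard 9-column, tab-separated GTF; the function name gtf_exons_to_bed and the file names in the usage comments are placeholders, not anything from the original post.

```shell
# Convert exon records from a GTF stream (stdin) into 6-column BED (stdout).
# GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
# so only the start coordinate is shifted by one.
gtf_exons_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         $0 !~ /^#/ && $3 == "exon" { print $1, $4 - 1, $5, "exon", ".", $7 }' |
    LC_ALL=C sort -k1,1 -k2,2n   # bedtools closest expects position-sorted input
}

# Hypothetical usage (file names are placeholders):
#   gtf_exons_to_bed < annotation.gtf > exons.sorted.bed
#   bedtools closest -a introns.sorted.bed -b exons.sorted.bed -s -D a
```

The name column is filled with the literal string "exon" here; if you need gene or transcript IDs, they would have to be parsed out of column 9 of the GTF.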


I used the following to extract the exons:

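library(GenomicFeatures)
# Build a transcript database from the GTF annotation
txdb <- makeTxDbFromGFF("GRCm38_gene_125bp.gtf", format="gtf")
# Extract all exons as a GRanges object
EX <- exons(txdb)
head(EX)
# Drop the width column (4th); write.csv always writes the header and ignores col.names
write.csv(as.data.frame(EX)[,-4], file="Exon_coordinates.csv")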

Then I converted the CSV file into a BED file and used it together with my intron BED file in the bedtools closest command:

head dynamic_IR_IRratio_cluster1_1.bed 
chr1    10059847    10064332    Cspp1/ENSMUSG00000056763/clean  .   +   1
chr1    121554138   121555215   Ddx18/ENSMUSG00000001674/clean  .   -   2
chr1    127803293   127803391   Ccnt2/ENSMUSG00000026349/known-exon .   +   3
chr1    130701851   130702230   Pfkfb2/ENSMUSG00000026409/clean .   -   4
chr1    134630753   134631236   Kdm5b/ENSMUSG00000042207/known-exon .   +   5
chr1    135405972   135406516   Ipo9/ENSMUSG00000041879/clean   .   -   6
chr1    135453288   135454044   Nav1/ENSMUSG00000009418/clean   .   -   7
chr1    136171306   136171587   Kif21b/ENSMUSG00000041642/clean .   +   8
chr1    170880226   170880572   Dusp12/ENSMUSG00000026659/known-exon    .   -   9
chr1    170880691   170880914   Dusp12/ENSMUSG00000026659/known-exon    .   -   10

The exon coordinates I used are:

head All_Exons_Coordinates_1.bed
chr1    3073253 3074322 3073253 .   +   2410
chr1    3102016 3102125 3102016 .   +   2411
chr1    3252757 3253236 3252757 .   +   2412
chr1    3466587 3466687 3466587 .   +   2413
chr1    3513405 3513553 3513405 .   +   2414
chr1    3531795 3532720 3531795 .   +   2415
chr1    3680155 3681788 3680155 .   +   2416
chr1    3752010 3754360 3752010 .   +   2417
chr1    4496551 4499378 4496551 .   +   2418
chr1    4497474 4497654 4497474 .   +   2419

Then I typed:

bedtools closest -a dynamic_IR_IRratio_cluster1_1.bed -b All_Exons_Coordinates_1.bed -s -D a > dynamic_IR_IRratio_cluster1_1_exon.bed
Error: Sorted input specified, but the file All_Exons_Coordinates_1.bed has the following out of order record

My aim is to get a file like this:

chr strand intron_start intron_end upstream_exon_start upstream_exon_end downstream_exon_start downstream_exon_end

What is wrong with my code that causes this error?

Thanks


What is wrong with my code that causes this error?

I think the error is clear: your file is not properly sorted. Use either bedtools sort or sort -k1,1 -k2,2n to sort it.
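As a minimal sketch of that fix, both BED files can be passed through the same sort before calling bedtools closest. The helper name sort_bed and the ".sorted.bed" output names are placeholders. The -iu, -id and -io options in the comments are taken from the bedtools closest documentation as I understand it, so check them against your installed version before relying on them.

```shell
# Sort a BED stream by chromosome (byte order), then numerically by start.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n
}

# Hypothetical usage with the files from this thread (output names are placeholders):
#   sort_bed < dynamic_IR_IRratio_cluster1_1.bed > introns.sorted.bed
#   sort_bed < All_Exons_Coordinates_1.bed       > exons.sorted.bed
#
# With -D a, distances are reported relative to the strand of each intron in A.
# Assuming the documented -id / -iu / -io options, the two flanking exons
# could then be requested in separate runs:
#   bedtools closest -a introns.sorted.bed -b exons.sorted.bed -s -D a -id -io > upstream_exon.bed
#   bedtools closest -a introns.sorted.bed -b exons.sorted.bed -s -D a -iu -io > downstream_exon.bed
```

Sorting both inputs with the same command matters because bedtools expects the chromosome order to be consistent between the two files.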
