Hello, I have a BED file of CITS (crosslink-induced truncation sites), so each region listed below is a single nucleotide. These sites come from an iCLIP experiment to identify the binding sites of a specific RNA-binding protein.
$head CITS.bed
chr1	568974	568975	CITS_1[gene=chr1_f_c24][PH=12][PH0=0.29][P=1.01e-12]	12	+
chr1	2239149	2239150	CITS_2[gene=chr1_f_c1136][PH=7][PH0=0.40][P=2.21e-04]	7	+
chr1	2239899	2239900	CITS_3[gene=chr1_f_c1138][PH=6][PH0=0.21][P=3.56e-04]	6	+
chr1	2461199	2461200	CITS_4[gene=chr1_f_c1237][PH=5][PH0=0.17][P=1.46e-04]	5	+
chr1	6346493	6346494	CITS_5[gene=chr1_f_c1541][PH=18][PH0=1.19][P=3.68e-13]	18	+
chr1	8409692	8409693	CITS_6[gene=chr1_f_c2222][PH=6][PH0=0.21][P=1.45e-05]	6	+
I want to add a few more columns that annotate each nucleotide, i.e. transcript name, transcript type, and feature (e.g. exon, 3'UTR, 5'UTR).
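To make the goal concrete, here is a rough Python sketch of the strand-aware lookup I have in mind. It is only an illustration: the function name `annotate_sites`, the layout of the feature tuples, and the transcript names are all made up, and the nested loop would be far too slow for a real annotation file.

```python
from typing import List, Tuple

# (chrom, start, end, name, score, strand), 0-based half-open BED coordinates
Site = Tuple[str, int, int, str, int, str]
# Hypothetical layout: (chrom, start, end, strand, transcript_name, transcript_type, feature)
Feature = Tuple[str, int, int, str, str, str, str]


def annotate_sites(sites: List[Site], features: List[Feature]) -> List[tuple]:
    """Append transcript name, type and feature for every same-strand overlap."""
    annotated = []
    for site in sites:
        chrom, start, end, _name, _score, strand = site
        # Half-open intervals overlap when each one starts before the other ends.
        hits = [
            f for f in features
            if f[0] == chrom and f[3] == strand and f[1] < end and start < f[2]
        ]
        if not hits:
            # Keep unannotated sites so that nothing silently disappears.
            annotated.append(site + ("NA", "NA", "NA"))
        for hit in hits:
            annotated.append(site + hit[4:])
    return annotated


sites = [
    ("chr1", 568974, 568975,
     "CITS_1[gene=chr1_f_c24][PH=12][PH0=0.29][P=1.01e-12]", 12, "+"),
]
# Made-up annotation records, for illustration only.
features = [
    ("chr1", 568000, 569000, "+", "TX_A", "protein_coding", "exon"),
    ("chr1", 568000, 569000, "-", "TX_B", "lncRNA", "exon"),
]

for row in annotate_sites(sites, features):
    print("\t".join(map(str, row)))
```

So each output row would be the original six BED columns followed by the three annotation columns, with one row per overlapping feature on the same strand.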
I've tried HOMER annotatePeaks.pl, but this yields annotations relative to the nearest TSS, which is not what I need (since this is not ChIP-seq data).
I've also tried bedtools intersect with the GTF file for my genome (converted to BED), but none of the options I tried seems to work: the output files look just like the BED file above.
$bedtools intersect -a sample.bed -b annotations.gtfconverted2.bed > results.bed
BEDOPS worked best, but it still missed a lot of annotations.
$bedmap --echo --echo-map --delim '\t' sample.fw.bed annotations.gtfconverted2.fwd.bed > answer.fw.bed
I processed the reverse (rv) strand the same way and then merged the two outputs with:
$bedops --everything answer.fw.bed answer.rv.bed > answer.bed
Any suggestions are appreciated!