Hi, I have two FASTA files (one of novel precursor microRNAs and one of novel mature microRNAs) and a BED file of the precursors. What I want to do is find where the two FASTA files overlap and get the coordinates of that overlap, relative to the precursor BED. I can't find any tool that would let me do this. The closest is bedtools getfasta, but I guess I want the opposite: rather than getting a FASTA sequence of overlapping regions, I want to get the coordinates.
I considered converting the BED to GTF, indexing the precursor GTF, and then mapping to this. But I realised the resulting coordinates wouldn't match the actual positions in the genome. Any suggestions would be really helpful!
In terms of sequence, correct? Sequence files themselves will only have local coordinates. So align the two FASTA files first (BLAT may be good for this), then see where the overlapping part is with respect to the genome, and then cross-reference to the GTF. Does that describe what you need?
If I understand what you're suggesting, I already have the 'overlapping part': that's my FASTA file of the mature microRNAs. It's the overlap with respect to the genome that I'm having trouble with. As these are short reads, I think they may align to multiple places and not just within the precursor region (for which I have both the sequence and the BED file).
You can use ungapped alignments with your mature RNA (e.g. bowtie v.1), but as you say, you will need to allow for these reads to multi-map. There is nothing you can do about that. Those reads that multi-map will have an equal chance of having come from any of those locations.
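If each mature sequence is an exact substring of its precursor, one way to sidestep the multi-mapping issue may be to skip the genome alignment and search inside the precursor sequence directly, then shift the local offset by the precursor's BED coordinates. Below is a minimal Python sketch of that idea, not a tested pipeline. The function name `mature_to_genome` and the example values are made up for illustration. It assumes the BED coordinates are 0-based and half-open, that the precursor sequence spans exactly `start` to `end`, and that precursors on the minus strand are given reverse-complemented (as `bedtools getfasta -s` would produce). How you pair each mature sequence with its precursor depends on your naming scheme and is not shown.

```python
def mature_to_genome(precursor_seq, mature_seq, chrom, start, end, strand="+"):
    """Return BED-style (chrom, start, end, strand) tuples, one per exact
    match of mature_seq inside precursor_seq.

    Assumptions (not taken from the original thread):
      - start/end are 0-based, half-open BED coordinates of the precursor
      - precursor_seq covers exactly start..end
      - for strand "-", precursor_seq is already reverse-complemented
    """
    # Normalise case and the RNA/DNA alphabet so that U and T compare equal.
    prec = precursor_seq.upper().replace("U", "T")
    mature = mature_seq.upper().replace("U", "T")
    if not mature:
        return []

    hits = []
    offset = prec.find(mature)
    while offset != -1:
        if strand == "-":
            # Offsets count from the 5' end, which is the genomic END
            # of the interval for a minus-strand precursor.
            g_end = end - offset
            g_start = g_end - len(mature)
        else:
            g_start = start + offset
            g_end = g_start + len(mature)
        hits.append((chrom, g_start, g_end, strand))
        # Keep searching in case the mature sequence occurs more than once.
        offset = prec.find(mature, offset + 1)
    return hits


# Made-up example: a 12 nt "precursor" at chr1:100-112.
print(mature_to_genome("AAACCCGGGTTT", "AAAC", "chr1", 100, 112, "+"))
print(mature_to_genome("AAACCCGGGTTT", "AAAC", "chr1", 100, 112, "-"))
```

An empty list means the mature sequence was not found exactly, in which case an aligner that tolerates mismatches (such as the bowtie approach above) is probably the better route.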