I have a BED intersect file in which each gene feature is intersected by multiple reads. I assumed I could filter the data based on sequence similarity, but tools like BLAT and Magic-BLAST didn't help, because the reads span more than 5000 bp and either extend beyond the gene feature or cover it only in fragments.
Now I would like to retrieve the FASTA sequence of the gene features that are covered by the reads. Any suggestions on how to select the coordinates?
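To make the coordinate question concrete, here is a minimal Python sketch of one possible approach, assuming that only the part of the feature actually covered by a read is wanted: take the larger of the two starts and the smaller of the two stops. The function names `clip_to_feature` and `clipped_bed_line` are hypothetical, and the column order is assumed to be the one in my example (whitespace-separated, nine columns).

```python
def clip_to_feature(read_start, read_end, feat_start, feat_end):
    """Return the part of the read interval lying inside the feature, or None."""
    start = max(read_start, feat_start)
    end = min(read_end, feat_end)
    # No shared bases: the read lies entirely outside the feature.
    return (start, end) if start < end else None


def clipped_bed_line(line):
    """Turn one intersect row into a 6-column BED line for the covered region.

    Assumed columns: contig, query start, query stop, query ID, query strand,
    feature start, feature stop, feature ID, feature strand.
    """
    fields = line.split()
    contig, q_start, q_end = fields[0], int(fields[1]), int(fields[2])
    f_start, f_end, f_id, f_strand = int(fields[5]), int(fields[6]), fields[7], fields[8]
    clipped = clip_to_feature(q_start, q_end, f_start, f_end)
    if clipped is None:
        return None
    # The score column is filled with a placeholder "0".
    return "\t".join([contig, str(clipped[0]), str(clipped[1]), f_id, "0", f_strand])
```

The resulting lines could then be written to a BED file and passed to something like `bedtools getfasta -fi genome.fa -bed clipped.bed -name -s`, where `genome.fa` and `clipped.bed` are placeholder file names. If the whole feature is wanted instead of the covered part, the feature start and stop columns can be used directly.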
Additionally, if a gene feature is covered by multiple read fragments at different, non-overlapping coordinates, is there a way to append letters to the gene feature ID?
Input:
<contig_name> <querystart> <querystop> <queryID> <querystrand_direction> <featurestart> <featurestop> <featureID> <feature_strand>
contig 13000 14000 pac34 + 10000 16000 ID_84 +
contig 14500 15784 pac75 + 10000 16000 ID_84 +

Desired output:
contig 13000 14000 pac34 + 10000 16000 ID_84a +
contig 14500 15784 pac75 + 10000 16000 ID_84b +
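
The relabeling shown in the example could be sketched roughly as follows in Python. This is only an illustration under my own assumptions: rows are grouped by the feature ID in the eighth column, fragments are ordered by query start, and features hit by a single read keep their original ID. The names `letter_suffix` and `label_fragments` are hypothetical.

```python
from collections import defaultdict
import string


def letter_suffix(i):
    """Map 0 -> 'a', 25 -> 'z', 26 -> 'aa', ... (spreadsheet-style letters)."""
    suffix = ""
    i += 1
    while i > 0:
        i, remainder = divmod(i - 1, 26)
        suffix = string.ascii_lowercase[remainder] + suffix
    return suffix


def label_fragments(lines):
    """Append a, b, c, ... to the feature ID when several reads hit one feature."""
    rows = [line.split() for line in lines if line.strip()]

    # Collect the row indices belonging to each feature ID (column 8).
    by_feature = defaultdict(list)
    for idx, row in enumerate(rows):
        by_feature[row[7]].append(idx)

    for feature_id, indices in by_feature.items():
        if len(indices) < 2:
            continue  # a single fragment keeps the plain ID
        # Letter the fragments in order of their query start coordinate.
        indices.sort(key=lambda i: int(rows[i][1]))
        for n, i in enumerate(indices):
            rows[i][7] = feature_id + letter_suffix(n)

    # Rows are returned in their original order, space-separated.
    return [" ".join(row) for row in rows]
```

If the same feature ID can occur on more than one contig, the grouping key would need to include the contig name as well.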