I am developing a script that counts how many times a short nucleotide sequence hits non-coding regions of the human genome. Based on Google searches, BLAST+ appears to be the tool to use. The BLAST+ documentation has a few cookbook recipes on masking a database with FASTA files, which I want to leverage.
Is there a way to pull all known transcripts for the human genome, add a 50-100 bp buffer to their 5' and 3' ends (to avoid potential regulatory elements), and write those sequences to a file? I did not see anything on the NCBI site suggesting that BLAST can do this.
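One way to approach the padding step, sketched here as an illustration rather than a tested pipeline, is to start from a transcript annotation in GTF format (for example, one downloaded from GENCODE or Ensembl; the source is an assumption). You can then widen each transcript by a fixed number of bases, merge the overlaps, and write the result as BED or FASTA. All function names below are hypothetical helpers. The in-memory `genome` dict also stands in for a real genome FASTA, which you would need to load separately.

```python
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end) in 0-based, half-open coordinates, as in BED files
Interval = Tuple[str, int, int]


def parse_gtf_transcripts(lines: Iterable[str]) -> List[Interval]:
    """Collect 'transcript' features from GTF lines (hypothetical helper)."""
    intervals: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # keep only whole-transcript records, not exons etc.
        # GTF is 1-based inclusive; convert to 0-based half-open
        intervals.append((fields[0], int(fields[3]) - 1, int(fields[4])))
    return intervals


def pad_and_merge(intervals: Iterable[Interval], pad: int,
                  chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Extend each interval by `pad` bp on both ends, clamp to the
    chromosome bounds, then merge overlapping or touching intervals."""
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - pad)
        new_end = min(chrom_sizes.get(chrom, end + pad), end + pad)
        padded.append((chrom, new_start, new_end))
    padded.sort()
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(intervals: Iterable[Interval]) -> str:
    """Render intervals as BED3 text."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)


def extract_sequences(intervals: Iterable[Interval],
                      genome: Dict[str, str]) -> List[Tuple[str, str]]:
    """Slice each interval out of an in-memory genome (chrom -> sequence)
    and return (FASTA header, sequence) pairs."""
    return [(f"{c}:{s}-{e}", genome[c][s:e]) for c, s, e in intervals]


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Render (header, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    gtf = [
        "##description: toy annotation",
        'chr1\tsrc\ttranscript\t1001\t2000\t.\t+\t.\tgene_id "A";',
        'chr1\tsrc\texon\t1001\t1200\t.\t+\t.\tgene_id "A";',
        'chr1\tsrc\ttranscript\t2101\t3000\t.\t-\t.\tgene_id "B";',
    ]
    regions = pad_and_merge(parse_gtf_transcripts(gtf), pad=100,
                            chrom_sizes={"chr1": 3050})
    print(to_bed(regions), end="")  # prints "chr1\t900\t3050"
```

In the toy run, the two padded transcripts overlap, so they collapse into a single region that is clamped at the assumed chromosome length. The same pad is applied to both ends, so strand does not matter here. If you want different 5' and 3' buffers, you would have to read the strand column (field 7) and swap the offsets for minus-strand transcripts. The merged regions mark transcripts plus buffer, so the non-coding space you want to search is their complement. bedtools (`slop`, `merge`, `getfasta`) covers the same steps on real genome files.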
Does anyone have a suggestion on how to accomplish this task?
Thanks in advance.