2.1 years ago
DanK
Hi,
I have the E. coli genome and a dataset containing all the probed TSSs for this genome.
I want to extract 400 nt downstream and 100 nt upstream of each TSS. How should I handle positions that fall outside the genome bounds, for example if the first TSS is at position +50?
Should I take the sequence from the end of the genome in this case?
Thank you in advance!
For what purpose do you want these sequences? If you want to examine 100 nt in front of each TSS but, as you mention, there is a TSS at genomic coordinate 50, then yes, for that TSS you can only take 50 nt from the linear sequence. What difference does it make if some of your sequences are shorter than your chosen size because they are close to the chromosome end?
On the other hand, E. coli has a circular genome, so why not take all the sequence you need for a TSS near the genomic start coordinate by wrapping around the origin (from genome length - 50 to +50)?
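A minimal sketch of that wrap-around idea in plain Python, in case it helps. The function name `circular_window`, the 1-based TSS coordinate, and the `+`/`-` strand argument are assumptions for illustration, not something from the original dataset. The window is assumed to be `upstream` bases before the TSS followed by `downstream` bases starting at the TSS itself, so every sequence has the same length (500 nt with the defaults). For minus-strand TSSs, upstream lies at higher coordinates and the result is reverse-complemented.

```python
# Complement table for DNA; N is kept as N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def circular_window(genome, tss, strand="+", upstream=100, downstream=400):
    """Extract a fixed-length window around a TSS on a circular genome.

    genome     : full chromosome sequence as a string
    tss        : 1-based TSS coordinate (assumed convention)
    strand     : "+" or "-" (assumed to be known for each TSS)
    upstream   : number of bases before the TSS
    downstream : number of bases starting at the TSS itself
    """
    n = len(genome)
    length = upstream + downstream
    if length > n:
        raise ValueError("window is longer than the genome")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    i = tss - 1  # convert to a 0-based index
    if strand == "+":
        start = i - upstream
    else:
        # On the minus strand, downstream runs toward lower coordinates.
        start = i - downstream + 1

    # The modulo wraps indices that run past either end of the sequence.
    seq = "".join(genome[(start + k) % n] for k in range(length))
    return reverse_complement(seq) if strand == "-" else seq


# Toy example on a 10-nt "genome": the window wraps around the origin.
toy = "AACCGGTTAG"
print(circular_window(toy, 2, "+", upstream=3, downstream=4))  # AGAACCG
print(circular_window(toy, 9, "-", upstream=3, downstream=4))  # TTCTAAC
```

Because every window comes out at exactly `upstream + downstream` bases, the sequences can go straight into an alignment or logo tool without padding.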
Thank you for the reply!
I want to make an alignment and generate motifs and WebLogo plots. If my sequences do not all have the same length, that will be a problem for WebLogo, right?
So I think the biologically correct approach is to take the remaining nucleotides from the end or the start of the genome, respectively.
Hi DanK, why did you delete the post?
Hi, because I found the answer and because nobody had replied.
Please add the answer you found as an answer here and accept it - that's the way professional/scientific forums work. Someone else may run into the same problem you had and your solution may be helpful to them. Imagine if everyone on a site like StackOverflow decided to delete their question because they found a solution - a lot of knowledge would not be available to the larger community and many would be at a loss.