Closed: Sample non-coding regions from the genome
3.7 years ago

I'd like to train a model on two sets of sequences:

  1. -249..+50 around the TSS of a set of genes
  2. Random 300bp sequences from non-coding regions

I am having trouble sampling the latter. My idea is to generate random positions in the genome and, for each one, check against the GFF3 annotation whether it falls in a "gene" region. If it is not within a gene, I can use it as a random non-coding sequence.
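
A minimal sketch of that rejection-sampling idea, using only the Python standard library. The function names (`load_gene_intervals`, `overlaps_gene`, `sample_intergenic_windows`) and the `chrom_lengths` dictionary are hypothetical and chosen for illustration. It assumes that "non-coding" means "does not overlap any `gene` feature", and it rejects a window if any part of its 300 bp overlaps a gene, not just its start position.

```python
import bisect
import random
from collections import defaultdict


def load_gene_intervals(gff3_lines, feature_type="gene"):
    """Return {seqid: [(start, end), ...]} as merged, sorted, 0-based half-open intervals."""
    raw = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation part
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5 or cols[2] != feature_type:
            continue
        # GFF3 coordinates are 1-based and inclusive; convert to 0-based half-open.
        raw[cols[0]].append((int(cols[3]) - 1, int(cols[4])))

    merged = {}
    for seqid, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlapping or adjacent: extend the last interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[seqid] = [tuple(iv) for iv in out]
    return merged


def overlaps_gene(intervals, start, end):
    """True if [start, end) overlaps any interval in a merged, sorted list."""
    # Index of the first interval starting at or after `end`; only the
    # interval just before it can overlap, because the list is disjoint.
    i = bisect.bisect_left(intervals, (end,))
    return i > 0 and intervals[i - 1][1] > start


def sample_intergenic_windows(chrom_lengths, gene_intervals, n, width=300,
                              seed=None, max_tries=1_000_000):
    """Draw up to n windows of `width` bp that overlap no gene (rejection sampling)."""
    rng = random.Random(seed)
    seqids = [s for s, length in chrom_lengths.items() if length >= width]
    # Weight each sequence by its number of valid start positions so that
    # every candidate window in the genome is equally likely.
    weights = [chrom_lengths[s] - width + 1 for s in seqids]
    windows = []
    for _ in range(max_tries):
        if len(windows) == n:
            break
        seqid = rng.choices(seqids, weights=weights)[0]
        start = rng.randrange(chrom_lengths[seqid] - width + 1)
        if not overlaps_gene(gene_intervals.get(seqid, []), start, start + width):
            windows.append((seqid, start, start + width))
    return windows
```

The returned tuples are `(seqid, start, end)` in 0-based half-open coordinates, so they can be used directly to slice a sequence string or written out as BED lines. This sketch ignores strand, assembly gaps (runs of N), and duplicate windows, all of which may matter for a training set.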

However, I was wondering whether this problem has already been solved and whether there are existing tools for it.

Thanks in advance.

genome sequence • 110 views
This thread is not open. No new answers may be added