Incorporating Ensembl and Fasta Data
6.7 years ago

Hello all, this is my first question. I am trying to create training data for a neural network that will predict the DNA binding sites of proteins. I have Ensembl data that I use to fill in a feature vector for each nucleotide. Three sample rows from the data are below.

chrom  txStart    txEnd      name                score  strand  cdsStart   cdsEnd     score  exonCount  exonLengths                         exonPos
chr4   11051044   11076204   ENSMUST00000058183  0      -       11051096   11076176   0      9          181,57,102,134,103,57,123,100,225,  0,2387,7967,9816,11005,15606,19155,22099,24935,
chr3   121723536  121735052  ENSMUST00000029771  0      +       121723714  121734239  0      6          266,106,218,179,160,947,            0,1440,5800,7987,8844,10569,
chr5   44472132   44799707   ENSMUST00000070748  0      -       44473281   44799493   0      8          1380,152,124,84,123,173,103,346,    0,6960,8093,60528,62905,69554,197224,327229,
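
For reference, here is a minimal sketch of how one such row could be turned into absolute exon intervals. It assumes the table follows the usual BED12-style layout: whitespace-separated columns, 0-based half-open coordinates, and exonPos values given as offsets from txStart. The sample rows are consistent with this (the last exonPos plus the last exonLengths value, added to txStart, gives txEnd), but the coordinate convention itself is an assumption. The names `Transcript` and `parse_row` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    tx_start: int
    tx_end: int
    name: str
    strand: str
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # absolute (start, end) pairs, half-open


def parse_row(line: str) -> Transcript:
    """Parse one whitespace-separated row of the table above."""
    f = line.split()
    tx_start = int(f[1])
    # The comma-separated lists end with a trailing comma, so skip empty items.
    lengths = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]
    # Assumption: exonPos values are offsets relative to txStart.
    exons = [(tx_start + o, tx_start + o + n) for o, n in zip(offsets, lengths)]
    return Transcript(
        chrom=f[0],
        tx_start=tx_start,
        tx_end=int(f[2]),
        name=f[3],
        strand=f[5],
        cds_start=int(f[6]),
        cds_end=int(f[7]),
        exons=exons,
    )


row = ("chr3 121723536 121735052 ENSMUST00000029771 0 + 121723714 121734239 "
       "0 6 266,106,218,179,160,947, 0,1440,5800,7987,8844,10569,")
tx = parse_row(row)
```

With the second sample row, the last exon ends exactly at txEnd, which is a quick sanity check on the offset interpretation.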

I can already derive some features from the Ensembl data, such as which strand a nucleotide is on and whether it lies in an exon. I also want to encode the nucleotide type (A, T, C, G, N) in the vector for each nucleotide, and this is where I run into a problem. The Ensembl data covers only fractions of the sequence, whereas the FASTA data contains all of the sequenced DNA. For nucleotides that are not covered by the Ensembl data, I will store only the A/T/C/G/N information in their vectors; for those that are covered, I will add the extra features. How can I combine the Ensembl and FASTA data to this end?
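
To make the target concrete, here is a rough sketch of the kind of per-nucleotide vector I have in mind. It is only an illustration under assumptions: the feature layout (one-hot A/C/G/T/N followed by annotated, plus strand, minus strand, exon and CDS flags), the function names `read_fasta` and `featurize`, and the dictionary keys are all hypothetical, and coordinates are assumed to be 0-based and half-open. Positions outside any transcript keep only their base encoding, with the remaining flags left at zero.

```python
BASES = "ACGTN"
# Layout: 5 one-hot base slots, then annotated, plus, minus, exon, cds flags.
N_FEATURES = len(BASES) + 5


def read_fasta(lines):
    """Return {sequence_name: sequence} from an iterable of FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def featurize(seq, transcripts, offset=0):
    """Build one feature vector per base of seq.

    seq covers genomic positions offset .. offset + len(seq) on one chromosome;
    transcripts is a list of dicts for that same chromosome.
    """
    rows = []
    for base in seq:
        row = [0] * N_FEATURES
        # Any symbol other than A/C/G/T is treated as N.
        row[BASES.index(base) if base in BASES else BASES.index("N")] = 1
        rows.append(row)
    for tx in transcripts:
        lo = max(tx["tx_start"], offset)
        hi = min(tx["tx_end"], offset + len(seq))
        for pos in range(lo, hi):
            row = rows[pos - offset]
            row[5] = 1  # covered by an annotated transcript
            row[6 if tx["strand"] == "+" else 7] = 1
            if any(start <= pos < end for start, end in tx["exons"]):
                row[8] = 1  # inside an exon
                if tx["cds_start"] <= pos < tx["cds_end"]:
                    row[9] = 1  # inside the coding part of an exon
    return rows


genome = read_fasta([">chrT toy example", "acgtn", "XAC"])
toy_tx = {"tx_start": 2, "tx_end": 6, "strand": "-",
          "cds_start": 3, "cds_end": 5, "exons": [(2, 4), (5, 6)]}
matrix = featurize(genome["chrT"], [toy_tx])
```

Plain Python lists like this would be far too slow and memory-hungry for a whole chromosome, so a real version would presumably work on arrays and on one region at a time; the sketch is only meant to show how the two data sources would line up by position.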

fasta genome sequence ensembl • 1.1k views