Question: Sequencing By Hybridization Data Analysis
jack wrote (6.0 years ago):

I have NGS data for a gene that was sequenced with SBH (sequencing by hybridization) technology. A hexamer library was used as the sequencing library (4 to the power of 6, i.e. 4096 hexamers). The data look like this:

TTTGAGGTGCAGATAGCTTGCTTTATTTTGTTGTTACTATCTCAAGGAGG
TCCAACAATTATAACTAACAATTGAATTTATACTTGCATGAAAAGAACTA
CATCAAATTGACATTTTGGGCAATTAGTAATATTGTTTAAAATTTAACAA
CAGCTTTATTTTGTTGTTGTTCTTTACTTTTTGCTGTGGCTCATTGCTTA
GGTGCCCAGGTTTTTCAGGTGCAATTAAAATTTAGAACTACCACACAAAG
GCATTGGCTGCACTCTGGGACCTCCAAGAGTTGGCACTGCTCTGGCATAG
GAATACTTGAATAGCTTGGTTAAATGAAGGGATGGCCAGGAGATGTTACT
.
.
.

I want to calculate the following:

i) what percentage of the gene I can discover uniquely with this hexamer library (assuming all hexamers are present in the library)

ii) how many different hexamers are present in this gene

Can somebody guide me on how to calculate them?

Ido Tamir (Austria) wrote (6.0 years ago):

In Scala, with gene holding the gene sequence as a single String:

>val hexamerMap = gene.sliding(6).toList.groupBy(s => s).mapValues(_.length)
>hexamerMap.size
res11: Int = 301
>hexamerMap.values.groupBy(c => c).mapValues(_.toSeq.length).toList.sortBy(_._1)
res14: List[(Int, Int)] = List((1,264), (2,30), (3,7))

So 264 hexamers occur once, 30 twice and 7 three times, i.e. 301 distinct hexamers across 264 + 2*30 + 3*7 = 345 windows. From this one could calculate the percentage of unique coverage, if I understood your question i correctly.

Edit: actually I don't really understand question i.
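
Under one possible reading of question i, the calculation could be sketched as follows. This is only an assumption about what "discover uniquely" means: a base counts as uniquely covered if at least one hexamer overlapping it occurs exactly once in the gene. The names HexamerCoverage, kmerCounts and uniquelyCoveredFraction are made up for this illustration, the toy sequence is not real data, and reverse complements and hybridization cross-talk are ignored.

```scala
object HexamerCoverage {
  // Count how often each k-mer occurs in the sequence (k = 6 for hexamers).
  def kmerCounts(gene: String, k: Int = 6): Map[String, Int] =
    gene.sliding(k)
      .filter(_.length == k)            // drop the short window produced when gene.length < k
      .toList
      .groupBy(s => s)
      .map { case (kmer, hits) => kmer -> hits.length }

  // ASSUMPTION: a position is "uniquely covered" if at least one k-mer
  // overlapping it occurs exactly once in the whole gene.
  def uniquelyCoveredFraction(gene: String, k: Int = 6): Double =
    if (gene.length < k) 0.0
    else {
      val counts  = kmerCounts(gene, k)
      val covered = new Array[Boolean](gene.length)
      for (start <- 0 to gene.length - k
           if counts(gene.substring(start, start + k)) == 1;
           pos <- start until start + k)
        covered(pos) = true
      covered.count(c => c).toDouble / gene.length
    }

  def main(args: Array[String]): Unit = {
    val gene = "AAAAAAAACGT"            // toy sequence, not taken from the question
    println(s"distinct hexamers: ${kmerCounts(gene).size}")
    println(f"uniquely covered: ${100 * uniquelyCoveredFraction(gene)}%.1f%%")
  }
}
```

For the real data, gene would be the full sequence with line breaks removed. A stricter definition (for example, requiring every hexamer over a base to be unique) would give a lower percentage.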
