Question: Sequencing By Hybridization Data Analysis
6.9 years ago by jack450:

I have NGS data for a gene that was sequenced with SBH (sequencing by hybridization) technology. A hexamer library (4^6 = 4096 hexamers) was used as the sequencing library. The gene sequence looks like this:

``````
TTTGAGGTGCAGATAGCTTGCTTTATTTTGTTGTTACTATCTCAAGGAGG
TCCAACAATTATAACTAACAATTGAATTTATACTTGCATGAAAAGAACTA
CATCAAATTGACATTTTGGGCAATTAGTAATATTGTTTAAAATTTAACAA
CAGCTTTATTTTGTTGTTGTTCTTTACTTTTTGCTGTGGCTCATTGCTTA
GGTGCCCAGGTTTTTCAGGTGCAATTAAAATTTAGAACTACCACACAAAG
GCATTGGCTGCACTCTGGGACCTCCAAGAGTTGGCACTGCTCTGGCATAG
GAATACTTGAATAGCTTGGTTAAATGAAGGGATGGCCAGGAGATGTTACT
.
.
.
``````

I want to calculate the following:

i) What percentage of the gene can I recover uniquely with this hexamer library (assume all hexamers are present in the library)?

ii) How many different hexamers are present in this gene?

Can somebody guide me on how to calculate these?

6.9 years ago by Ido Tamir (Austria):

scala:

``````
// gene is a String holding the full gene sequence (line breaks removed)
val hexamerMap = gene.sliding(6).toList.groupBy(s => s).mapValues(_.length)
hexamerMap.size
// res11: Int = 301
hexamerMap.values.groupBy(c => c).mapValues(_.toSeq.length).toList.sortBy(_._1)
// res14: List[(Int, Int)] = List((1,264), (2,30), (3,7))
``````

So the gene contains 301 different hexamers: 264 occur once, 30 occur twice, and 7 occur three times. From this one could calculate the percentage of unique coverage, if I understood your question (i) correctly.

Edit: actually, I don't really understand question (i).
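
Since question (i) is ambiguous, here is a minimal sketch of one possible reading: the fraction of bases in the gene that are covered by at least one hexamer occurring exactly once in the gene. That interpretation is an assumption, not something the question confirms, and the names `HexamerCoverage`, `kmerCounts`, and `uniqueCoverage` are made up for illustration.

``````scala
object HexamerCoverage {
  // Count how often each k-mer occurs in the sequence.
  def kmerCounts(seq: String, k: Int): Map[String, Int] =
    seq.sliding(k)
      .filter(_.length == k) // drop the partial window returned when seq is shorter than k
      .toList
      .groupBy(identity)
      .map { case (kmer, hits) => kmer -> hits.length }

  // Assumed meaning of "uniquely discovered": a base counts as covered
  // if at least one k-mer overlapping it occurs exactly once in the sequence.
  def uniqueCoverage(seq: String, k: Int): Double = {
    val counts = kmerCounts(seq, k)
    val covered = Array.fill(seq.length)(false)
    for (i <- 0 to seq.length - k if counts(seq.substring(i, i + k)) == 1)
      for (j <- i until i + k) covered(j) = true
    if (seq.isEmpty) 0.0 else covered.count(b => b).toDouble / seq.length
  }
}
``````

With the gene loaded into a single string `gene`, `HexamerCoverage.kmerCounts(gene, 6).size` answers question (ii), and `HexamerCoverage.uniqueCoverage(gene, 6) * 100` gives a percentage under the reading above. If "uniquely" is instead meant per hexamer position, the counts in the answer are enough: each hexamer that occurs once contributes one position, out of `gene.length - 5` positions in total.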