Question

Confidence score for NGS sequences mapped to BLAST database

5.9 years ago

hbrwatkins • 0

I have an experiment with two types of controls and one treatment. For each, I have R1 and R2 NGS reads, which I have mapped to 70,000 short sequences in a BLAST database. I summarized the results by counting the number of duplicates mapped to each reference sequence.

Example data:

Ref.seq   Cnl.A.R1   Cnl.A.R2   Cnl.B.R1   Cnl.B.R2   Trt.R1   Trt.R2
NM_001          10          9         40         56      323      212
NM_002          36         29        143         70      128      116
NM_003         430        390       3285       1933   112831   102009

Most duplicate counts are close to zero. A few are quite large.

I would like to determine a confidence score for each reference gene reflecting the probability that the number of duplicates in the treatment group is larger than in either of the controls. Can the spread between the R1 and R2 readings be used to make such a score? Obviously, n=2 is very small. Can the mean R1 - R2 spread across all genes be used, even though the spread increases with the magnitude of the count?
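To make the last point concrete, here is a minimal Python sketch that computes the R1-R2 spread for the three example rows above, both on the raw count scale and on a log2 scale. The pseudocount of 1 and the choice of log2 are assumptions made only for illustration, and the function name `spread_summary` is hypothetical. This is not a proposed scoring method, just a way to see how strongly the raw spread depends on the size of the count.

```python
import math

# Example counts copied from the table in the question: (R1, R2) per group.
counts = {
    "NM_001": {"Cnl.A": (10, 9), "Cnl.B": (40, 56), "Trt": (323, 212)},
    "NM_002": {"Cnl.A": (36, 29), "Cnl.B": (143, 70), "Trt": (128, 116)},
    "NM_003": {"Cnl.A": (430, 390), "Cnl.B": (3285, 1933), "Trt": (112831, 102009)},
}


def spread_summary(r1, r2, pseudocount=1.0):
    """Return (mean, absolute spread, log2 spread) for one R1/R2 pair.

    The pseudocount is an assumption: it only keeps log2 defined when a
    count is zero, which the question says is common.
    """
    mean = (r1 + r2) / 2
    abs_spread = abs(r1 - r2)
    log_spread = abs(math.log2(r1 + pseudocount) - math.log2(r2 + pseudocount))
    return mean, abs_spread, log_spread


rows = []
for ref, groups in counts.items():
    for group, (r1, r2) in groups.items():
        mean, abs_spread, log_spread = spread_summary(r1, r2)
        rows.append((ref, group, mean, abs_spread, log_spread))
        print(f"{ref} {group:6s} mean={mean:10.1f} "
              f"abs_spread={abs_spread:6d} log2_spread={log_spread:.3f}")
```

For these rows the raw spread ranges over several orders of magnitude, while the log2 spread stays within a much narrower range. That is the reason a single mean R1-R2 spread pooled across all genes on the raw scale would be dominated by the few very large counts.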

I would very much appreciate any suggestions, as well as any references. Thanks

Confidence BLAST database NGS sequence mapping • 836 views
