Question

How to preprocess miRNA-seq read counts?


9.2 years ago

acc.inpro321 ▴ 40

I am new to bioinformatics and am learning by working on a project. I downloaded miRNA-seq data from TCGA; there is one text file per sample, and each file has the following columns:

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped

The following is a sample of one file's contents:

miRNA_ID      read_count  reads_per_million_miRNA_mapped  cross-mapped
hsa-let-7a-1  55243       9869.306676                     N
hsa-let-7a-2  110572      19753.97748                     Y
hsa-let-7a-3  55555       9925.046293                     N
hsa-let-7b    94076       16806.92386                     N
hsa-let-7c    11209       2002.517215                     Y
hsa-let-7d    1843        329.256778                      N
hsa-let-7e    7786        1390.989298                     N
hsa-let-7f-1  166         29.656335                       N
hsa-let-7f-2  66277       11840.55968                     N
hsa-let-7g    4192        748.911782                      N
hsa-let-7i    3617        646.186526                      N
hsa-mir-1-1   0           0                               N
hsa-mir-1-2   266         47.521597                       N

... (trimmed)
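For reference, a file in this layout can be read with the Python standard library alone. The sketch below is a minimal illustration, not a TCGA-specific loader: the function name `parse_mirna_table` is hypothetical, the three data rows are copied from the sample above, and it assumes the columns are separated by tabs or spaces.

```python
import io

# A few rows copied from the sample above. A real per-sample file is assumed
# to use the same four columns, separated by tabs or spaces.
SAMPLE = """\
miRNA_ID read_count reads_per_million_miRNA_mapped cross-mapped
hsa-let-7a-1 55243 9869.306676 N
hsa-let-7a-2 110572 19753.97748 Y
hsa-mir-1-1 0 0 N
"""


def parse_mirna_table(handle):
    """Return {miRNA_ID: (read_count, rpm, cross_mapped)} from an open file.

    Hypothetical helper for illustration only.
    """
    header = handle.readline().split()
    records = {}
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        mirna_id, count, rpm, cross = fields
        records[mirna_id] = (int(count), float(rpm), cross == "Y")
    return records


records = parse_mirna_table(io.StringIO(SAMPLE))
print(records["hsa-let-7a-2"])
```

For a file on disk, the same function would be called with `open(path)` in place of the `io.StringIO` object.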

How should I preprocess the data? I am not sure how to scale the read counts to the range 0 to 1 for classification. Should I map each value as follows?

$$\text{value}(i) = \frac{\text{value}_i - \text{value}_{\min}}{\text{value}_{\max} - \text{value}_{\min}}$$
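The min-max mapping asked about here can be sketched in a few lines of Python. This is only an illustration of the formula: the helper name `min_max_scale` is hypothetical, the input is the first five read counts from the sample above, and the sketch does not settle whether scaling should be done within a sample or per miRNA across samples. Returning zeros for a constant input is also an assumption, chosen to avoid dividing by zero.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning zeros here is an arbitrary choice made for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# First five read counts from the sample above.
read_counts = [55243, 110572, 55555, 94076, 11209]
scaled = min_max_scale(read_counts)
print(scaled)
```

The largest count maps to 1 and the smallest to 0, so a single extreme value compresses everything else toward 0.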

Which of the columns is better suited for machine learning: reads per million or read count?
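As a rough illustration of how the two columns relate, the sketch below assumes that reads per million is the read count divided by the total number of miRNA-mapped reads in the sample, times one million. The file does not state that total, so it is back-calculated here from the first sample row; that inference, and the helper name `reads_per_million`, are assumptions rather than documented TCGA behavior.

```python
def reads_per_million(read_count, total_mapped_reads):
    """Scale a raw count by library size (assumed definition of the column)."""
    return read_count / total_mapped_reads * 1_000_000


# Back-calculate the sample's total miRNA-mapped reads from the first row
# (hsa-let-7a-1: read_count 55243, reads per million 9869.306676).
total_mapped = 55243 / 9869.306676 * 1_000_000

# Apply the same total to the second row (hsa-let-7a-2: read_count 110572).
rpm = reads_per_million(110572, total_mapped)
print(round(total_mapped), rpm)
```

Under this assumption the result matches the value listed for hsa-let-7a-2 to within rounding, which suggests the reads-per-million column is the read count already adjusted for sequencing depth.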

Thanks in advance.

P.S. I am not asking without having tried first: I tried a number of things and the results did not come out as expected, so this question is not for lack of effort :p

mirna-seq preprocessing • 3.3k views

updated 2.3 years ago by Ram 45k • written 9.2 years ago by acc.inpro321 ▴ 40


What is the biological question you are trying to answer?

9.2 years ago by Sean Davis 27k