Question: How to preprocess miRNA-seq read counts?
Asked 4.8 years ago by acc.inpro321:

I am new to bioinformatics, and to learn more, I am starting with a project. I collected miRNA sequencing data from TCGA. There is one text file per sample, and each file has the following columns:

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped

The following is a sample of one file's contents:

miRNA_ID        read_count    reads_per_million_miRNA_mapped    cross-mapped
hsa-let-7a-1    55243         9869.306676                       N
hsa-let-7a-2    110572        19753.97748                       Y
hsa-let-7a-3    55555         9925.046293                       N
hsa-let-7b      94076         16806.92386                       N
hsa-let-7c      11209         2002.517215                       Y
hsa-let-7d      1843          329.256778                        N
hsa-let-7e      7786          1390.989298                       N
hsa-let-7f-1    166           29.656335                         N
hsa-let-7f-2    66277         11840.55968                       N
hsa-let-7g      4192          748.911782                        N
hsa-let-7i      3617          646.186526                        N
hsa-mir-1-1     0             0                                 N
hsa-mir-1-2     266           47.521597                         N

... (trimmed)
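A minimal sketch of reading one such file into Python, assuming the columns are separated by whitespace (tabs or spaces) and that the header matches the one shown above. The function name `parse_mirna_table` and the inline `SAMPLE` text are hypothetical and only for illustration; a real file would be opened with `open(path)` instead.

```python
import io

# A few rows copied from the sample above; tabs are assumed as the delimiter.
SAMPLE = """miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped
hsa-let-7a-1\t55243\t9869.306676\tN
hsa-let-7d\t1843\t329.256778\tN
hsa-mir-1-1\t0\t0\tN
"""

def parse_mirna_table(handle):
    """Parse a whitespace-delimited miRNA table into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        rec = dict(zip(header, fields))
        # Convert the two numeric columns; the others stay as strings.
        rec["read_count"] = int(rec["read_count"])
        rec["reads_per_million_miRNA_mapped"] = float(
            rec["reads_per_million_miRNA_mapped"]
        )
        rows.append(rec)
    return rows

records = parse_mirna_table(io.StringIO(SAMPLE))
for rec in records:
    print(rec["miRNA_ID"], rec["read_count"])
```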

How should I preprocess the data? I am not sure how to bring the read counts into the range 0 to 1 for classification. Should I map the values as follows?

$$\mathrm{value}(i) = \frac{\mathrm{value}_i - \mathrm{value}_{\min}}{\mathrm{value}_{\max} - \mathrm{value}_{\min}}$$

Which one of the columns is best suited for machine learning, reads per million or read count?
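The min-max mapping asked about here can be sketched in a few lines of plain Python. This is only an illustration of the formula, applied to the read counts from the sample rows; the helper name `min_max_scale` is hypothetical, and returning all zeros when every value is equal is an assumption made to avoid dividing by zero. The post does not say whether the minimum and maximum should be taken within one sample or per miRNA across samples, so the example simply scales one list of values.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] using their own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no signal, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# read_count column from the sample rows shown in the question
read_counts = [55243, 110572, 55555, 94076, 11209, 1843, 7786,
               166, 66277, 4192, 3617, 0, 266]

scaled = min_max_scale(read_counts)
for raw, s in zip(read_counts, scaled):
    print(raw, round(s, 4))
```

With this mapping the largest count (hsa-let-7a-2) becomes 1 and the smallest (hsa-mir-1-1) becomes 0, and every other value falls in between.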

Thanks in advance.

P.S. I am not just asking outright: I tried a bunch of things and the results did not come out as expected, so the question is not effortless :p


What is the biological question you are trying to answer?

written 4.8 years ago by Sean Davis