Hi everyone, I'm a biotech student and I'm just getting started with bioinformatics and computational biology. Here are some problems I don't know how to deal with. I have downloaded some data from TCGA related to BRCA: miRNA quantification data for solid tumor samples and normal samples (NAT, I know). I want to perform a classification analysis using a decision tree model, with miRNA expression values as the features.
- What type of data should I use: normalized values (e.g. RPKM or TPM) or raw counts?
- Do I need to correct for batch effects within each dataset (separately for tumor samples and normal samples)?
- How should I approach the data before applying the decision tree algorithm? I don't fully understand how to start preparing my data with regard to normalization, and I don't know which steps to follow or how.
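To make the question more concrete, this is a minimal sketch of the preprocessing I have in mind (pure Python, toy data). It assumes I start from raw miRNA read counts per sample; the function names, the pseudocount of 1 and the filtering thresholds are my own hypothetical choices, not a validated pipeline. The idea is to scale each sample to counts per million (CPM) so that samples with different sequencing depth become comparable, take log2, drop miRNAs that are barely expressed, and then build a samples-by-miRNAs matrix with tumor/normal labels to feed to the decision tree.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input layout: sample ID -> {miRNA ID -> raw read count}.
Counts = Dict[str, Dict[str, int]]
Normalized = Dict[str, Dict[str, float]]


def log2_cpm(counts: Counts, pseudocount: float = 1.0) -> Normalized:
    """Scale each sample to counts per million, then apply log2(CPM + pseudocount)."""
    result: Normalized = {}
    for sample, mirnas in counts.items():
        total = sum(mirnas.values())  # library size of this sample (assumed > 0)
        result[sample] = {
            mir: math.log2(c / total * 1e6 + pseudocount)
            for mir, c in mirnas.items()
        }
    return result


def filter_low_expression(norm: Normalized, min_log_cpm: float = 1.0,
                          min_fraction: float = 0.5) -> List[str]:
    """Keep miRNAs with log2-CPM >= min_log_cpm in at least min_fraction of samples."""
    samples = list(norm)
    mirnas = sorted(norm[samples[0]])  # assumes every sample lists the same miRNAs
    kept = []
    for mir in mirnas:
        n_expressed = sum(norm[s][mir] >= min_log_cpm for s in samples)
        if n_expressed / len(samples) >= min_fraction:
            kept.append(mir)
    return kept


def feature_matrix(norm: Normalized, mirnas: List[str],
                   labels: Dict[str, str]) -> Tuple[List[str], List[List[float]], List[str]]:
    """Rows are samples, columns are the kept miRNAs; y holds one class label per sample."""
    samples = sorted(norm)
    X = [[norm[s][m] for m in mirnas] for s in samples]
    y = [labels[s] for s in samples]
    return samples, X, y


# Toy example: two tumor samples with different depth, one normal sample.
counts = {
    "T1": {"mir-A": 900, "mir-B": 100, "mir-C": 0},
    "T2": {"mir-A": 1800, "mir-B": 200, "mir-C": 0},
    "N1": {"mir-A": 100, "mir-B": 900, "mir-C": 0},
}
norm = log2_cpm(counts)
kept = filter_low_expression(norm)  # mir-C is never expressed, so it is dropped
samples, X, y = feature_matrix(norm, kept, {"T1": "tumor", "T2": "tumor", "N1": "normal"})
```

As far as I understand, a decision tree splits on one feature at a time, so a monotonic per-feature transform like the log2 step should not change which splits are possible, while the per-sample CPM scaling can. This sketch does not address batch effects at all, which is part of what I'm asking about.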
I am inexperienced with the statistical approaches involved, and this is my very first attempt. I have searched a lot, but I don't know the best way to start. Any advice would be greatly appreciated. Thank you so much in advance!