I am a computer science and engineering student working on my final project, which applies big-data analytics to genomics.

I am looking for a clear demonstration of how to determine a person's characteristics (such as race, eye color, skin color, or hair color) and diseases (such as cholera, cancer, or malaria) from their genome, genes, DNA, or RNA. My objective is to analyse genomes using newer programming methods (such as clustering, MapReduce, and k-means) instead of the traditional ones.

So far I have done some simple analysis of nucleotide sequences in FASTA format using big-data tools (specifically Hadoop): finding the oriC (replication origin) of Vibrio cholerae, a bacterial genome, and generating the possible k-mers of a given nucleotide sequence. I am now working on finding ORFs (open reading frames).

I would be very grateful for any help you could give me. I apologize if my question is naive, as I am a computer science student.
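
To make the sequence steps mentioned above concrete, here is a minimal single-machine sketch in plain Python (not Hadoop) of k-mer counting and a naive ORF scan. The names `count_kmers`, `find_orfs`, and `min_codons` are hypothetical and chosen only for illustration. It assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA, scans only the forward strand, and is not meant as a definitive implementation.

```python
from collections import Counter

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) pairs for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the
    stop codon. min_codons is the minimum number of codons before the
    stop codon (the ATG counts as one).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF, look for the next ATG
    return sorted(orfs)
```

In a MapReduce setting, the k-mer step would map naturally onto a word-count style job (emit each k-mer with a count of 1, then sum per key). The ORF scan would also need to handle the reverse-complement strand, which this sketch leaves out.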

