I am a beginning bioinformatics student enrolled in the one bioinformatics course my community college offers. The pace of the course is relaxed. It covers very basic Perl (a language I am very experienced with) through simple programming assignments that even a programming neophyte could probably complete within 15-20 minutes, basic searching and retrieval from GenBank, multiple sequence alignment with tools such as Clustal Omega and MUSCLE, and an introduction to BLAST.
Since I am very interested in the field of bioinformatics, I felt compelled to ask for clarification regarding several fundamental topics that are, unfortunately, not addressed in the course syllabus. Apologies if my questions are overly simplistic:
1. If I were to download a complete human genome sequence, what format would it be in? FASTA? Would it be a single monolithic FASTA file or 23 files (one per chromosome) in FASTA format?
2. I'm interested in using either Perl or Java to examine various genes. Would locating specific genes within the genome sequence be feasible?
3. I have tried in vain to locate public data that includes both a "normal" gene and the same gene from an individual afflicted with cancer. For example: a healthy BRCA1 and a copy of BRCA1 with mutations that lead to the development of a neoplasm. I would like to compare them and identify the locations of the mutations. GenBank does not seem to store "mutated" sequence information. Rather, I have only been able to locate BRCA1 and BRCA2 sequence data for various organisms, with no indication of whether the Homo sapiens donor was or was not afflicted with a form of cancer.
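To make question 3 concrete, here is a minimal Java sketch of the kind of comparison I have in mind. It assumes the two sequences have already been aligned and are the same length, so it only reports substitutions, not insertions or deletions. The class name MismatchFinder, the method findMismatches, and the toy sequences are made up for illustration; they are not real BRCA1 data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any library.
class MismatchFinder {

    // Returns the 1-based positions at which two equal-length sequences differ.
    // Comparison is case-insensitive. Assumes the inputs are already aligned.
    static List<Integer> findMismatches(String reference, String sample) {
        if (reference.length() != sample.length()) {
            throw new IllegalArgumentException(
                "Sequences must be the same length (align them first)");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            char r = Character.toUpperCase(reference.charAt(i));
            char s = Character.toUpperCase(sample.charAt(i));
            if (r != s) {
                positions.add(i + 1); // report 1-based coordinates
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Made-up toy sequences, NOT real gene data.
        String reference = "ATGGATTTATCTGCT";
        String sample    = "ATGGATTTGTCTGCT";

        for (int pos : findMismatches(reference, sample)) {
            System.out.println("Position " + pos + ": "
                + reference.charAt(pos - 1) + " -> " + sample.charAt(pos - 1));
        }
    }
}
```

With real data, I assume the two sequences would first have to be aligned (for example with Clustal Omega or MUSCLE, as covered in the course), since a single insertion or deletion would otherwise shift every downstream position and make this position-by-position comparison meaningless.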
If anyone could provide some helpful feedback, I would be very appreciative. Having such a strong interest in the field and no mentor to consult is, as you may imagine, frustrating.