I've been given a sequence and asked to write a report as if I were a bioinformatician working with a research group that has recently obtained this sequence as part of a genome sequencing project.
I've divided the report into two parts. PLAN:
Part 1 - Annotating the sequence
1) Find the longest open reading frame (ORF) using https://www.ncbi.nlm.nih.gov/orffinder/, which gives the amino acid sequence of the encoded protein, and then BLAST the ORF.
2) Find potential motifs in the protein (not sure how I would do this or with which bioinformatics tool).
3) Possibly multiple sequence alignments (also not sure how).
4) Identify SNPs in the sequence (unsure).
5) Identify read coverage based on base frequency statistics, residue frequencies and CpG islands (unsure).
6) Determine properties such as molecular weight, isoelectric point, transmembrane regions and hydrophobicity (unsure).
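The computational core of steps 1, 6 and the base-composition/CpG part of step 5 can be sketched in plain Python, which is useful as a sanity check on what the web tools report. This is a minimal sketch, assuming the input is a plain DNA string (A/C/G/T) and the standard genetic code (NCBI table 1). It only looks for ATG-to-stop ORFs and ignores ORFs that run off the end of the sequence, so ORFfinder's own settings (alternative start codons, minimum ORF length) may give different results. The helper names `find_longest_orf`, `gc_and_cpg` and `gravy` are hypothetical. The CpG observed/expected ratio uses the Gardiner-Garden and Frommer formula, and GRAVY is the mean Kyte-Doolittle hydropathy. For molecular weight and isoelectric point, ExPASy ProtParam or Biopython's `ProteinAnalysis` class are more complete options.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values per residue
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(nt):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_longest_orf(seq):
    """Longest ATG...stop ORF over all six reading frames.

    Returns (strand, start, protein). start is a 0-based offset on that strand
    (the reverse-complement string for '-'); the stop codon is not included.
    """
    seq = seq.upper()
    best = ("+", -1, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            start = None
            for i, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = i  # first ATG opens an ORF
                elif aa == "*" and start is not None:
                    if i - start > len(best[2]):
                        best = (strand, frame + 3 * start, protein[start:i])
                    start = None  # ORFs with no stop codon are never recorded
    return best


def gc_and_cpg(seq):
    """GC fraction and CpG observed/expected ratio: count(CG) * N / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    gc = (c + g) / len(seq) if seq else 0.0
    obs_exp = seq.count("CG") * len(seq) / (c * g) if c and g else 0.0
    return gc, obs_exp


def gravy(protein):
    """Mean Kyte-Doolittle hydropathy; positive values suggest a hydrophobic protein."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"  # toy sequence; replace with your own
    strand, start, protein = find_longest_orf(dna)
    print(strand, start, protein)  # + 2 MKFG
    print(gc_and_cpg(dna))
    print(round(gravy(protein), 3))
```

A CpG island is commonly called when a window of at least 200 bp has GC content above 50% and an observed/expected ratio above 0.6, so in practice you would run `gc_and_cpg` over sliding windows rather than over the whole sequence at once.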
Part 2 - Analysis of the likely function of the protein
1) Identify homologs of the protein from its amino acid sequence, and predict its function from proteins that share high sequence identity with the protein of interest.
2) Not sure what else I should do here.
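Homolog searches for step 1 are normally run with BLASTP against a protein database, which reports percent identity, query coverage and an E-value for each hit. To make "sequence identity" concrete, here is a minimal sketch of a global (Needleman-Wunsch) alignment followed by a percent-identity calculation. The scoring scheme (match +1, mismatch -1, gap -1) and the helper names `global_align` and `percent_identity` are assumptions for illustration only; BLAST uses local alignment with a substitution matrix such as BLOSUM62, so its identity values will differ.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a, aligned_b):
    """Identical non-gap columns as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a) if aligned_a else 0.0


if __name__ == "__main__":
    top, bottom = global_align("MKFG", "MKG")
    print(top)     # MKFG
    print(bottom)  # MK-G
    print(percent_identity(top, bottom))  # 75.0
```

Identity values quoted in the report should come from BLAST itself (or, for the multiple alignments in Part 1, from a dedicated tool such as Clustal Omega or MUSCLE); the toy version is only meant to show what the number measures.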
Any help on how I would do this would be really appreciated, thank you.