Hello. In a rare syndrome, we have identified mutations in a family, and we want to analyse these mutations bioinformatically. We need to know whether they are novel. We also want to know their effect on the encoded proteins: do they affect significant domains of the protein? Do they affect its 3D structure? I know no one can explain all of this in a single answer. Please point me to an article, or just give me the addresses of sites that I can work with. Thanks
Some of your questions can be answered by "variant annotation". You can start by getting an overview of different variant annotators from this link: http://blog.goldenhelix.com/goldenadmin/the-sate-of-variant-annotation-a-comparison-of-annovar-snpeff-and-vep/
You can then look up the papers for the individual tools if you want to pursue any one of them.
You can translate your DNA sequence computationally with the ExPASy Translate tool and choose the correct reading frame of your putative protein from the output. You can then compare the wild-type and mutated proteins with tools such as Motif Scan and ScanProsite to find amino acid differences at the motif or domain level and interpret them. If you want to do protein modelling, you could use the Geno3D server, select suitable templates, and build a model to view in RasMol. Also try searching your sequence at http://www.cathdb.info/search/by_sequence
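As a rough local equivalent of the translate-and-compare step, here is a minimal Python sketch. It assumes the standard genetic code, a sequence that already starts in the correct reading frame, and a simple position-by-position comparison; the function names and the G>A change in the mutant are hypothetical, chosen only for illustration. The wild-type fragment is the first 24 bases of the example sequence used in this thread.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate coding DNA in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for unknown codons
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def protein_differences(wild, mutant):
    """List substitutions such as 'G2R' (1-based positions)."""
    diffs = [
        f"{w}{pos}{m}"
        for pos, (w, m) in enumerate(zip(wild, mutant), start=1)
        if w != m
    ]
    if len(wild) != len(mutant):
        diffs.append(f"length {len(wild)} -> {len(mutant)}")
    return diffs


wild_dna = "ATGGGGGTCTCAAACGGACATCGC"
mutant_dna = "ATGAGGGTCTCAAACGGACATCGC"  # hypothetical G>A at base 4

print(translate(wild_dna))
print(protein_differences(translate(wild_dna), translate(mutant_dna)))  # ['G2R']
```

This kind of comparison only makes sense for substitutions. An insertion or deletion shifts every downstream position, so a proper alignment would be needed there.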
I don't know exactly which online course to suggest, but there are plenty of online bioinformatics courses you can try. Also read some basic bioinformatics manuals, which you can find with a Google search.
I can give you a brief introduction. This is a plain (raw) sequence:
ATGGGGGTCTCAAACGGACATCGCAACGGAAACGGAATCGTAGCCAACGGGCTTTGCTTGAAGAAGGAGT TGTCGGGAACTGTGCAGGATCCGTTGGGGTGGTTGAAGGCGGCGGAAGGGATGAAAGGGAGTCATCTGGA GGAAGTTAAGAAGATGGTGGAGGAGTTTAGGAATCCGGTGGTGAAGCTCGCCGGAAAGACTCTTAGCATT
You almost always need to convert this raw sequence into FASTA format for bioinformatics applications. The following is the FASTA format (please read about the FASTA format, as it may not display correctly here):
>anyname
ATGGGGGTCTCAAACGGACATCGCAACGGAAACGGAATCGTAGCCAACGGGCTTTGCTTGAAGAAGGAGT
TGTCGGGAACTGTGCAGGATCCGTTGGGGTGGTTGAAGGCGGCGGAAGGGATGAAAGGGAGTCATCTGGA
GGAAGTTAAGAAGATGGTGGAGGAGTTTAGGAATCCGGTGGTGAAGCTCGCCGGAAAGACTCTTAGCATT
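If you would rather do the conversion in a script, the following Python sketch writes and reads FASTA text. It assumes only the basic FASTA convention: a header line starting with ">" followed by the sequence on one or more lines. The function names and the 70-character line width are my own choices, not a requirement of any tool.

```python
def to_fasta(name, sequence, width=70):
    """Return one FASTA record: a '>' header line plus wrapped sequence lines."""
    seq = "".join(sequence.split()).upper()  # drop spaces and line breaks
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


raw = "ATGGGGGTCTCAAACGGACATCGC AACGGAAACGGAATCGTAGCC"
record = to_fasta("anyname", raw, width=30)
print(record)
print(parse_fasta(record))
```

Parsing the record back gives the original sequence with the spaces removed, which is a quick way to check that nothing was lost during conversion.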
Then you need to check whether your sequence comes from a eukaryote or a prokaryote.
If the sequence is from a prokaryote (mostly bacteria; please also read about Whittaker's five-kingdom concept), you can use the whole gene directly, from the ATG to the TGA or another stop codon, without removing any introns, because prokaryotic genes generally lack them. You can then translate it into the putative protein with the ExPASy tool.
If the sequence is from a eukaryote (plants, animals, fungi, algae, amoebae, etc.; that is, anything other than bacteria or viruses), you first need to remove the introns, UTRs, and other non-coding regions from the whole gene, as the living eukaryotic cell does by splicing or alternative splicing. You can then translate what remains (from the ATG to the TGA or another stop codon) into the putative protein with the ExPASy tool.
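The intron-removal step for a eukaryotic gene can be sketched in Python as below. This assumes you already know the exon coordinates (1-based and inclusive, as in a GenBank record) and that the coding region runs from the first ATG to the first in-frame stop codon. That second assumption is a simplification, and the annotated CDS should be preferred whenever one exists. The toy gene and the function names are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exons(genomic, exons):
    """Join exons given as (start, end) pairs, 1-based and inclusive."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


def extract_cds(mrna):
    """Return the sequence from the first ATG to the first in-frame stop codon.

    Returns None if there is no ATG or no in-frame stop codon.
    """
    mrna = mrna.upper()
    start = mrna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None


# Toy gene: 5' UTR + exon 1 + intron + exon 2 + 3' UTR (all hypothetical).
genomic = "CC" + "ATGGGG" + "GTAAGTTTAG" + "GTCTGA" + "TT"
exons = [(1, 8), (19, 26)]  # exon coordinates, including the UTR bases

mrna = splice_exons(genomic, exons)
print(mrna)               # CCATGGGGGTCTGATT
print(extract_cds(mrna))  # ATGGGGGTCTGA
```

The resulting coding sequence is what you would paste into a translation tool. For a prokaryotic gene the splicing step is skipped and only the ATG-to-stop extraction applies.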
You can align two or more FASTA sequences of the whole gene (genomic), the mRNA (spliced product), or the protein with Clustal Omega or Clustal X, which can also export FASTA, ALN, or PHYLIP formats for further use in DnaSP, Arlequin, or MEGA (these programs are user friendly, but you need to read their manuals).
If you want to retrieve orthologous or paralogous sequences, you can use NCBI BLASTN for nucleotide sequences and BLASTP for protein sequences.
For protein modelling, you could use Geno3D to search for suitable templates with maximum identity to your protein, download the resulting models, and view them with the RasMol viewer (please also read about the RasMol command line).
For instance, here is a link to a record in the public NCBI database to help you understand the structure of a eukaryotic gene: https://www.ncbi.nlm.nih.gov/nuccore/GQ504819.1 Click the CDS, exon, and mRNA features in the record to see how the gene is organised.
You can use this sequence directly in NCBI BLASTN to search for related sequences.
Display the record as FASTA, copy it into Word, manually cut out the introns and UTR regions (keeping only the ATG to the stop codon), and use the result as the template for the putative protein prediction.
I hope these suggestions help!
And we want to know their effect on its proteins? Does it have an effect on significant domains of the protein? Does it have an effect on its 3D structure?
Unless the mutation is extremely deleterious, like it kills a splice site, or puts a very premature stop in the protein, you can only make an educated guess. You need biologists to do benchwork to prove that your mutations have a clinical impact. You can't prove anything about a new mutation on a computer.
I agree with swbarnes2. Bioinformatically, you can find different alleles among geographic isolates or populations, even ones with a premature stop codon, and translate them hypothetically into putative proteins. But it takes wet-lab bench work by biologists to extract the proteins (which may also require a yeast two-hybrid system or other assays) and to perform phenotypic correlation to show whether the truncated or unusual proteins affect the phenotype.