I am trying to design PCR primers for a gene whose sequence is not known. Even the whole-transcriptome sequencing done in our lab did not identify that particular gene. Hence, I think I am left with only one option: designing degenerate primers for this gene by aligning the protein sequences from several related species and designing primers based on the conserved regions. Could you please explain the process of primer design by this method? I know some of the steps, which I have listed below:
step 1: download the sequences of the protein in question from related species from NCBI in FASTA format
step 2: perform a multiple sequence alignment using Clustal Omega
step 3: identify the conserved domains
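Step 3 can be sketched in a few lines of Python, assuming the Clustal Omega alignment has already been read into a list of equal-length strings (for example from its FASTA output). The toy alignment below is made up and is not real SLC39a8 or ECaC sequence, and `conserved_windows` is a hypothetical helper name. Since each residue becomes one codon, a window of 6 to 8 residues corresponds to an 18 to 24 nt primer.

```python
# Sketch: scan a protein multiple sequence alignment for gap-free,
# fully conserved windows. Assumption: the alignment is already parsed into
# equal-length strings, with "-" as the gap character (Clustal Omega default).


def conserved_windows(aligned: list[str], width: int) -> list[tuple[int, str]]:
    """Return (0-based start, peptide) for every window of `width` columns
    in which all sequences carry the same residue and none has a gap."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    # A column is usable only if it is invariant across all sequences and gap-free.
    invariant = []
    for col in range(length):
        residues = {seq[col].upper() for seq in aligned}
        invariant.append(len(residues) == 1 and "-" not in residues)

    hits = []
    for start in range(length - width + 1):
        if all(invariant[start:start + width]):
            hits.append((start, aligned[0][start:start + width].upper()))
    return hits


if __name__ == "__main__":
    # Hypothetical toy alignment of three species.
    toy_alignment = [
        "MKDEWMCQHFNG-LA",
        "MKDEWMCQHFNGSLA",
        "MRDEWMCQHYNG-LA",
    ]
    for start, peptide in conserved_windows(toy_alignment, width=6):
        print(start, peptide)
```

Requiring 100% identity is the strictest choice; with more distant species you would probably relax it and let the degenerate codes absorb some variation.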
I am not sure how to proceed from here. I tried j-CODEHOP for the next steps, but it asks for many parameters that I don't understand. Thanks
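The core idea behind the next step is back-translation: each conserved residue is replaced by the set of codons that encode it, written with IUPAC ambiguity codes. Below is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1); `back_translate` and `reverse_complement` are hypothetical helper names. Note that this is plain degenerate back-translation, not the CODEHOP algorithm: j-CODEHOP additionally uses, among other things, a codon-usage table to build a non-degenerate 5' clamp, which is part of why it asks for so many parameters.

```python
# Sketch: back-translate a conserved peptide into a fully degenerate IUPAC primer.
# Assumption: standard genetic code (NCBI table 1); no codon-usage weighting,
# so this is NOT the CODEHOP hybrid-primer algorithm.
from math import prod

BASES = "TCAG"
# NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def back_translate(peptide: str) -> tuple[str, int]:
    """Return (IUPAC primer 5'->3', fold degeneracy) for a peptide."""
    primer, fold = [], []
    for aa in peptide.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for pos in range(3):
            # Merging codon positions one at a time over-covers the six-codon
            # residues (Leu, Arg, Ser), e.g. Ser becomes WSN (16) for 6 codons.
            bases = frozenset(codon[pos] for codon in codons)
            primer.append(IUPAC[bases])
            fold.append(len(bases))
    return "".join(primer), prod(fold)


def reverse_complement(iupac: str) -> str:
    """Reverse complement of an IUPAC string, used for the reverse primer."""
    return iupac.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    fwd, degeneracy = back_translate("DEWMCQ")
    print(fwd, degeneracy)
```

For example, `back_translate("DEWMCQ")` gives `GAYGARTGGATGTGYCAR` with 16-fold degeneracy. For the reverse primer you would back-translate a conserved window further downstream and take its `reverse_complement`. Met and Trp (one codon each) keep the degeneracy low, whereas Leu, Arg and Ser (six codons each) inflate it, so windows rich in the former are usually the better candidates.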
Hello kamran.shekh!
It appears that your post has been cross-posted to another site: https://biology.stackexchange.com/questions/55707/designing-degenerate-primers-using-alignment-of-protein-sequences-from-other-spe
This is typically not recommended as it runs the risk of annoying people in both communities.
The species in my question is the white sturgeon, a very ancient fish with a largely cartilaginous skeleton. For transcriptomics, the RNA-Seq library was loaded onto a MiSeq v3 150-cycle cartridge and run as 75 base pair (bp) paired-end reads on a MiSeq sequencer (Illumina). No public genome or transcriptome database was available for white sturgeon, so a comprehensive reference transcriptome was constructed by de novo assembly of reads from white sturgeon liver. Please let me know if you need further information on the assembly.
Actually, I am interested in two genes: SLC39a8 (a zinc transporter) and ECaC (a calcium transporter). Both are very common transporters and are widely found across the animal and plant kingdoms. Their sequences are known in many species, but, as mentioned above, our lab's transcriptomic data failed to identify them. More importantly, the ECaC sequence is even known in a very closely related fish, the lake sturgeon.
Thanks for the update! And how did you perform the de novo assembly (Trinity?) and the search (tblastn?) using a closely related species as a template?
Contigs for the reference transcriptome were assembled de novo from the merged reads and the unmerged paired-end reads using CLC Genomics Workbench v5.0 (CLC bio) with default parameters. Contigs comprising the reference transcriptome were annotated using BLASTX searches in Blast2GO v2.5.0 against zebrafish sequences in the NCBI non-redundant (nr) protein database.
You should try a different assembler. I have not used CLC myself, but I don't think it is top-notch for your TSA (transcriptome shotgun assembly). I have heard that it is easy to use and generates assemblies that look good on paper (N50, etc.), but when we used it for our genome, it created a lot of chimeric contigs. Try Trinity (with default parameters) to get a second assembly, then run a different search strategy:
get a few sequences of your genes of interest from the most closely related species plus some other fish, and use only those as queries in a tblastn (protein queries) or tblastx (nucleotide queries) search with Trinity.fasta as the BLAST database. Then extract the best hits and blast them against NR again for validation. Good luck!
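The tblastn part of this strategy could look roughly like the Python sketch below. It assumes NCBI BLAST+ (`makeblastdb`, `tblastn`) is installed and on your PATH, that the assembly is in `Trinity.fasta`, and that the related-species proteins are in a file called `queries.faa` (a hypothetical name); the database name `trinity_db` and the 1e-5 E-value cutoff are likewise arbitrary illustrative choices. `best_hits` keeps the highest-bitscore contig per query from the standard 12-column `-outfmt 6` table.

```python
# Sketch: search Trinity contigs with protein queries, keep the best hit per query.
# Assumptions: BLAST+ on PATH; file names, db name and E-value are illustrative only.
import subprocess

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def run_tblastn(queries="queries.faa", assembly="Trinity.fasta", out="hits.tsv"):
    """Build a nucleotide BLAST database from the assembly, run tblastn,
    and return the tabular output as text."""
    subprocess.run(["makeblastdb", "-in", assembly, "-dbtype", "nucl",
                    "-out", "trinity_db"], check=True)
    subprocess.run(["tblastn", "-query", queries, "-db", "trinity_db",
                    "-evalue", "1e-5", "-outfmt", "6", "-out", out], check=True)
    with open(out) as handle:
        return handle.read()


def best_hits(tabular: str) -> dict[str, dict]:
    """Keep the highest-bitscore contig for each query from -outfmt 6 text."""
    best = {}
    for line in tabular.splitlines():
        if not line.strip():
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        row["bitscore"] = float(row["bitscore"])
        row["evalue"] = float(row["evalue"])
        query = row["qseqid"]
        if query not in best or row["bitscore"] > best[query]["bitscore"]:
            best[query] = row
    return best
```

Calling `best_hits(run_tblastn())` gives one candidate contig per query. You would then pull those contigs out of Trinity.fasta and search them back against NR (for example with blastx) to check that they really are SLC39a8 or ECaC rather than a related family member.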
I/we can help you with that pipeline as well, if you want.
Thanks for the detailed response. I will try the approach you suggested.