I am new to bioinformatics, so please forgive me if these questions seem basic.
I have two questions.
I intend to perform a codon usage analysis followed by correspondence analysis on multiple whole genomes of one bacterial species, to look for associations with isolation source and sequence type (ST). Is this the right approach?
I have concatenated all the CDSs within each genome and then joined all the genomes (the average is about 4100 genes per genome). The problem is that the concatenated sequences still contain the stop codon of every CDS. CodonW, which is widely used for this kind of analysis, cannot start the analysis when stop codons are present. How can I remove the stop codons?
Please suggest a solution to this problem and comment on the overall approach.
Could you show the commands used to concatenate all the genomes? If you want to show differences between sources, why are you concatenating all genomes?
I agree with h.mon that this will be easier if the genomes are kept separate (concatenating the CDSs within each genome is probably fine).
I would read all the CDSs into Biopython and then do one of two things:
Use Biopython's own codon usage bias tools: http://biopython.org/DIST/docs/api/Bio.SeqUtils.CodonUsage-pysrc.html
Strip the last 3 bp from all the CDSs and then run them through CodonW.
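The second option could look roughly like the sketch below. It is plain Python with no Biopython dependency, and the function names (strip_terminal_stop, codon_counts, has_internal_stop) are made up for illustration. It assumes the standard stop codons TAA, TAG and TGA, and it only trims the last 3 bp when they really are a stop codon and the CDS length is a multiple of 3, which is a bit more cautious than trimming blindly.

```python
from collections import Counter

# Assumption: standard stop codons (they are the same in bacterial table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_terminal_stop(cds):
    """Return the CDS in upper case without its terminal stop codon.

    The last 3 bp are removed only if the length is a multiple of 3 and
    the final codon is a stop codon; otherwise the sequence is returned
    unchanged (apart from upper-casing).
    """
    seq = cds.upper()
    if len(seq) >= 3 and len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS:
        return seq[:-3]
    return seq


def codon_counts(cds):
    """Count in-frame codons of a CDS after removing the terminal stop."""
    seq = strip_terminal_stop(cds)
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def has_internal_stop(cds):
    """True if a stop codon remains in frame after trimming the terminal one."""
    return any(codon in STOP_CODONS for codon in codon_counts(cds))


if __name__ == "__main__":
    example = "ATGAAAGGGTAA"  # toy CDS: ATG AAA GGG TAA
    print(strip_terminal_stop(example))
    print(codon_counts(example))
```

Applied to each CDS before concatenating within a genome, this should give CodonW input with no terminal stops. CDSs for which has_internal_stop returns True (pseudogenes, frameshifts, annotation errors) are probably best inspected or dropped rather than trimmed.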