What is the proper protocol for aligning partial DNA sequences to avoid frameshift and stop codon problems when submitting to NCBI?
5.3 years ago

Need help. I cannot submit bacterial gyrB sequences to NCBI. What is the proper way to analyze raw sequences? Is there any free software? Can anybody explain the whole procedure? Thanks a lot.

alignment • 1.2k views

Can you provide more info on why specifically you cannot submit it to NCBI? E.g., what was the feedback on your submission attempt?


"what is the proper way to analyze raw sequences"

What do you mean by this - what kind of analysis do you want to do?

"submit bacterial gyrB sequences in NCBI"

When you say "submit", do you mean deposit? Or are you trying to BLAST your sequences or something?


I want to deposit gyrB sequences in NCBI. I am a beginner and have little knowledge of bioinformatics, so I want to understand the process from the beginning. I only have raw partial sequences of the gyrB housekeeping gene. I cannot submit the sequences as they are, right? Do I need to align the raw sequences with their NCBI reference sequences first?

I have tried several methods. I aligned the sequences manually, checking them with blastn and blastp, looked for changes, and edited the sequences accordingly. How do you align DNA sequences with the reference strain: manually or with software? Can I align them in MEGA with ClustalW, or do I have to do an assembly? Can you suggest an approach?

Some of my sequences show mutations when translated. Does that cause a frameshift? Is it OK if a sequence has some mutations in the amino acid sequence, or should the translation be given differently?

Can I use Open Reading Frame Finder to find the correct reading frame without stop codons and then send the sequences for depositing? One more question: must my translated gyrB protein sequence match the corresponding reference sequence?

I did those things manually and deposited the sequences in the NCBI database. They said: "we are not able to accept your records in their current format as a number of sequences still have internal stop codons and/or frameshifts. Some common tools for sequence quality analysis are BLAST similarity searches and/or alignment of your sequences (noticing any insertions, deletions, and mismatches)". It would be really helpful if someone could help me. Thanks a lot.


It would be good to know the answers to the questions below.

  • Where did these sequences come from?
  • How did you generate them?
  • Are they from specific organisms or from mixed populations?
  • What is the main aim of your experiment?

You seem to have done most of the things one would need to do to get sequences ready for submission to NCBI (BLAST, gather sequences, do multiple sequence alignments, find changes, etc.), but it is not clear whether you did those operations in that general order.

If you don't have the full sequence of the entire gyrB genes, then submitting partial sequences containing possible frameshift errors would be of limited utility (as NCBI says). GenBank is an archival database, and once something gets in there it stays forever. So doing due diligence upfront to submit correct data is the best strategy.
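
One way to do part of that upfront check yourself is sketched below. It translates a partial sequence in the three forward frames and counts the stop codons in each. It assumes the read is already on the coding strand and has been trimmed of low-quality ends. The function names and the short example sequence are made up for illustration, not taken from any tool. For a partial coding sequence, the frame you annotate should normally show no stops, so a sequence with stops in every frame is a candidate for another look at the trace file.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
# Bacterial table 11 uses the same codon assignments and the same
# three stop codons (TAA, TAG, TGA), shown here as "*".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at 0-based offset `frame`; unknown codons become X."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_report(dna):
    """Return {frame number (1-3): count of stop codons in that frame}."""
    return {frame + 1: translate(dna, frame).count("*") for frame in range(3)}


if __name__ == "__main__":
    example = "ATGAAAGGTCTGGAT"  # hypothetical fragment, not real gyrB data
    for frame, stops in stop_report(example).items():
        print(f"frame {frame}: {stops} stop codon(s) -> {translate(example, frame - 1)}")
```

A frame with zero stops is only a candidate. On a read of about 1100 bp the wrong frames usually contain several stops, so the open frame tends to stand out, but it is still worth confirming with blastp that the translation looks like GyrB.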


Those are bacterial gyrB sequences isolated from different fish samples. My main aim is to deposit the sequences for a further phylogenetic study. What I did:

  1. Ran blastn on the sequences.
  2. Checked for similarities.
  3. Pairwise aligned each raw sequence against the closest reference strain in MEGA 7.0 (ClustalW). I also tried this manually.
  4. Found the changes, edited them by looking through the trace file, and checked with blastp.
  5. Submitted with BankIt to GenBank.

The sequences are around 1100 bp long, but there are stop codons in the translation. I have several species of Aeromonas strains. Are you telling me to do a multiple alignment within the same species, e.g. Aeromonas hydrophila with reference sequences? Is it OK to do that with the MEGA software? Thanks.
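
The pairwise alignment step described above can also be screened automatically for likely frameshifts. The sketch below is only an illustration: it assumes you have exported the two aligned rows (query and reference) as equal-length strings with "-" for gaps, for example from a FASTA alignment saved by MEGA. The function name and the toy sequences are hypothetical. It reports every internal gap run whose length is not a multiple of 3, since those are the insertions or deletions that shift the reading frame.

```python
import re


def frameshift_gaps(aligned_query, aligned_ref):
    """Return (column, length, kind) for internal gap runs not divisible by 3.

    column is 1-based in the alignment. kind is "deletion" when the gap is in
    the query and "insertion" when the gap is in the reference.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")

    # Columns where both rows have a base. Gaps outside this region are
    # terminal gaps, which are expected when a partial sequence is aligned
    # to a longer reference, so they are ignored.
    both = [
        i for i, (q, r) in enumerate(zip(aligned_query, aligned_ref))
        if q != "-" and r != "-"
    ]
    if not both:
        return []
    start, end = both[0], both[-1] + 1

    suspects = []
    for kind, row in (("deletion", aligned_query), ("insertion", aligned_ref)):
        for match in re.finditer(r"-+", row[start:end]):
            length = len(match.group())
            if length % 3 != 0:
                suspects.append((start + match.start() + 1, length, kind))
    return sorted(suspects)


if __name__ == "__main__":
    query = "---ATGAAA-GTCTGGAT"  # toy data: one base missing from the read
    ref = "CCCATGAAAGGTCTGGAT"
    print(frameshift_gaps(query, ref))
```

Each reported position is a place to check in the trace file. A single-base gap is often a miscalled or missed peak, but it can also be a real difference from the reference, so the script only tells you where to look, not what the correct base is.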


"Those are bacterial gyrB sequences and those are isolated from different fish samples."

So you don't really know which bacteria these sequences are from. They just happen to be similar to Aeromonas based on a blast search, correct? They could potentially be from an as yet unknown species.

What experimental technique did you use to generate the sequence? PCR/metagenomic sequencing? Did you just fish out sequences related to gyrB from a larger pool of data?

I am not sure how you annotated your sequence files, but at best you can say "(partial) (bacterial) gyrB sequence isolated from fish X" (replace X with the name of the species if you know it). Unless you have full-length sequences that match references available in GenBank over their entire length, it would be difficult to claim species-level specificity.


Yes, I tried to deposit them in the NCBI database.

