Question: Problem With Ngs_Backbone - 454 Sequence Cleaning Before De Novo Assembly
1
9.6 years ago by
Quebec
Francois Olivier Hébert280 wrote:

I am new to the world of NGS technologies and I'm currently trying to clean 454 sequences before making a de novo assembly with all my reads. I read on this forum that ngs_backbone would be perfect for this kind of raw data, especially for trimming and for removing low-complexity sequences and NGS adaptors.

I downloaded all the packages needed by the application and it seems to be OK. But when I try to use it with the "cleaning analysis" function, I systematically get an error message. Since I'm not very familiar with Python and this kind of application, I'm having a hard time finding a solution.

It would be tremendously appreciated to have some feedback on the problem, if anyone has an idea on how to solve it. Thanks to you out there who are helping people like me to improve their bio-info skills, that's a good sense of community ! :-)

Ok, so here's the error message that I get after typing this command (sorry, it's long !) :

backbone_analysis.py -a clean_reads



<type 'exceptions.RuntimeError'>
Python 2.6.5: /usr/bin/python2.6
Mon Mar 28 16:38:00 2011

A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.

 /usr/local/bin/backbone_analysis.py in <module>()
   84         cgitb.handler()
   85         raise
   86 
   87 if __name__ == '__main__':
   88     main()
main = <function main>

 /usr/local/bin/backbone_analysis.py in main()
   75         for action in actions:
   76             start_time = datetime.datetime.today()
   77             do_analysis(project_settings=settings_fpath, kind=action)
   78             time_elapsed = datetime.datetime.today() - start_time
   79             logger.info('Time elapsed %s' % str(time_elapsed))
global do_analysis = <function do_analysis>
project_settings undefined
settings_fpath = '/home/hebertfo/Desktop/cleaning_FD/backbone.conf'
kind undefined
action = 'clean_reads'

 /usr/local/lib/python2.6/dist-packages/franklin/backbone/backbone_runner.py in do_analysis(kind='clean_reads', project_settings='/home/hebertfo/Desktop/cleaning_FD/backbone.conf', analysis_config={}, silent=False)
  118     analyzer_klass = analysis_def['analyzer']
  119     analyzer = analyzer_klass(project_settings=settings,
  120                         analysis_definition=analysis_def, silent=silent)
  121 
  122     analyzer.run()
analyzer = <franklin.backbone.cleaning.CleanReadsAnalyzer object>
analyzer.run = <bound method CleanReadsAnalyzer.run of <franklin.backbone.cleaning.CleanReadsAnalyzer object>>

 /usr/local/lib/python2.6/dist-packages/franklin/backbone/cleaning.py in run(self=<franklin.backbone.cleaning.CleanReadsAnalyzer object>)
  206             seq_pipeline_runner(pipeline, configuration, infhands,
  207                                 file_info['format'], processes=self.threads,
  208                                 writers={'seq':writer})
  209             input_fhand.close()
  210             output_fhand.close()
writers undefined
writer = <franklin.seq.writers.SequenceWriter object>

 /usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py in seq_pipeline_runner(pipeline=[{'arguments': {}, 'comment': 'It convers the sequence to upper case', 'function': <function create_upper_mapper>, 'name': 'up_case', 'type': 'mapper'}, {'arguments': {'aligner': 'exonerate', 'vectors': None}, 'comment': 'Remove adaptors', 'function': <function create_vector_striper_by_alignment>, 'name': 'remove_adaptors', 'type': 'mapper'}, {'arguments': {'parameters': {'bracket': [10, 0.02], 'cdna': None, 'error': [0.014999999999999999, 0.014999999999999999], 'keep': None, 'window': [50, 0.080000000000000002, 10, 0.29999999999999999]}}, 'comment': 'Strip low quality with lucy', 'function': <function create_striper_by_quality_lucy2>, 'name': 'strip_lucy', 'type': 'bulk_processor'}, {'arguments': {'aligner': 'blast+', 'vectors': 'UniVec'}, 'comment': 'Remove vector using vector db', 'function': <function create_vector_striper_by_alignment>, 'name': 'remove_vectors', 'type': 'mapper'}, {'arguments': {}, 'comment': 'Mask low complexity regions', 'function': <function create_masker_for_low_complexity>, 'name': 'mask_low_complex', 'type': 'mapper'}, {'arguments': {'words': []}, 'comment': 'It removes the given regexs from the sequence', 'function': <function create_word_striper_by_alignment>, 'name': 'remove_short_adaptors', 'type': 'mapper'}, {'arguments': {'left_length': None, 'right_length': None}, 'comment': 'Strip given edge lengths. 
Both sides', 'function': <function create_edge_stripper>, 'name': 'edge_removal', 'type': 'mapper'}, {'arguments': {'count_masked': False, 'length': 100}, 'comment': 'Remove seq shorter than X nt', 'function': <function create_length_filter>, 'name': 'remove_short', 'type': 'filter'}], configuration={'edge_removal': {'left_length': None, 'right_length': None}, 'remove_adaptors': {'vectors': None}, 'remove_short': {'length': 100}, 'remove_short_adaptors': {'words': []}, 'remove_vectors': {'vectors': 'UniVec'}, 'strip_lucy': {'parameters': {'bracket': [10, 0.02], 'cdna': None, 'error': [0.014999999999999999, 0.014999999999999999], 'keep': None, 'window': [50, 0.080000000000000002, 10, 0.29999999999999999]}}, 'strip_trimpoly': {'ntrim_above_percent': 2.0}}, in_fhands={'in_seq': <open file '/home/hebertfo/Desktop/cleaning_FD/reads/raw/lb_cor3.pl_454.sm_C_D18.sfastq', mode 'r'>}, file_format='fastq', writers={'seq': <franklin.seq.writers.SequenceWriter object>}, processes=None)
  335 
  336     # The SeqRecord generator is consumed
  337     for sequence in sequences:
  338         for writer in writers.values():
  339             writer.write(sequence)
sequence undefined
sequences = <itertools.ifilter object>

 /usr/local/lib/python2.6/dist-packages/franklin/seq/seq_cleaner.py in strip_vector_by_alignment(sequence=SeqWithQuality(seq=Seq('CTCTCTCTCTCTCTCTCTCTCTCT...35, 35, 35, 33, 31, 31, 28, 28, 28, 28, 21, 21],))
  482 
  483         # first we are going to align he sequence with the vectors
  484         alignment_fhand = aligner_(sequence)[aligner]
  485         # We need to parse the result
  486         alignment_result = parser(alignment_fhand)
alignment_fhand undefined
aligner_ = <function run_cmd_for_sequence>
sequence = SeqWithQuality(seq=Seq('CTCTCTCTCTCTCTCTCTCTCTCT...35, 35, 35, 33, 31, 31, 28, 28, 28, 28, 21, 21],)
aligner = 'blast+'

 /usr/local/lib/python2.6/dist-packages/franklin/utils/cmd_utils.py in run_cmd_for_sequence(sequence=SeqWithQuality(seq=Seq('CTCTCTCTCTCTCTCTCTCTCTCT...35, 35, 35, 33, 31, 31, 28, 28, 28, 28, 21, 21],))
  378             else:
  379                 raise RuntimeError('Problem running ' + tool + ': ' + stdout +
  380                                stderr)
  381 
  382         # Now we are going to make this list with the files we are going to
stderr = 'BLAST Database error: No alias or index file fou...arch path [/home/hebertfo/Desktop/cleaning_FD::]\n'
<type 'exceptions.RuntimeError'>: Problem running blast+: BLAST Database error: No alias or index file found for nucleotide database [UniVec] in search path [/home/hebertfo/Desktop/cleaning_FD::]

    __class__ = <type 'exceptions.RuntimeError'>
    __delattr__ = <method-wrapper '__delattr__' of exceptions.RuntimeError object>
    __dict__ = {}
    __doc__ = 'Unspecified run-time error.'
    __format__ = <built-in method __format__ of exceptions.RuntimeError object>
    __getattribute__ = <method-wrapper '__getattribute__' of exceptions.RuntimeError object>
    __getitem__ = <method-wrapper '__getitem__' of exceptions.RuntimeError object>
    __getslice__ = <method-wrapper '__getslice__' of exceptions.RuntimeError object>
    __hash__ = <method-wrapper '__hash__' of exceptions.RuntimeError object>
    __init__ = <method-wrapper '__init__' of exceptions.RuntimeError object>
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of exceptions.RuntimeError object>
    __reduce_ex__ = <built-in method __reduce_ex__ of exceptions.RuntimeError object>
    __repr__ = <method-wrapper '__repr__' of exceptions.RuntimeError object>
    __setattr__ = <method-wrapper '__setattr__' of exceptions.RuntimeError object>
    __setstate__ = <built-in method __setstate__ of exceptions.RuntimeError object>
    __sizeof__ = <built-in method __sizeof__ of exceptions.RuntimeError object>
    __str__ = <method-wrapper '__str__' of exceptions.RuntimeError object>
    __subclasshook__ = <built-in method __subclasshook__ of type object>
    __unicode__ = <built-in method __unicode__ of exceptions.RuntimeError object>
    args = ('Problem running blast+: BLAST Database error: No...arch path [/home/hebertfo/Desktop/cleaning_FD::]\n',)
    message = 'Problem running blast+: BLAST Database error: No...arch path [/home/hebertfo/Desktop/cleaning_FD::]\n'

The above is a description of an error in a Python program.  Here is
the original traceback:

Traceback (most recent call last):
  File "/usr/local/bin/backbone_analysis.py", line 88, in <module>
    main()
  File "/usr/local/bin/backbone_analysis.py", line 77, in main
    do_analysis(project_settings=settings_fpath, kind=action)
  File "/usr/local/lib/python2.6/dist-packages/franklin/backbone/backbone_runner.py", line 122, in do_analysis
    analyzer.run()
  File "/usr/local/lib/python2.6/dist-packages/franklin/backbone/cleaning.py", line 208, in run
    writers={'seq':writer})
  File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 337, in seq_pipeline_runner
    for sequence in sequences:
  File "/usr/local/lib/python2.6/dist-packages/franklin/seq/seq_cleaner.py", line 484, in strip_vector_by_alignment
    alignment_fhand = aligner_(sequence)[aligner]
  File "/usr/local/lib/python2.6/dist-packages/franklin/utils/cmd_utils.py", line 380, in run_cmd_for_sequence
    stderr)
RuntimeError: Problem running blast+: BLAST Database error: No alias or index file found for nucleotide database [UniVec] in search path [/home/hebertfo/Desktop/cleaning_FD::]

It seems to me that there are a lot of different problems, but I don't understand the one concerning UniVec: I placed UniVec_core with all the other databases used by blast+.

Thanks again !

next-gen sequence sequencing • 3.5k views
ADD COMMENTlink modified 9.6 years ago by Lyuan0 • written 9.6 years ago by Francois Olivier Hébert280

Just a hint: it looks like the very last line is the significant one, and it is a blast+ error. I don't know that program, but I would debug from that assumption. Check the following points: Where is your BLAST database? Can you run BLAST against UniVec from the command line? Do you need to create a symbolic link to the database in a certain directory and call it UniVec? Is the database valid?
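To make the "last line" hint concrete, here is a minimal Python 3 sketch that pulls the database name and the searched directories out of the error message quoted in the traceback. The function name `parse_blast_db_error` is hypothetical, and the regular expression only covers the message wording shown above; other BLAST versions may phrase it differently.

```python
import re

# Matches the "database not found" wording quoted in the traceback above.
_DB_ERROR = re.compile(
    r"No alias or index file found for (?P<kind>\w+) database "
    r"\[(?P<name>[^\]]+)\] in search path \[(?P<path>[^\]]*)\]"
)


def parse_blast_db_error(message):
    """Return (db_kind, db_name, searched_dirs), or None if nothing matches."""
    match = _DB_ERROR.search(message)
    if match is None:
        return None
    # The search path is colon-separated; empty entries are dropped.
    # (Assumption: they stand for search locations that were not configured.)
    dirs = [d for d in match.group("path").split(":") if d]
    return match.group("kind"), match.group("name"), dirs


message = (
    "Problem running blast+: BLAST Database error: No alias or index file "
    "found for nucleotide database [UniVec] in search path "
    "[/home/hebertfo/Desktop/cleaning_FD::]"
)
print(parse_blast_db_error(message))
# ('nucleotide', 'UniVec', ['/home/hebertfo/Desktop/cleaning_FD'])
```

Read this way, the message says that BLAST looked for a nucleotide database named UniVec and that the only directory it actually searched was the project directory.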

ADD REPLYlink written 9.6 years ago by Michael Dondrup47k

Michael basically already said it, but in one sentence: you need a BLAST database called "UniVec" (not UniVec_core) in /home/hebertfo/Desktop/cleaning_FD/
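A quick way to check this is to look for the database files by name. The sketch below (Python 3) assumes the usual layout of a formatted nucleotide BLAST database: either the three files `.nhr`, `.nin` and `.nsq`, or a `.nal` alias file. The helper name `has_nucl_db` is made up for illustration, and the demo runs in a temporary directory rather than the real project folder.

```python
import os
import tempfile

# Files that make up a single-volume nucleotide BLAST database,
# plus the alias file used for multi-volume or aliased databases.
NUCL_INDEX_EXTS = (".nhr", ".nin", ".nsq")
NUCL_ALIAS_EXT = ".nal"


def has_nucl_db(directory, name):
    """Return True if `directory` holds a nucleotide database called `name`."""
    def exists(ext):
        return os.path.isfile(os.path.join(directory, name + ext))
    return exists(NUCL_ALIAS_EXT) or all(exists(ext) for ext in NUCL_INDEX_EXTS)


# Demo: a directory that only contains a database named UniVec_core.
with tempfile.TemporaryDirectory() as project_dir:
    for ext in NUCL_INDEX_EXTS:
        open(os.path.join(project_dir, "UniVec_core" + ext), "w").close()
    print(has_nucl_db(project_dir, "UniVec_core"))  # True
    print(has_nucl_db(project_dir, "UniVec"))       # False
```

The second check fails for the same reason as the run in the question: the files exist, but under a different base name than the one being requested.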

ADD REPLYlink written 9.6 years ago by Michael Schubert7.0k
1
9.6 years ago by
Jose Blanca10
Jose Blanca10 wrote:

Hi:

The problem is that blast+ has not found the UniVec database. Check the installation by running a standalone blast+ search before running ngs_backbone. Remember that ngs_backbone has a mailing list at https://listas.upv.es/mailman/listinfo/ngs_backbone where we can answer any question much faster. Best regards
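A standalone check could look roughly like the Python 3 sketch below. It builds a `blastn` command against the UniVec database and only runs it if the executable is on the PATH. The query file name `test_read.fasta` is a placeholder, and both helper names are hypothetical; this is not how ngs_backbone itself calls BLAST.

```python
import shutil
import subprocess


def blastn_smoke_test_cmd(db, query_fasta):
    """Build a minimal blastn call with tabular output (-outfmt 6)."""
    return ["blastn", "-db", db, "-query", query_fasta, "-outfmt", "6"]


def run_if_available(cmd):
    """Run `cmd` if its executable is on PATH.

    Returns (returncode, stderr), or None when the tool is not installed.
    """
    if shutil.which(cmd[0]) is None:
        return None
    done = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return done.returncode, done.stderr


# Placeholder file name; point it at any small FASTA file with one read.
cmd = blastn_smoke_test_cmd("UniVec", "test_read.fasta")
print(" ".join(cmd))
```

Calling `run_if_available(cmd)` from the project directory should reproduce the "No alias or index file found" message in `stderr` for as long as the database is missing there, which separates a BLAST setup problem from an ngs_backbone problem.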

ADD COMMENTlink written 9.6 years ago by Jose Blanca10

Thanks Jose ! I thought that you might find my question and answer... I think you took part in designing the application, am I right ? I will surely use the mailing list in the future. Thank you very much ! :)

ADD REPLYlink written 9.6 years ago by Francois Olivier Hébert280
0
9.6 years ago by
Ketil4.0k
Norway
Ketil4.0k wrote:

I'm not familiar with "NGS backbone", so pardon my ignorance, but why is it important to mask against UniVec (which seems to be what you are trying to do)? What organism are you sequencing, and why do you think it contains vector sequences? Are you using traditional vector cloning or amplification or something? If you want to filter out pathogens and symbionts (presumably in some established model organism, like human), wouldn't you be better off using a more comprehensive database?

Also, what assembler are you using? I find that Newbler seems to be pretty good at estimating and adjusting to sequence quality, and - going against conventional wisdom - producing much higher quality contigs than Celera.

ADD COMMENTlink written 9.6 years ago by Ketil4.0k

Yeah well, I'm working on whitefish (Coregonus clupeaformis) and I don't think I need to mask against UniVec, but it seems to be part of the process of this application. I'm really not familiar with it though, so maybe I don't need to do this. However, I got a pretty good answer up there and I think Jose Blanca made that application, so I'm in good hands ! Thanks for your answer though, it's very appreciated !

ADD REPLYlink written 9.6 years ago by Francois Olivier Hébert280

A salmonid? That's interesting - you're probably aware of the salmon genome project? Hopefully it's not too distant from Atlantic salmon for there to be some synergy - are you using any of the salmon resources in your project?

ADD REPLYlink written 9.6 years ago by Ketil4.0k

I'm so sorry ! I've been working a lot these days and I didn't see your comment. But yes, I'm aware of the project. I hope there's gonna be some material ready to be published soon, because we are having a lot of trouble with paralogs and repeated sequences throughout the genome. Tetraploid species are very complicated to work with. And for my array design, I used all the ESTs available on the cGRASP website, so yeah, it's been really helpful so far. Cheers !... and sorry for my late answer ! :S

ADD REPLYlink written 9.6 years ago by Francois Olivier Hébert280

I agree that newbler does a good job of filtering out crap all by itself.

ADD REPLYlink written 9.5 years ago by Yannick Wurm2.3k
0
9.4 years ago by
Taslima0
Bangladesh
Taslima0 wrote:

Hello

Yes, the problem is that blast+ can't find the formatted "UniVec_core" database. When you create a project, a file named "backbone.conf" should be generated. Please check the "vector_database" parameter. It should be something like this:

vector_database = '/home/ngs_backbone-1.3.2/franklin/data/blastdbs/UniVec_Core'

Please check whether the directory contains the formatted database; if not, format it with formatdb.
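For reference, the formatting step can be written out as a command list. The Python 3 sketch below builds both the legacy `formatdb` call and its BLAST+ counterpart `makeblastdb` for a nucleotide FASTA file. The function names are hypothetical, and naming the output "UniVec" is an assumption taken from the error message in the question, not from the ngs_backbone documentation.

```python
import shlex


def formatdb_cmd(fasta, name):
    """Legacy BLAST: -p F marks a nucleotide input, -n sets the database name."""
    return ["formatdb", "-i", fasta, "-p", "F", "-n", name]


def makeblastdb_cmd(fasta, name):
    """BLAST+: -dbtype nucl marks a nucleotide input, -out sets the database name."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", name]


def as_shell(cmd):
    """Render a command list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)


print(as_shell(formatdb_cmd("UniVec_Core", "UniVec")))
# formatdb -i UniVec_Core -p F -n UniVec
print(as_shell(makeblastdb_cmd("UniVec_Core", "UniVec")))
# makeblastdb -in UniVec_Core -dbtype nucl -out UniVec
```

Either command, run in the directory that "vector_database" points to, should leave the index files next to the FASTA file.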

I think this will do.

Regards

ADD COMMENTlink written 9.4 years ago by Taslima0

Yeah thanks ! I finally found the way to use it properly and it works well. It's a great cleaning tool !

ADD REPLYlink written 9.4 years ago by Francois Olivier Hébert280
0
9.0 years ago by
Lyuan0
Lyuan0 wrote:

Hi, the BLAST database should be formatted. Try this:

path/formatdb -i UniVec_core -p F

Good luck!

ADD COMMENTlink written 9.0 years ago by Lyuan0