Question: Help with SFF file format processing
lzheng.chn (United States) wrote, 3.5 years ago:

Hi,

I am using my 454 data for OTU analysis in mothur, and I am confused after converting my SFF file to a FASTA file. Sequencing information: platform 454 FLX; flow pattern TACG; barcode (AAAAAAAC) removed by the sequencing center; primer GAGTTTGATCNTGGCTCAG.

However, I have trouble understanding the section of the sequence from the 5th base to the 12th base. The primer starts at base 13. Below is the FASTA output from different toolkits.

sff_extract (from seq_crumbs toolkit) with clipping:

GAGTTTGATCCTGGCTCAGATTGAACGCTGG....

sff_extract (from seq_crumbs toolkit) without clipping:

tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG...

mothur output after denoising:

AGAGCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGG...

Can anyone help me understand the agagcgaa part of the sequence? Based on the information from the sequencing center, it does not belong to the barcode. How should I deal with it? For example, is there a way to remove this region in mothur? Thank you!
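
To make the layout of the reads above concrete, here is a minimal Python sketch that finds the degenerate primer in a read and reports whatever precedes it. The function names are illustrative and not part of mothur or seq_crumbs. It only shows where the primer sits; it does not say what the extra bases are. For reference, TCAG is the standard 454 key sequence, which would account for the first four lowercase bases of the unclipped read.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer (e.g. containing N) into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def split_at_primer(read, primer):
    """Return (prefix, primer_hit, rest) for the first primer match,
    or None if the primer is not found. Matching ignores case."""
    match = primer_regex(primer).search(read.upper())
    if match is None:
        return None
    return read[:match.start()], read[match.start():match.end()], read[match.end():]


PRIMER = "GAGTTTGATCNTGGCTCAG"
reads = {
    "sff_extract, unclipped": "tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG",
    "mothur, after denoising": "AGAGCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGG",
}
for label, read in reads.items():
    prefix, hit, rest = split_at_primer(read, PRIMER)
    # The primer starts one base after the prefix (1-based position).
    print(f"{label}: prefix={prefix!r} primer starts at base {len(prefix) + 1}")
```

For the unclipped read the prefix is 12 bases long, which matches the primer starting at base 13; for the mothur output the prefix is the 8 bases AGAGCGAA.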

modified 3.5 years ago by Damian Kao • written 3.5 years ago by lzheng.chn
Josh Herr (University of Nebraska) wrote, 3.5 years ago:

I'm a little confused about what is actually happening here -- I'm not sure what you are seeing, or whether it is an artifact of your sequencing or a sequence processing error.

Since it seems like you are running your sequences through mothur, I would use the trim.seqs command (see the mothur wiki).

If you're hoping to trim adapters outside of mothur, there are lots of options discussed here.

written 3.5 years ago by Josh Herr

Hi Josh,

Sorry for the confusion. In my sequence file (the FASTA output from mothur's shhh.flows), every single sequence has the segment AGAGCGAA ahead of my primer sequence (GAGTTTGATCCTGGCTCAG). According to the sequencing center, they removed all the barcode adapters. Also, because of the AGAGCGAA sequence, mothur would not let me remove the primer from each sequence. Do you have any idea what that sequence is for, and how I can remove it? Thank you!

written 3.5 years ago by lzheng.chn

Hello lzheng.chn

I'm sorry to hear that you are having this problem -- I'm not sure what has caused it. It might help to add that sequence (AGAGCGAA), which appears in all your samples, to the oligos file that you provide to mothur, so that the region is trimmed during the shhh.flows step.

Alternatively, you can trim the extra sequence and the primer off your FASTA file outside of mothur, either with adapter trimming tools (see "Trimming Adapters For Paired-End Sequences") or at the command line with standard Linux tools.
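
As a rough illustration of trimming outside of mothur, the Python sketch below strips a fixed leading tag from every record in a FASTA file and leaves the primer in place for a later primer-removal step. It assumes, as described in this thread, that the tag (AGAGCGAA) sits at the very start of each read; the function names and example read IDs are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def strip_leading_tag(seq, tag):
    """Remove `tag` from the 5' end if the read starts with it
    (case-insensitive); otherwise return the read unchanged."""
    if seq.upper().startswith(tag.upper()):
        return seq[len(tag):]
    return seq


def trim_fasta(lines, tag):
    """Return FASTA lines with the leading tag removed from each record."""
    out = []
    for header, seq in read_fasta(lines):
        out.append(">" + header)
        out.append(strip_leading_tag(seq, tag))
    return out


# Hypothetical records; the first sequence is the mothur output quoted in the question.
example = [
    ">read1",
    "AGAGCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGG",
    ">read2",
    "GAGTTTGATCCTGGCTCAGATTGAACGCTGG",
]
for line in trim_fasta(example, "AGAGCGAA"):
    print(line)
```

To run it on a real file, pass an open file handle instead of the `example` list, e.g. `trim_fasta(open("reads.fasta"), "AGAGCGAA")`, where the file name is a placeholder. Reads that do not start with the tag are passed through untouched, so it is worth counting how many were actually trimmed before trusting the result.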

written 3.5 years ago by Josh Herr

Hi Josh,

I think I found the problem. I switched to QIIME and used process_sff.py, which trims the head of each read the same way sff_extract does. Maybe it is a bug in mothur. Thank you for the help!

Le

written 3.5 years ago by lzheng.chn

Great - Glad it worked out!

written 3.5 years ago by Josh Herr