Question

Analyzing only forward reads in mothur

7.4 years ago

giderk • 0

I am new to mothur and am trying to follow the MiSeq SOP with my 18S rRNA V4 sequences, but I am stuck at the very first step. The guide assumes you want to make contigs from paired-end reads, but I would like to analyze my forward reads on their own. How do I combine just my read 1 fastq files, keeping the sample name for every sequence in each file, so that I can proceed with the alignment and classification steps that follow? I can't find an explanation of how to do this anywhere.

miseq mothur forward read • 5.5k views

ADD COMMENT • link updated 4.9 years ago by rehab1171 • 0 • written 7.4 years ago by giderk • 0

I have a similar question. Any luck in finding a solution?

ADD REPLY • link 7.1 years ago by pmatson • 0

Can't you just skip make.contigs()? Try preparing the stability file with just R1.

You may also have to hack your way around it and rename your files to match the pattern mothur uses after the make.contigs() step.

ADD REPLY • link 7.1 years ago by h.mon 35k

But make.contigs() is the step where quality filtering is applied as the fastq files are converted to fasta files. If you skip make.contigs(), how do you get a fasta file? I tried making a .files table with only the forward fastq listed but wasn't able to get it to work. If you did, could you please share an example of the format?

Thanks! I am having this same problem trying to re-analyze some old public datasets (sequenced before paired end was a thing.)

Here is the description of the .files file format. You'll notice every format option includes a forward and reverse read: https://www.mothur.org/wiki/Make.contigs#file

ADD REPLY • link 7.0 years ago by rrr ▴ 90

Hello guys, I am new to Galaxy and I have the same problem. I tried what is said above, but fastq.info created 5 files that were all empty (0 bytes). So I tried the following (all in Galaxy):

1) Concatenated my 7 fastq files into one single fastq file.
2) Converted the concatenated fastq file to fasta.
3) Created a group file from the concatenated fasta file.
4) Ran unique.seqs with the group file as "group" and the concatenated fasta file as "fasta".

It failed. Checking the group file, I believe I lost the information about which sequences belong to which samples. Could that be the reason, and how do I fix it? In brief, how do I run this in Galaxy using single reads? An easy step-by-step for a beginner like me would be great. Unfortunately, my data quality is poor, so I have to trim long stretches (100 bp out of 300 bp), and as a result the forward and reverse reads no longer form contigs. Thank you very much guys. It is my first time and I feel totally lost... Cheers

ADD REPLY • link 4.9 years ago by rehab1171 • 0

score 1 · Answer 1 · 2017-08-01

6.7 years ago

alealdre ▴ 10

Hey there, you could use fastq.info() to obtain the fasta and quality files for either the forward or the reverse sequences. Here is the link to the mothur help:

https://www.mothur.org/wiki/Fastq.info

Hope it's what you needed!

That's what I did. I wanted to test the pipeline on paired ends and on forward and reverse separately, so I used fastq.info() to obtain a fasta file from the fastq. I think make.contigs() also outputs a .groups file which is not made when you use fastq.info, so you need to make a .groups file too as you need that for unique.seqs()...
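For reference, a mothur group file is a plain two-column, tab-separated table: sequence name, then sample (group) name. The Python sketch below is only an illustration of that format, not part of mothur. The function name write_group_file is made up, and it assumes that each per-sample fasta file is named after its sample (101.fasta for sample 101) and that the sequence name is the header text up to the first whitespace.

```python
from pathlib import Path


def write_group_file(fasta_paths, out_path):
    """Write 'sequence name<TAB>sample name' for every sequence; return the count."""
    n_written = 0
    with open(out_path, "w") as out:
        for fasta in map(Path, fasta_paths):
            # Assumption: the file stem is the sample name (101.fasta -> "101").
            sample = fasta.stem
            with open(fasta) as handle:
                for line in handle:
                    if not line.startswith(">"):
                        continue  # sequence lines carry no name information
                    fields = line[1:].split()
                    if not fields:
                        continue  # skip a bare ">" header with no name
                    # Assumption: the name is the header up to the first whitespace.
                    out.write(f"{fields[0]}\t{sample}\n")
                    n_written += 1
    return n_written
```

Calling write_group_file(["101.fasta", "102.fasta"], "reads.groups") would produce one line per read, which is the kind of file unique.seqs() and later steps expect alongside the fasta file.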

ADD REPLY • link 6.6 years ago by vali • 0

score 0 · Answer 2 · 2018-09-18

Hey guys, I know this is over a year old, but I found something that can fix this. We are currently using mothur to analyze our PacBio data, which comes as a BAM file. We converted the BAM to fastq and ran into the same issue mentioned above.

So let's say I have 5 samples after converting to fastq, called 101.fastq, 102.fastq, 103.fastq, 104.fastq and 105.fastq.

First go into mothur and run fastq.info() on each of the samples (if using PacBio, make sure to use the pacbio=T flag).

Then run a command called make.group(). For this example the command would look like:

make.group(fasta=101.fasta-102.fasta-103.fasta-104.fasta-105.fasta, groups=101-102-103-104-105)

This results in a file called mergegroups (you can easily rename this)

Then exit mothur (or run a system() command) and run

cat *fasta > combo.fasta

to combine your fasta files. Then go back into mothur and treat the mergegroups file and the combo.fasta file like the output you would get from a make.contigs() command.
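Before handing the two files to the next mothur step, it can be worth checking that every sequence in the combined fasta also appears in the group file. A rough Python sketch of the merge plus that check follows. merge_fasta and names_missing_from_groups are hypothetical helper names, not mothur commands, and the sequence name is assumed to be the header text up to the first whitespace. Listing the input files explicitly also avoids picking up an old combo.fasta, which the *fasta pattern would match on a second run.

```python
def merge_fasta(fasta_paths, out_path):
    """Concatenate FASTA files into out_path; return the sequence names in order."""
    names = []
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        if fields:
                            # Assumption: name = header up to the first whitespace.
                            names.append(fields[0])
                    # Make sure a file lacking a final newline does not glue
                    # its last line onto the next file's first header.
                    out.write(line if line.endswith("\n") else line + "\n")
    return names


def names_missing_from_groups(names, group_path):
    """Return the sequence names that have no entry in the group file."""
    with open(group_path) as handle:
        grouped = {line.split("\t")[0] for line in handle if line.strip()}
    return [name for name in names if name not in grouped]


# Example usage (file names taken from the post above):
# names = merge_fasta(["101.fasta", "102.fasta"], "combo.fasta")
# print(names_missing_from_groups(names, "mergegroups"))
```

An empty list from the second function means the fasta and group files agree on sequence names; anything else points at reads that lost their sample label.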

Hope this helps!

Bob