Question: Merge Paired-End Reads
9
8.5 years ago by
Belgium, Brussels
Nicolas Rosewick8.7k wrote:

Hi,

How can I merge two paired-end FASTQ files (R and L) into a single FASTQ file? For information, the sequencing run is 72 bp long and contains mostly small RNA (miRNA, ...), so many of the paired-end reads will overlap.

For example, here are two paired reads:

@HWUSI-EAS529:41:FC62YHFAAXX:8:1:7969:1330 1:N:0:GCCAAT
CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT
+
IIIIIIIHIIHIIIIIIHHIIIHGIIIIEIIIIIIEIIHIIIIIIIIIIIHIIIIIBHIHIIHGIGIEGHHEGEEH


@HWUSI-EAS529:41:FC62YHFAAXX:8:1:7969:1330 2:N:0:GCCAAT
AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIDHIGIIIHIIIGHGIIIIIIIHHIHIIIIIIIIIHIIIIIIIIHIIGIIIIHI

I can find the adapter in the first one:

Code:

EMBOSS_001         1 CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGC     50
                                    |||||||||||||||||||||              
EMBOSS_001         1 ---------------TGGAATTCTCGGGTGCCAAGG--------------     21

EMBOSS_001        51 CAATATCTCGTATGCCGTCTTCTGCT     76

EMBOSS_001        22 --------------------------     21

but not in the second one ...

However, I did find the overlap between the right read and the left read (using the reverse complement of the second read):

EMBOSS_001         1 --------------------------------------------------      0

EMBOSS_001         1 TTTTTTAATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACA     50

EMBOSS_001         1 -----------CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAAC     39
                                |||||||||||||||                        
EMBOSS_001        51 GTCCGACGATCCTACGAAAGGGCACT------------------------     76

EMBOSS_001        40 TCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT     76

EMBOSS_001        77 -------------------------------------     76
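
The two alignments above can be reproduced with a few lines of Python. This is only a sketch, assuming exact matches and an arbitrary `min_overlap` cutoff; `reverse_complement` and `find_short_insert` are names made up for illustration, not functions from any merging tool. It covers the case seen in this read pair, where the insert is shorter than the read length, so read 1 starts with the insert and the reverse complement of read 2 ends with it.

```python
# Reads and adapter copied from the post; function names are hypothetical.
R1 = "CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT"
R2 = "AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA"
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_short_insert(r1, r2, min_overlap=10):
    """Return the longest prefix of r1 that exactly equals a suffix of
    reverse_complement(r2), or None if nothing reaches min_overlap.

    Assumption: exact matching only; real tools tolerate mismatches
    and weigh them by base quality.
    """
    rc2 = reverse_complement(r2)
    for k in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        if r1[:k] == rc2[-k:]:
            return r1[:k]
    return None


insert = find_short_insert(R1, R2)
print(insert)                                # CTACGAAAGGGCACT
print(R1[len(insert):].startswith(ADAPTER))  # True: adapter follows the insert
```

With these two reads the shared sequence is 15 nt long, and everything after it in read 1 is adapter, which agrees with the first alignment.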

So my question is: how can I merge the two FASTQ files to produce a single FASTQ file?

Thanks,

N.

paired merge rna fastq • 23k views
ADD COMMENTlink modified 2.7 years ago by FatihSarigol140 • written 8.5 years ago by Nicolas Rosewick8.7k
1

Hi, I see that you had a case similar to mine, so you can probably help me :)

As I always do miRNA analysis with single-end data, I'm confused about how to proceed with paired-end data. Can you recommend how to clean the reads and get them ready for analysis? In particular, I don't understand how the reverse-complemented miRNA sequence in the R2 read relates to the R1 read.
In summary, my R1 read is 100 nt long and contains miRNA + barcode + small RNA adapter + another adapter + polyA;
my R2 read contains miRNA (reverse complement of R1) + long adapter (or linker) + polyA.

Thanks for any help in advance!

ADD REPLYlink written 4.7 years ago by manekineko130

Hi,

So, for an exact answer to this problem:

The reads in R1.fq are the forward reads and the reads in R2.fq are written as reverse complements.

For example, if I want to create a single file from the reads in R1.fq and R2.fq, do I have to reverse-complement the reads in R2.fq?

Am I right?

Thank you
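
To make the question concrete: bringing an R2 record into the same orientation as its R1 mate means reverse-complementing the sequence and reversing the quality string, so each quality value stays with its base. The sketch below walks two FASTQ streams in lockstep and does exactly that. It assumes strict four-line records and headers of the form `name 1:N:0:...` as in the original post; all function names are hypothetical, and whether this step is needed at all depends on the downstream tool.

```python
import io
from itertools import islice

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples.

    Assumption: exactly four lines per record, no blank lines between records.
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        header, seq, _plus, qual = record
        yield header, seq, qual


def reverse_complement_record(header, seq, qual):
    """Reverse-complement the bases and reverse the qualities to match."""
    return header, seq.translate(_COMPLEMENT)[::-1], qual[::-1]


def oriented_pairs(r1_handle, r2_handle):
    """Yield (r1_record, r2_record_in_r1_orientation) for each read pair.

    Note: zip() stops silently at the shorter file, so files of unequal
    length are not detected here.
    """
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        # The read name before the first space must match between mates.
        if rec1[0].split()[0] != rec2[0].split()[0]:
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        yield rec1, reverse_complement_record(*rec2)


# Toy records standing in for R1.fq and R2.fq.
r1_file = io.StringIO("@read1 1:N:0:GCCAAT\nACGTT\n+\nIIIIH\n")
r2_file = io.StringIO("@read1 2:N:0:GCCAAT\nAACGG\n+\nHGFED\n")
for first, second in oriented_pairs(r1_file, r2_file):
    print(first[1], second[1], second[2])  # ACGTT CCGTT DEFGH
```

For real files, the two `io.StringIO` objects would be replaced by `open("R1.fq")` and `open("R2.fq")`.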

ADD REPLYlink modified 3 months ago by RamRS26k • written 4.3 years ago by midox260

no response for this problem?

ADD REPLYlink written 4.3 years ago by midox260
2
8.5 years ago by
pmenzel310
pmenzel310 wrote:

Maybe this program suits your needs: http://www.cbcb.umd.edu/software/flash/

ADD COMMENTlink written 8.5 years ago by pmenzel310

It's not available; the web page cannot be opened. http://genomics.jhu.edu/software/FLASH/index.shtml

ADD REPLYlink written 7.4 years ago by litiancheng.gansu10

It's here now: http://ccb.jhu.edu/software/FLASH/

ADD REPLYlink written 6.8 years ago by matted7.2k
1
8.5 years ago by
Philadelphia, PA
Jeremy Leipzig19k wrote:

There is a decent program called stitch: https://github.com/audy/stitch

I wrote a script called mergePairs that is very sensitive and incredibly slow: http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py

ADD COMMENTlink modified 6 months ago by RamRS26k • written 8.5 years ago by Jeremy Leipzig19k
1
6.8 years ago by
lelle820
Berlin
lelle820 wrote:

As this was referenced from a duplicate question, I will add a newer tool to the list: PANDAseq

ADD COMMENTlink written 6.8 years ago by lelle820
1
5.1 years ago by
Andreas2.4k
Singapore
Andreas2.4k wrote:

One more: SeqPrep

Andreas

ADD COMMENTlink written 5.1 years ago by Andreas2.4k
0
8.5 years ago by
Stevelor310
Stevelor310 wrote:

Either use Galaxy or use the standalone scripts:

http://hg.notalon.org/galaxy/galaxy-central/src/7d9bb95caaa7/tools/fastq

HTH!

ADD COMMENTlink modified 6 months ago by RamRS26k • written 8.5 years ago by Stevelor310
1

uhhh which script?

ADD REPLYlink written 8.5 years ago by Jeremy Leipzig19k
0
5.1 years ago by
Gabriel R.2.7k
Danmarks Tekniske Universitet
Gabriel R.2.7k wrote:

If you wish to trim adapters and merge in a single step, you can use leeHom. We use it mainly to reconstruct ancient DNA sequences, but it has broader uses as well:

http://nar.oxfordjournals.org/content/42/18/e141

Click here for the Website of the repository

It uses a Bayesian maximum a posteriori approach that considers quality scores for both the adapter determination and the merging.

ADD COMMENTlink modified 15 months ago • written 5.1 years ago by Gabriel R.2.7k
0
2.7 years ago by
FatihSarigol140
Durham
FatihSarigol140 wrote:

There is BBMerge, which is "designed to merge two overlapping paired reads into a single read. For example, a 2x150bp read pair with an insert size of 270bp would result in a single 270bp read": http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmerge-guide/
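
The arithmetic behind that example: two 150 bp reads cover 300 bases of a 270 bp insert, so 30 bases are seen by both mates. The sketch below shows that calculation and a naive exact-match merge for the case where the insert is longer than one read. It is not BBMerge's algorithm; `expected_overlap` and `merge_long_insert` are hypothetical names, and a real merger scores mismatches using base qualities instead of demanding identity.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def expected_overlap(read_length, insert_size):
    """Number of insert bases covered by both mates.

    If the insert is shorter than one read, both mates cover all of it.
    """
    return min(insert_size, max(0, 2 * read_length - insert_size))


def merge_long_insert(r1, r2, min_overlap=10):
    """Join r1 and reverse_complement(r2) where the end of r1 exactly
    matches the start of reverse_complement(r2). Returns None if no
    overlap of at least min_overlap bases is found.

    Assumption: longest exact match wins, which can misfire on repeats.
    """
    rc2 = reverse_complement(r2)
    for k in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        if r1[-k:] == rc2[:k]:
            return r1 + rc2[k:]
    return None


print(expected_overlap(150, 270))  # 30

# Toy fragment of 20 nt sequenced with 14 nt reads from each end.
fragment = "ACGTTGCAAGGCTTACGATC"
read1 = fragment[:14]
read2 = reverse_complement(fragment[-14:])
print(merge_long_insert(read1, read2, min_overlap=5) == fragment)  # True
```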

ADD COMMENTlink written 2.7 years ago by FatihSarigol140