Question: Merge Paired-End Reads
4
2.5 years ago by
Belgium
Nicolas Rosewick2.4k wrote:

Hi,

How can I merge two paired-end FASTQ files (R and L) into a single FASTQ file? For information, the sequencing run is 72 bp long and contains mostly small RNAs (miRNA, ...), so many of the paired-end reads will overlap.

For example, here are two paired reads:

@HWUSI-EAS529:41:FC62YHFAAXX:8:1:7969:1330 1:N:0:GCCAAT
CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT
+
IIIIIIIHIIHIIIIIIHHIIIHGIIIIEIIIIIIEIIHIIIIIIIIIIIHIIIIIBHIHIIHGIGIEGHHEGEEH


@HWUSI-EAS529:41:FC62YHFAAXX:8:1:7969:1330 2:N:0:GCCAAT
AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIDHIGIIIHIIIGHGIIIIIIIHHIHIIIIIIIIIHIIIIIIIIHIIGIIIIHI

I found the adapter in the first read:

Code:

EMBOSS_001         1 CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGC     50
                                    |||||||||||||||||||||              
EMBOSS_001         1 ---------------TGGAATTCTCGGGTGCCAAGG--------------     21

EMBOSS_001        51 CAATATCTCGTATGCCGTCTTCTGCT     76

EMBOSS_001        22 --------------------------     21

but not in the second one.

However, I did find the overlap between the left read and the reverse complement of the right read:

EMBOSS_001         1 --------------------------------------------------      0

EMBOSS_001         1 TTTTTTAATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACA     50

EMBOSS_001         1 -----------CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAAC     39
                                |||||||||||||||                        
EMBOSS_001        51 GTCCGACGATCCTACGAAAGGGCACT------------------------     76

EMBOSS_001        40 TCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT     76

EMBOSS_001        77 -------------------------------------     76
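
The overlap in the alignment above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names and the `min_overlap` threshold are assumptions made for illustration, and it requires an exact match. It handles the short-insert case seen here, where the insert is shorter than the read, so the start of read 1 matches the end of the reverse-complemented read 2.

```python
# Complement table for the four unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def short_insert_overlap(read1, read2, min_overlap=10):
    """Return the largest k with read1[:k] == revcomp(read2)[-k:], or 0.

    min_overlap is an arbitrary threshold chosen for this sketch.
    Only exact matches are considered (no mismatches, no quality scores).
    """
    rc2 = reverse_complement(read2)
    for k in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[:k] == rc2[-k:]:
            return k
    return 0


# The two example reads from this post.
read1 = "CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT"
read2 = "AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA"

k = short_insert_overlap(read1, read2)
insert = read1[:k]
print(k, insert)
```

For this pair the sketch finds the same 15 matching bases that the alignment marks, and everything after them in read 1 is adapter sequence.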

So my question is: how can I merge the two FASTQ files to produce a single FASTQ file?

Thanks,

N.

modified 9 months ago by lelle • written 2.5 years ago by Nicolas Rosewick
1
2.5 years ago by
pmenzel270 wrote:

Maybe this program is suited for you: http://www.cbcb.umd.edu/software/flash/
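
For readers who want to see what merging a pair into one record involves, here is a minimal Python sketch of the general idea. It is not FLASH's actual algorithm. The function name `merge_short_insert`, the rule of keeping the higher-quality base at each position, and the assumption that the overlap length `k` is already known are all choices made for illustration. It covers only the short-insert case from the question and assumes Phred+33 quality strings, where a higher character code means a higher quality.

```python
# Complement table for the four unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_short_insert(name, seq1, qual1, seq2, qual2, k):
    """Merge a read pair whose insert is k bases long into one FASTQ record.

    Hypothetical helper: read 1 contributes its first k bases, read 2 the
    last k bases of its reverse complement. At each position the base with
    the higher quality character is kept (ties go to read 1).
    """
    rc2 = reverse_complement(seq2)[-k:]
    rq2 = qual2[::-1][-k:]  # qualities must be reversed along with the bases
    bases, quals = [], []
    for b1, q1, b2, q2 in zip(seq1[:k], qual1[:k], rc2, rq2):
        if q1 >= q2:
            bases.append(b1)
            quals.append(q1)
        else:
            bases.append(b2)
            quals.append(q2)
    return "@{}\n{}\n+\n{}\n".format(name, "".join(bases), "".join(quals))


# Toy pair: 4-base insert "AACG" followed by 2 bases of made-up adapter.
# Read 1 has a low-quality error at its second base ("T" instead of "A").
record = merge_short_insert("toy_pair", "ATCGTT", "I#IIII", "CGTTGG", "IIIIII", k=4)
print(record, end="")
```

In the toy pair the low-quality base from read 1 is replaced by the higher-quality base from read 2. Real tools also score mismatches and pick the overlap length themselves.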


It's not available; the web page cannot be opened. http://genomics.jhu.edu/software/FLASH/index.shtml

Reply written 17 months ago by litiancheng.gansu

It's here now: http://ccb.jhu.edu/software/FLASH/

Reply written 9 months ago by matted
1
2.5 years ago by
Philadelphia, PA
Jeremy Leipzig12k wrote:

There is a decent program called stitch: https://github.com/audy/stitch

I wrote a script called mergePairs that is very sensitive and incredibly slow: http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py

1
9 months ago by
Berlin
lelle370 wrote:

Since this question was referenced from a duplicate, I will add a newer tool to the list: PANDAseq

0
2.5 years ago by
Stevelor290 wrote:

Either use Galaxy or use the individual scripts:

http://hg.notalon.org/galaxy/galaxy-central/src/7d9bb95caaa7/tools/fastq

HTH!


uhhh which script?

Reply written 2.5 years ago by Jeremy Leipzig