Question

How can I extract the overlapping region of paired-end reads?

0

5.3 years ago

190444373 • 0

I have a paired-end sequencing sample (18-R-001_R1._fastq.gz, 18-R-001_R2._fastq.gz). How can I get the overlap of R1 and R2, and filter out the reads where R1 and R2 match perfectly (without mismatches)?

RNA-Seq • 2.2k views

ADD COMMENT • link updated 5.3 years ago by GenoMax 141k • written 5.3 years ago by 190444373 • 0

1

Hello 190444373 ,

Could you please explain why you think this is a good idea?

fin swimmer

ADD REPLY • link 5.3 years ago by finswimmer 16k

0

Maybe my wording was unclear, so it came across badly.

ADD REPLY • link 5.3 years ago by 190444373 • 0

score 2 · Accepted Answer · 2019-01-17

2

5.3 years ago

GenoMax 141k

You can use BBMerge from BBTools or FLASH to do the actual merging of the reads allowing for no mismatches.

You can then use reformat.sh from the BBMap suite to remove merged reads that are exactly the same length as R1/R2, e.g. by setting minlength=n+1, where n is the length of R1/R2 (I am assuming your reads are all the same length to begin with and have not been trimmed). That will filter out all reads where R1 and R2 perfectly match (your requirement).
Reads where R1/R2 perfectly match but the insert is shorter than the read length would also be removed by the filter above.
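
To make the length filter concrete, here is a minimal Python sketch of the same idea. It is only an illustration and not how reformat.sh is implemented; the function names `read_fastq` and `keep_longer_than_read` are made up for this example, and it assumes an uncompressed, 4-lines-per-record FASTQ of already merged reads.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        block = list(islice(handle, 4))
        if len(block) < 4:
            return  # end of file (or a truncated final record)
        header, seq, _plus, qual = (line.rstrip("\n") for line in block)
        yield header, seq, qual


def keep_longer_than_read(records, read_length):
    """Keep merged reads of at least read_length + 1 bases (the minlength=n+1 idea)."""
    for header, seq, qual in records:
        if len(seq) >= read_length + 1:
            yield header, seq, qual


# Toy data: pretend the original reads were 4 bases long (n = 4).
toy = "@a\nACGT\n+\nIIII\n@b\nACGTAC\n+\nIIIIII\n@c\nACGTA\n+\nIIIII\n"
for header, seq, qual in keep_longer_than_read(read_fastq(io.StringIO(toy)), 4):
    print(header, seq, len(seq))
```

For a gzipped file, the handle could come from `gzip.open(path, "rt")` instead of `io.StringIO`.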

ADD COMMENT • link 5.3 years ago by GenoMax 141k

0

Thank you for your reply; maybe my wording was unclear. For example, take a 2x150 bp read pair whose overlap is 50 bp in the middle. I want this 50 bp overlapping sequence, and I want the 50 bp from R1 to match the 50 bp from R2 completely. If R1 and R2 do not match exactly in the overlap, the pair should be removed.

ADD REPLY • link 5.3 years ago by 190444373 • 0

0

This can be done with BBMerge, which GenoMax linked to.

$ bbmerge.sh in1=18-R-001_R1._fastq.gz in2=18-R-001_R2._fastq.gz out=overlap.fastq.gz pfilter=1 trimnonoverlapping=t

in1 and in2 define the input files
out defines the output file
pfilter=1 leads to merging only if there are no mismatches
trimnonoverlapping=t trims the parts of the reads that do not overlap
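
To show what these two options are asking for, here is a minimal Python sketch of the idea. It is not BBMerge's actual algorithm, just a naive illustration. It assumes uppercase bases and an insert at least as long as each read, so that the end of R1 overlaps the start of the reverse-complemented R2; `reverse_complement`, `perfect_overlap` and `min_overlap` are names invented for this example.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def perfect_overlap(r1, r2, min_overlap=10):
    """Return the overlap shared by R1 and R2 if it has zero mismatches, else None.

    Naive approach: compare suffixes of R1 with prefixes of the
    reverse-complemented R2, trying the longest candidate overlap first.
    """
    r2_rc = reverse_complement(r2)
    for k in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return r1[-k:]  # only the overlapping part, like trimnonoverlapping=t
    return None  # no mismatch-free overlap of at least min_overlap bases


# Toy pair: a 30 bp fragment sequenced as 2x20 bp, so the middle 10 bp overlap.
fragment = "ACGTTGCAAGCTTAGGCTAACCGGTTAAGC"
r1 = fragment[:20]
r2 = reverse_complement(fragment[10:])
print(perfect_overlap(r1, r2, min_overlap=5))
```

Because the sketch accepts the first exact suffix/prefix match it finds, it can be fooled by repetitive sequence; real mergers score candidate overlaps more carefully.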

ADD REPLY • link 5.3 years ago by finswimmer 16k

0

Thank you so much, this is what I want.

ADD REPLY • link 5.3 years ago by 190444373 • 0

0

If possible, I would like to ask whether I can achieve the same thing at the level of the BAM file: for example, extracting from a BAM file the overlapping part of each read pair, keeping only pairs where R1 and R2 match completely.

ADD REPLY • link 5.3 years ago by 190444373 • 0

0

You will have to do the merging/trimming at the read level and then align the merged reads.

ADD REPLY • link 5.2 years ago by GenoMax 141k

0

Thank you for your reply.

ADD REPLY • link 5.2 years ago by 190444373 • 0