Question: How can I extract the overlap of the sequence data
190444373 wrote:

I have a paired-end sequencing sample (18-R-001_R1._fastq.gz, 18-R-001_R2._fastq.gz). How can I get the overlap of R1 and R2, and filter out the reads where R1 and R2 match perfectly (without mismatches)?

rna-seq • 177 views
modified 4 weeks ago by genomax • written 4 weeks ago by 190444373

Hello 190444373,

Could you please explain why you think this is a good idea?

fin swimmer

written 4 weeks ago by finswimmer

Maybe my wording was unclear, so it came across badly.

written 4 weeks ago by 190444373
genomax wrote:

You can use BBMerge from BBTools or FLASH to do the actual merging of the reads allowing for no mismatches.

  • You can then use reformat.sh from the BBMap suite to remove merged reads that are exactly the same length as R1/R2, e.g. by setting minlength=n+1, where n is the length of R1/R2 (I am assuming your reads are all the same length to begin with and have not been trimmed). That will filter out all reads where R1 and R2 perfectly match (your requirement).

  • If R1/R2 perfectly match but have a shorter insert than the read length, those reads would also be removed by the filter above.
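
To illustrate the length filter described above, here is a minimal Python sketch of the idea behind minlength=n+1. It is not how reformat.sh is implemented; the helper names read_fastq and drop_full_overlaps and the toy 4 bp read length are assumptions made only for this example.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def drop_full_overlaps(records, read_length):
    """Keep only merged reads longer than the original read length (minlength = n + 1)."""
    min_length = read_length + 1
    return [rec for rec in records if len(rec[1]) >= min_length]


# Toy merged reads, assuming the original R1/R2 reads were 4 bp long (hypothetical data).
merged = io.StringIO(
    "@full\nACGT\n+\nIIII\n"          # same length as R1/R2: R1 and R2 overlap completely
    "@short\nAC\n+\nII\n"             # insert shorter than the read length
    "@partial\nACGTAC\n+\nIIIIII\n"   # partial overlap, longer than one read
)
kept = drop_full_overlaps(read_fastq(merged), read_length=4)
print([header for header, _, _ in kept])
```

Only the partially overlapping read survives: the fully overlapping read and the short-insert read are both dropped, which matches the caveat in the second bullet.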

modified 4 weeks ago • written 4 weeks ago by genomax

Thank you for your reply. Maybe my wording was unclear. For example, for a 2x150 bp read pair with a 50 bp overlap in the middle, I want only this 50 bp overlapping sequence, and I want the 50 bp from R1 to match the 50 bp from R2 completely. If R1 and R2 do not match exactly in the overlap, remove the pair.

written 4 weeks ago by 190444373

This can be done with BBMerge, which genomax linked to.

$ bbmerge.sh in1=18-R-001_R1._fastq.gz in2=18-R-001_R2._fastq.gz out=overlap.fastq.gz pfilter=1 trimnonoverlapping=t
  • in1 and in2 define the input files
  • out defines the output file
  • pfilter=1 leads to merging only if there is no mismatch
  • trimnonoverlapping=t trims the parts of the reads that do not overlap
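
To make the idea concrete, here is a minimal Python sketch of what "keep only the perfectly matching overlap" means. It is not BBMerge's actual algorithm (BBMerge also takes base qualities and overlap probabilities into account); the function names and the toy reads are assumptions for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def perfect_overlap(r1, r2, min_overlap=1):
    """Return the longest suffix of r1 that exactly equals a prefix of the
    reverse complement of r2, or None if there is no such overlap.

    min_overlap must be at least 1; real data needs a larger value to avoid
    accepting short chance matches.
    """
    r2_rc = reverse_complement(r2)
    for length in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-length:] == r2_rc[:length]:
            return r1[-length:]
    return None


# Toy pair from a hypothetical 16 bp fragment GATTACAGGCCTTAAC, read 10 bp from each end.
r1 = "GATTACAGGC"
r2 = "GTTAAGGCCT"            # reverse complement of the last 10 bp of the fragment
print(perfect_overlap(r1, r2, min_overlap=4))        # AGGC

# Same pair with one mismatch inside the overlap: nothing is reported.
r2_mismatch = "GTTAAGGACT"
print(perfect_overlap(r1, r2_mismatch, min_overlap=4))  # None
```

The first pair yields only the 4 bp shared region, analogous to the 50 bp region in the 2x150 bp example; the second pair is discarded because the overlap is not an exact match.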
written 4 weeks ago by finswimmer

Thank you so much, this is what I want.

written 4 weeks ago by 190444373

If possible, I would also like to ask whether I can achieve this at the level of the BAM file, i.e. extract from a BAM file only the overlapping portion of each read pair, keeping pairs where R1 and R2 match completely in the overlap.

written 4 weeks ago by 190444373

You will have to do the merging/trimming at the read level and then align the merged reads.

modified 27 days ago • written 4 weeks ago by genomax

Thank you for your reply.

written 27 days ago by 190444373