Question: How can I extract the overlap of the sequence data
190444373 wrote 6 months ago:

I have a paired-end sequencing sample, 18-R-001_R1._fastq.gz and 18-R-001_R2._fastq.gz. How can I get the overlap of R1 and R2, and filter out the reads where R1 and R2 match perfectly (without mismatches)?

rna-seq • 276 views
modified 6 months ago by genomax • written 6 months ago by 190444373

Hello 190444373,

Could you please explain why you think this is a good idea?

fin swimmer

Reply written 6 months ago by finswimmer

Maybe my wording was unclear, so the question looks a little bad.

Reply written 6 months ago by 190444373
genomax (United States) wrote 6 months ago:

You can use BBMerge from BBTools or FLASH to do the actual merging of the reads allowing for no mismatches.

  • You can then use reformat.sh from BBMap suite to filter your data where the merged read is exactly the same length as R1/R2 (I am assuming your reads are all identical length to begin with and have not been trimmed, e.g. you could set minlength=n+1, n = length of R1/R2). That will filter out all reads where R1 and R2 perfectly match (your requirement).

  • If R1/R2 perfectly match but have a shorter insert than the length of sequencing, those reads would also be removed by filter above.
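
To make the length filter concrete, here is a minimal sketch of what a `minlength=n+1` style filter does, written in plain awk rather than with reformat.sh. The function name `filter_longer_than` is hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Illustration only: keep FASTQ records whose sequence is longer than n bases.
# Assumes strict 4-line FASTQ records (header, sequence, "+", qualities).
filter_longer_than() {
    awk -v n="$1" '
        NR % 4 == 1 { header = $0 }   # @read_id line
        NR % 4 == 2 { seq = $0 }      # sequence line
        NR % 4 == 3 { plus = $0 }     # separator line
        NR % 4 == 0 {                 # quality line completes the record
            if (length(seq) > n) {
                print header
                print seq
                print plus
                print $0
            }
        }
    '
}
```

For 150 bp reads, something like `gzip -dc merged.fastq.gz | filter_longer_than 150` would keep only merged reads longer than a single read; `merged.fastq.gz` is a placeholder file name.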

Answer modified 6 months ago • written 6 months ago by genomax

Thank you for your reply. Maybe my wording was unclear. For example, for a 2x150 bp read pair whose overlap is the 50 bp in the middle, I want that 50 bp overlapping sequence, and I want the 50 bp from R1 to match the 50 bp from R2 exactly. If R1 and R2 do not match exactly in the overlap, the read pair should be removed.

Reply written 6 months ago by 190444373

This can be done with bbmerge, to which genomax has linked.

$ bbmerge.sh in1=18-R-001_R1._fastq.gz in2=18-R-001_R2._fastq.gz out=overlap.fastq.gz pfilter=1 trimnonoverlapping=t
  • in1 and in2 define the input files
  • out defines the output file
  • pfilter=1 leads to merging only if there is no mismatch
  • trimnonoverlapping=t trims the parts of the reads that do not overlap
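
As a toy illustration of the idea (not BBMerge's actual algorithm), the sketch below finds the longest exact overlap between the 3' end of R1 and the 5' end of the reverse complement of R2, and prints only that overlapping sequence. The function name `exact_overlap` is hypothetical. It has no minimum overlap length, so very short chance matches are possible, and it does not handle inserts shorter than the read length.

```shell
# Illustration only: print the longest suffix of R1 that exactly equals
# a prefix of reverse-complement(R2); print nothing if there is none.
exact_overlap() {
    awk -v r1="$1" -v r2="$2" 'BEGIN {
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N"
        # reverse-complement R2 so both reads are on the same strand
        rc = ""
        for (i = length(r2); i >= 1; i--) rc = rc comp[substr(r2, i, 1)]
        n = length(r1); m = length(rc)
        max = (n < m) ? n : m
        # try the longest candidate overlap first, requiring zero mismatches
        for (k = max; k >= 1; k--) {
            if (substr(r1, n - k + 1) == substr(rc, 1, k)) {
                print substr(rc, 1, k)
                exit
            }
        }
    }'
}
```

For a 20 bp toy fragment read as 2x12 bp, `exact_overlap ACGTTGCAAGGC CCGGTTAAGCCT` prints the 4 bp overlap `AGGC`; with a single mismatch inside the overlap (`CCGGTTAAGACT` as R2) it prints nothing.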
Reply written 6 months ago by finswimmer

Thank you so much, this is what I want.

Reply written 6 months ago by 190444373

If possible, I would like to ask whether I can achieve this at the level of the BAM file. For example, from a BAM file, get the overlapping part of each paired-end read, where R1 and R2 also match exactly.

Reply written 6 months ago by 190444373

You will have to do the merging/trimming at the read level and then align the merged reads.

Reply modified 5 months ago • written 6 months ago by genomax

Thank you for your reply.

Reply written 5 months ago by 190444373