How can I extract the overlap of the sequence data?
5.3 years ago
190444373 • 0

I have a paired-end sequencing sample (18-R-001_R1._fastq.gz, 18-R-001_R2._fastq.gz). How can I get the overlap of R1 and R2, and filter out the reads where R1 and R2 match perfectly (without mismatches)?

RNA-Seq • 2.2k views

Hello 190444373,

Could you please explain why you think this is a good idea?

fin swimmer


Maybe my wording was unclear, so the question looks a little odd.

5.3 years ago
GenoMax 141k

You can use BBMerge from BBTools or FLASH to do the actual merging of the reads allowing for no mismatches.

  • You can then use reformat.sh from the BBMap suite to filter out merged reads that are exactly the same length as R1/R2, e.g. by setting minlength=n+1, where n is the length of R1/R2. (I am assuming your reads all have identical length to begin with and have not been trimmed.) That will filter out all reads where R1 and R2 perfectly match (your requirement).

  • If R1/R2 perfectly match but the insert is shorter than the read length, those reads would also be removed by the filter above.
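
To make the length filter concrete, here is a minimal Python sketch of the same idea. It is not reformat.sh, only an illustration of what a minlength=n+1 cutoff does to merged reads. The function names are hypothetical, and it assumes 4-line FASTQ records and untrimmed reads of identical length n.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file.

    Assumes strict 4-line records; handles plain or gzipped input.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break
            yield tuple(record)


def keep_longer_than(records, read_length):
    """Keep merged reads strictly longer than the original read length.

    This mirrors a minlength = read_length + 1 cutoff: a merged read of
    length <= read_length means R1 and R2 overlapped completely (or the
    insert was shorter than the read), so it is dropped.
    """
    for header, seq, plus, qual in records:
        if len(seq) > read_length:
            yield header, seq, plus, qual


# Example with in-memory records and a read length of 4
merged = [
    ("@full_overlap", "ACGT", "+", "IIII"),
    ("@partial_overlap", "ACGTAC", "+", "IIIIII"),
]
kept = [rec[0] for rec in keep_longer_than(merged, read_length=4)]
print(kept)
```

With real data you would pass `read_fastq("merged.fastq.gz")` (a hypothetical file name) instead of the in-memory list.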


Thank you for your reply. Maybe my wording was unclear. For example, for a 2x150 bp read pair with a 50 bp overlap in the middle, I want that 50 bp overlapping sequence, and I want the 50 bp from R1 to match the 50 bp from R2 completely. If R1 and R2 do not match exactly in the overlap, the pair should be removed.
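
The request can be illustrated with a small Python sketch for a single read pair. This is a conceptual illustration only, not how any particular merging tool works. It assumes a standard forward/reverse paired-end library, so the end of R1 should equal the start of the reverse complement of R2; the function names and the `min_overlap` default are hypothetical choices.

```python
# Translation table for complementing DNA bases (N stays N)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def perfect_overlap(r1, r2, min_overlap=12):
    """Return the longest exact overlap between R1 and R2, or None.

    The overlap is the longest suffix of r1 that is identical to a
    prefix of the reverse complement of r2. Any mismatch at a given
    overlap length rules that length out, so a pair with mismatches
    in its true overlap yields None (unless a shorter exact match of
    at least min_overlap happens to exist by chance).
    """
    r2_rc = reverse_complement(r2)
    for k in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return r1[-k:]
    return None


# Toy example: a 12 bp fragment ACGGTCATTGCA sequenced as 2x8 bp,
# so the expected overlap is 2*8 - 12 = 4 bp.
r1 = "ACGGTCAT"
r2 = "TGCAATGA"  # reverse complement of the last 8 bp, TCATTGCA
print(perfect_overlap(r1, r2, min_overlap=4))  # prints TCAT
```

Short exact matches can occur by chance, which is why a minimum overlap length is needed; the value to use for real data is a judgment call.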


This can be done with BBMerge, which GenoMax linked to.

$ bbmerge.sh in1=18-R-001_R1._fastq.gz in2=18-R-001_R2._fastq.gz out=overlap.fastq.gz pfilter=1 trimnonoverlapping=t
  • in1 and in2 define the input files
  • out defines the output file
  • pfilter=1 leads to merging only if there is no mismatch
  • trimnonoverlapping=t trims the parts of the reads that do not overlap

Thank you so much, this is what I want.


If possible, I would like to ask whether I can achieve my goal at the level of the BAM file. For example, from a BAM file, extract the reads in the paired-end overlap where R1 and R2 also match completely.


You will have to do the merging/trimming at the read level and then align the merged reads.


Thank you for your reply.
