Question

Read detection with pattern in paired end FASTQ file

0


3.3 years ago

amitpande74 ▴ 20

Hi, I have paired-end reads and want to extract the reads that contain the insert TGTATGTAAACTTCCGACTTCAACTGTA. I tried grep -A2 -B1 "TGTATGTAAACTTCCGACTTCAACTGTA" input.fq |grep -v "^\-\-$" on each file, writing the results to 1.fq and 2.fq, but the two outputs no longer align with Bowtie2 because the reads have differing headers. I also tried bbduk.sh in1=input_1.fq in2=input_2.fq out1=matched_1.fq out2=matched_2.fq k=28 literal=TGTATGTAAACTTCCGACTTCAACTGTA rcomp=f, but to no avail. Can someone help? Regards.

FASTQ paired end Awk • 838 views

updated 3.3 years ago by GenoMax 141k • written 3.3 years ago by amitpande74 ▴ 20

score 1 · Answer 1 · 2021-01-14


3.3 years ago

Pierre Lindenbaum 161k

paste <(cat fq1 | paste - - - - )  <(cat fq2 | paste - - - - )  |  grep  TGTATGTAAACTTCCGACTTCAACTGTA | tr "\t" "\n" > interleaved.fastq
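If separate R1/R2 files are preferred over an interleaved one, the same idea (read both files in lockstep and keep a pair when either mate contains the insert) can be sketched with awk. This is only a sketch under assumptions: the function name filter_pairs, the file names, and the toy reads are all made up for illustration, and only the forward orientation of the insert is searched, as in the grep above.

```shell
# Sketch only: filter_pairs, the file names and the toy reads are hypothetical.
pattern="TGTATGTAAACTTCCGACTTCAACTGTA"
workdir=$(mktemp -d)

# Two toy pairs; only read1 carries the insert (in its R1 mate).
cat > "$workdir/in_1.fq" <<'EOF'
@read1/1
AATGTATGTAAACTTCCGACTTCAACTGTACC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2/1
ACGTACGTACGT
+
IIIIIIIIIIII
EOF
cat > "$workdir/in_2.fq" <<'EOF'
@read1/2
GGGGCCCCAAAA
+
IIIIIIIIIIII
@read2/2
TTTTGGGGCCCC
+
IIIIIIIIIIII
EOF

# filter_pairs PATTERN IN1 IN2 OUT1 OUT2
# Keeps a pair (both mates) if either mate's sequence line contains PATTERN.
filter_pairs() {
    awk -v pat="$1" -v r2="$3" -v o1="$4" -v o2="$5" '
    {
        # $0 is the header of an R1 record; read its remaining three lines.
        a[1] = $0
        for (i = 2; i <= 4; i++) { if ((getline line) <= 0) exit 1; a[i] = line }
        # Read the four lines of the record at the same position in R2.
        for (i = 1; i <= 4; i++) { if ((getline line < r2) <= 0) exit 1; b[i] = line }
        # Line 2 of each record is the sequence.
        if (index(a[2], pat) || index(b[2], pat)) {
            for (i = 1; i <= 4; i++) { print a[i] > o1; print b[i] > o2 }
        }
    }' "$2"
}

filter_pairs "$pattern" "$workdir/in_1.fq" "$workdir/in_2.fq" \
    "$workdir/matched_1.fq" "$workdir/matched_2.fq"
cat "$workdir/matched_1.fq" "$workdir/matched_2.fq"
```

Because both mates are written together, the two output files stay in the same order and should be usable as the -1 and -2 inputs of Bowtie2. The sketch assumes plain four-line FASTQ records and input files that are already in sync.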


score 1 · Answer 2 · 2021-01-14

but it is of no avail.

You should set k= to something less than half the length of the string you are searching for. Unless you do that, the initial seed matches may not be found. I would try k=9 with your bbduk.sh command.
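To make the effect of k concrete: a literal of length L contains L - k + 1 overlapping k-mers, so a smaller k gives the search more seeds to hit. The snippet below is plain arithmetic on the 28-base insert from the question, not a description of bbduk.sh internals; the helper name kmer_count is made up for illustration.

```shell
# Sketch only: kmer_count is a hypothetical helper, not part of BBTools.
literal="TGTATGTAAACTTCCGACTTCAACTGTA"
len=${#literal}                      # 28 bases

# Number of overlapping k-mers in the literal for a given k.
kmer_count() {
    echo $((len - $1 + 1))
}

for k in 28 9; do
    echo "k=$k gives $(kmer_count "$k") k-mer(s) from a $len-base literal"
done
```

With k=28 the whole literal is a single k-mer, while k=9 yields 20 of them.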

because the reads have differing headers.

That is a different issue. Are the reads in your R1 and R2 files out of sync? If so, you need to run repair.sh on them.
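A quick way to answer the "out of sync?" question before reaching for repair.sh is to compare the read names record by record. The following is a rough sketch, assuming four-line FASTQ records and names that differ only by a trailing /1 or /2 or by text after the first whitespace; check_sync and the toy files are hypothetical.

```shell
# Sketch only: check_sync and the toy files are hypothetical.
workdir=$(mktemp -d)
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nACGT\n+\nIIII\n' > "$workdir/ok_1.fq"
printf '@read1/2\nTTTT\n+\nIIII\n@read2/2\nTTTT\n+\nIIII\n' > "$workdir/ok_2.fq"
# R2 after an unpaired filter: read1 was dropped, so the order no longer matches.
printf '@read2/2\nTTTT\n+\nIIII\n' > "$workdir/bad_2.fq"

# check_sync R1 R2: exit status 0 if the read names match record by record.
check_sync() {
    awk -v r2="$2" '
    # Strip the leading "@", any comment after whitespace, and a trailing /1 or /2.
    function id(h) { sub(/^@/, "", h); sub(/[ \t].*$/, "", h); sub(/\/[12]$/, "", h); return h }
    NR % 4 == 1 {
        n = (NR + 3) / 4
        if ((getline h2 < r2) <= 0) { print "R2 ends before record " n; bad = 1; exit }
        for (i = 0; i < 3; i++) getline skip < r2   # skip the rest of the R2 record
        if (id($0) != id(h2)) { print "record " n ": " $0 " vs " h2; bad = 1; exit }
    }
    END {
        if (!bad && (getline h2 < r2) > 0) { print "R2 has extra records"; bad = 1 }
        if (!bad) print "in sync"
        exit bad
    }' "$1"
}

check_sync "$workdir/ok_1.fq" "$workdir/ok_2.fq"
check_sync "$workdir/ok_1.fq" "$workdir/bad_2.fq" || echo "out of sync: repair needed"
```

Filtering each file independently with grep, as in the question, is exactly the kind of step that produces the second case.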

score 1 · Answer 3 · 2021-01-14


3.3 years ago

cpad0112 21k

With seqkit:

$ seqkit grep -srip "TGTATGTAAACTTCCGACTTCAACTGTA"  input.fq(.gz)
