Read detection with pattern in paired end FASTQ file
3.3 years ago
amitpande74 ▴ 20

Hi, I have paired-end reads and want to extract the reads that contain the insert TGTATGTAAACTTCCGACTTCAACTGTA. I tried grep -A2 -B1 "TGTATGTAAACTTCCGACTTCAACTGTA" input.fq | grep -v "^\-\-$" to produce 1.fq and 2.fq, but the resulting files no longer align with Bowtie2 because the reads have differing headers. I also tried bbduk.sh in1=input_1.fq in2=input_2.fq out1=matched_1.fq out2=matched_2.fq k=28 literal=TGTATGTAAACTTCCGACTTCAACTGTA rcomp=f, but to no avail. Can someone help? Regards.

FASTQ paired end Awk • 838 views
3.3 years ago
paste <(cat fq1 | paste - - - - )  <(cat fq2 | paste - - - - )  |  grep  TGTATGTAAACTTCCGACTTCAACTGTA | tr "\t" "\n" > interleaved.fastq
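
For readers who would rather keep two separate, still-paired output files than one interleaved file, the same idea (treat the two mates as one unit and keep or drop them together) can be sketched in Python. This is a minimal sketch, not the poster's method: it assumes uncompressed, unwrapped 4-line FASTQ records, that both files list mates in the same order, and that only an exact forward-strand match is wanted. The function names and file paths are hypothetical. Unlike the grep in the one-liner, it tests only the sequence line of each mate.

```python
from itertools import islice

MOTIF = "TGTATGTAAACTTCCGACTTCAACTGTA"


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file (or a truncated final record)
            return
        yield tuple(record)


def filter_pairs(r1_path, r2_path, out1_path, out2_path, motif=MOTIF):
    """Write both mates whenever either mate's sequence contains the motif.

    Returns the number of pairs written. Assumes R1 and R2 are in the same order.
    """
    kept = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
         open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
            if motif in rec1[1] or motif in rec2[1]:
                out1.write("\n".join(rec1) + "\n")
                out2.write("\n".join(rec2) + "\n")
                kept += 1
    return kept
```

Calling filter_pairs("input_1.fq", "input_2.fq", "matched_1.fq", "matched_2.fq") would then leave matched_1.fq and matched_2.fq with the same reads in the same order, which is what a paired-end aligner expects.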
3.3 years ago
GenoMax 141k

but it is of no avail.

You should set k= to something less than half the length of the string you are searching for. Otherwise the initial seed matches may not be found. I would try k=9 with your bbduk.sh command.
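
To see why k matters, it may help to look at how many k-mers the 28-base query breaks into. The sketch below is plain k-mer arithmetic in Python, not a description of how bbduk.sh matches internally; the helper names are hypothetical. A string of length L has L - k + 1 k-mers, so the query yields a single k-mer at k=28 and 20 k-mers at k=9.

```python
QUERY = "TGTATGTAAACTTCCGACTTCAACTGTA"  # the 28-base insert from the question


def kmers(seq, k):
    """Return every overlapping substring of length k, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_kmers(query, read, k):
    """Return the query k-mers that also occur exactly somewhere in the read."""
    read_kmers = set(kmers(read, k))
    return [kmer for kmer in kmers(query, k) if kmer in read_kmers]


if __name__ == "__main__":
    # A read carrying the insert with one substituted base in the middle.
    read = QUERY[:14] + "G" + QUERY[15:]
    for k in (28, 9):
        hits = shared_kmers(QUERY, read, k)
        print(f"k={k}: {len(kmers(QUERY, k))} query k-mers, {len(hits)} found in read")
```

With k equal to the full query length there is only one seed, so any single difference between read and query leaves nothing to match on; shorter k-mers that do not span the difference still match exactly.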

because the reads have differing headers.

That is a different issue. Are your reads out of sync in R1/R2 files? If so you need to repair.sh them.
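
A quick way to check whether two files are out of sync is to walk both header streams side by side and report the first position where the read IDs disagree. The following is a minimal Python sketch under stated assumptions: uncompressed 4-line FASTQ records, and read IDs that are either followed by a space (newer Illumina style) or end in /1 and /2. The function names are hypothetical, and this only detects the problem; it does not re-pair reads the way repair.sh does.

```python
from itertools import islice, zip_longest


def read_ids(path):
    """Yield the read ID from every header line of a 4-line-per-record FASTQ."""
    with open(path) as handle:
        for header in islice(handle, 0, None, 4):  # lines 0, 4, 8, ... are headers
            fields = header[1:].split()            # drop '@', cut at whitespace
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):        # strip old-style mate suffix
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_number, r1_id, r2_id) for the first disagreement, else None.

    If one file is shorter, the missing side is reported as None.
    """
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for number, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return number, id1, id2
    return None
```

If each file was filtered on its own, a read whose mate lacks the pattern can end up in only one of the two outputs, and first_mismatch("1.fq", "2.fq") would return the first record where that happens.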

3.3 years ago

With seqkit:

$ seqkit grep -srip "TGTATGTAAACTTCCGACTTCAACTGTA"  input.fq(.gz)