Extract fastq reads by lists of sequences
8 weeks ago
dlekrud456

Hello,

I have a list of sequences, and I would like to find the FASTQ reads that contain these sequences.

Is there a tool, or a way to script this, that finds the FASTQ reads containing sequences from a given list?

My list of sequences looks like the following:

GATAAAAAAAAAAAAAAAC
GATAAAAAAAAAAAAAACC
GATAAAAAAAAAAAAAATC
GATAAAAAAAAAAAAAAGC
GATAAAAAAAAAAAAACAC
GATAAAAAAAAAAAAACCC
GATAAAAAAAAAAAAACTC
GATAAAAAAAAAAAAATAC
GATAAAAAAAAAAAAATCC
GATAAAAAAAAAAAAATGC
GATAAAAAAAAAAAAAGAC
GATAAAAAAAAAAAAAGCC
GATAAAAAAAAAAAAAGGC
GATAAAAAAAAAAAACAAC
GATAAAAAAAAAAAACACC
GATAAAAAAAAAAAACCAC
GATAAAAAAAAAAAACCCC
GATAAAAAAAAAAAACCTC
GATAAAAAAAAAAAATAAC
GATAAAAAAAAAAAATCAC
GATAAAAAAAAAAAATTAC
GATAAAAAAAAAAAAGAAC
GATAAAAAAAAAAAAGACC
GATAAAAAAAAAAACAAAC
GATAAAAAAAAAAACCCCC
GATAAAAAAAAAAATAAAC
GATAAAAAAAAAAAGAAAC
GATAAAAAAAAAACAAAAC

. . . .

I have been using grep to do this one sequence at a time, but it takes too long (I have 40k 19-mers):

grep -A 2 -B 1 "CTCAAAAAAAAACAAAGGA" input.fastq |grep -v "^\-\-$" > output.fastq

Also, there is a problem with overlapping reads.

Tags: NGS, genomics, genome, bioinformatics, fastq

You can use grep -f file

-f, --file=FILE obtain PATTERN from FILE

So if you put one pattern per line in a file, a single grep run will pull out the reads matching any of the patterns.
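Since every pattern in the question has the same length (19 nt), another option is a single pass that loads the patterns into a hash and checks every 19-mer window of each read's sequence line. The sketch below assumes an uncompressed FASTQ with exactly four lines per record; extract_by_kmers is a hypothetical helper name, not an existing tool. Unlike grep with -B 1 -A 2, it only looks at the sequence line (so a pattern can never match a header or quality string), and it prints each read at most once, even when the read contains several of the 19-mers.

```shell
#!/bin/sh
# Hypothetical helper (sketch): keep FASTQ records whose sequence line contains
# any k-mer listed in the pattern file.
# Usage: extract_by_kmers PATTERNS FASTQ [K]   (K defaults to 19)
# Assumptions: 4-line FASTQ records, all patterns exactly K long,
# and a non-empty pattern file (the NR == FNR trick needs it).
extract_by_kmers() {
  awk -v k="${3:-19}" '
    NR == FNR {                    # first file: load patterns into a hash
      sub(/\r$/, "")               # tolerate Windows (CRLF) line endings
      if (length($0) > 0) want[$0] = 1
      next
    }
    {
      line[(FNR - 1) % 4] = $0     # buffer header, sequence, "+", quality
      if (FNR % 4 != 0) next       # wait until the record is complete
      seq = line[1]
      for (i = 1; i <= length(seq) - k + 1; i++) {
        kmer = substr(seq, i, k)   # one hash lookup per window
        if (kmer in want) {
          print line[0]; print line[1]; print line[2]; print line[3]
          break                    # print each read only once
        }
      }
    }' "$1" "$2"
}
```

For example: extract_by_kmers list.txt input.fastq 19 > output.fastq. The work per read is proportional to the read length rather than to the number of patterns, which is what makes a hash lookup attractive with 40k fixed-length patterns.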


Thanks a lot! I tried the command below, but it's not working. Could you please have a look?

grep -A 2 -B 1 -f list.txt input.fastq |grep -v "^\-\-$" > output.fastq


Looks right


The output FASTQ seems to be empty :(
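One common cause of empty output with grep -f (an assumption here, since the files were not posted) is a pattern file saved with Windows line endings: every pattern then ends in an invisible carriage return and matches nothing. Running file list.txt typically reports "with CRLF line terminators" in that case. The sketch below strips carriage returns and blank lines before searching, and adds -F so grep treats the 19-mers as literal strings rather than regular expressions. clean_patterns and grep_fastq_fixed are hypothetical helper names. Like the original command, it still assumes each match falls on a sequence line.

```shell
#!/bin/sh
# Sketch, assuming list.txt holds one 19-mer per line and may contain
# Windows (CRLF) line endings or blank lines.

# Hypothetical helper: strip carriage returns and drop blank lines
# (an empty pattern would otherwise match every line).
clean_patterns() {
  tr -d '\r' < "$1" | grep -v '^$'
}

# Hypothetical helper: search the FASTQ with the cleaned patterns as fixed
# strings (-F), printing the header (-B 1), the matching line, and the
# "+" and quality lines (-A 2), then removing grep's "--" group separators.
grep_fastq_fixed() {
  patterns_clean=$(mktemp) || return 1
  clean_patterns "$1" > "$patterns_clean"
  grep -F -B 1 -A 2 -f "$patterns_clean" "$2" | grep -v '^--$'
  rm -f "$patterns_clean"
}
```

For example: grep_fastq_fixed list.txt input.fastq > output.fastq.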
