Question: search multiple sequences in multiple fastqs
0
gravatar for max_19
10 months ago by
max_19130
max_19130 wrote:

Hi all!

I have a script which goes through a bunch of fastq files and searches for specific sequences found in a text file called test_text.txt, then outputs a fastq file with only those reads. The reads in my fastq files are all between 15-30bp long, however when I look at the output fastq files, I am getting reads of 52 bp long.

Does anyone know why this would happen? code is below:

for fastq in `ls my_files`;do
    for i in `cat test_text.txt`; do zgrep -B 1 -A 2 $i $fastq >>filtered_fastq/${fastq}.filtered_reads.fastq; done;
done

Thanks for any input.

sequencing fastq • 244 views
ADD COMMENTlink written 10 months ago by max_19130

What have you done so far to convince yourself/check the reads are all 15-30bp?

I can't see an obvious mistake at first glance, so are you sure there are no 52 bp reads in the input file?

ADD REPLYlink written 10 months ago by Joe16k

I used cutadapt to set min/max length for all sequences (cutadapt -a $adapter3 -f fastq -m 15 -M 30). also when I do fastqc on the file, my sequence length in the basic stats section shows '15-30'

ADD REPLYlink written 10 months ago by max_19130

Make sure; do awk '{print length}' <filename> | sort -n | tail . The last number should be the longest sequence length in your file. Remove the headers before if you think they're longer than 30bp (i.e. grep -v ">" <filename> | awk '{print length}' | sort -n | tail

ADD REPLYlink modified 10 months ago • written 10 months ago by manuel.belmadani1.2k

Thank you! Tried this and got 54. This is odd! does this mean something is wrong with my cutadapt step? and why does FASTQC report 15-30bp for sequence length

ADD REPLYlink written 10 months ago by max_19130

cutadapt should get rid of reads > 30 in length in that case. Are you sure that you have the trimmed file in my_files, and not the original file you had before running cutadapt?

ADD REPLYlink modified 10 months ago • written 10 months ago by manuel.belmadani1.2k

By the way, the -f switch in grep allows you to provide a file of patterns to search; e.g. grep -f <file_with_patterns> <file_to_search_in>, so possibly your could get rid of your inner for loop.

ADD REPLYlink modified 10 months ago • written 10 months ago by manuel.belmadani1.2k

Good to know, thank you!

ADD REPLYlink written 10 months ago by max_19130
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1734 users visited in the last hour