search multiple sequences in multiple fastqs
0
0
Entering edit mode
3.9 years ago
max_19 ▴ 170

Hi all!

I have a script which goes through a bunch of fastq files and searches for specific sequences found in a text file called test_text.txt, then outputs a fastq file with only those reads. The reads in my fastq files are all between 15-30bp long, however when I look at the output fastq files, I am getting reads of 52 bp long.

Does anyone know why this would happen? code is below:

for fastq in ls my_files;do
for i in cat test_text.txt; do zgrep -B 1 -A 2 $i$fastq >>filtered_fastq/${fastq}.filtered_reads.fastq; done; done  Thanks for any input. sequencing fastq • 802 views ADD COMMENT 0 Entering edit mode What have you done so far to convince yourself/check the reads are all 15-30bp? I can't see an obvious mistake at first glance, so are you sure there are no 52 bp reads in the input file? ADD REPLY 0 Entering edit mode I used cutadapt to set min/max length for all sequences (cutadapt -a$adapter3 -f fastq -m 15 -M 30). also when I do fastqc on the file, my sequence length in the basic stats section shows '15-30'

0
Entering edit mode

Make sure; do awk '{print length}' <filename> | sort -n | tail . The last number should be the longest sequence length in your file. Remove the headers before if you think they're longer than 30bp (i.e. grep -v ">" <filename> | awk '{print length}' | sort -n | tail

0
Entering edit mode

Thank you! Tried this and got 54. This is odd! does this mean something is wrong with my cutadapt step? and why does FASTQC report 15-30bp for sequence length

0
Entering edit mode

cutadapt should get rid of reads > 30 in length in that case. Are you sure that you have the trimmed file in my_files, and not the original file you had before running cutadapt?

0
Entering edit mode

By the way, the -f switch in grep allows you to provide a file of patterns to search; e.g. grep -f <file_with_patterns> <file_to_search_in>, so possibly your could get rid of your inner for loop.

0
Entering edit mode

Good to know, thank you!