Small wheat samples (fastq or fastq.gz) for testing a pipeline
2.3 years ago

I am testing a variant-calling pipeline on wheat FASTQ files. I have a couple of raw and trimmed (cutadapt) files, but they are too big for quick testing. Where can I get smaller files, or how can I shorten the ones I have so that I can test faster?

2.3 years ago

Why can't you just take the first 4000 lines of your fastq?

Would that be OK? Should I take the headers and everything? How should I do this?

FASTQ files have no file-level header; they are plain text files. Please google how to get subsets of text files with Unix tools such as head. If the file is compressed, decompress it first in a pipe, such as zcat your.fastq.gz | head -n 400000 > subset.fq. This would get you the first 100,000 reads (a factor of 4 because each read consists of 4 lines: the @ header line, the sequence, a + separator line, and the quality string; check the FASTQ format specification for details).
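A minimal sketch of the head approach as a reusable shell function. The function name head_fastq and its arguments are hypothetical, chosen for illustration; it assumes standard 4-line FASTQ records (no wrapped sequence lines) and picks gzip -cd (equivalent to zcat) when the file name ends in .gz.

```shell
# Hypothetical helper: copy the first N reads of a FASTQ file to a new file.
# Assumes 4 lines per record; accepts plain or gzip-compressed input,
# and always writes uncompressed output.
head_fastq() {
  local in="$1" n_reads="$2" out="$3"
  local n_lines=$((n_reads * 4))   # one FASTQ record = 4 lines
  case "$in" in
    *.gz) gzip -cd "$in" | head -n "$n_lines" > "$out" ;;  # same as zcat
    *)    head -n "$n_lines" "$in" > "$out" ;;
  esac
}
```

For paired-end data, calling it with the same read count on both files (e.g. head_fastq sample_R1.fastq.gz 100000 sub_R1.fq, then the same for R2, with hypothetical file names) should keep mates paired, assuming both files list mates in the same order, as paired Illumina FASTQ files normally do.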

2.3 years ago
MatthewP ★ 1.0k

Hello, use seqtk. The command seqtk sample can randomly sample a specific number of reads from a FASTQ file. If your data are paired-end, remember to use the same seed (-s) for both FASTQ files.
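A sketch of the paired-end seqtk approach, wrapped in a function so the seed is guaranteed to be identical for both mates. The function name sample_pairs, the output prefix convention, and the example values are hypothetical; seqtk sample itself takes the input file and a read count (or fraction) and writes to stdout.

```shell
# Hypothetical helper: randomly sample the same N read pairs from R1 and R2.
# Using one seed for both calls makes seqtk pick the same record positions
# in each file, so mates stay paired (assuming both files are in the same order).
sample_pairs() {
  local r1="$1" r2="$2" n_reads="$3" seed="$4" prefix="$5"
  seqtk sample -s "$seed" "$r1" "$n_reads" > "${prefix}_R1.fq"
  seqtk sample -s "$seed" "$r2" "$n_reads" > "${prefix}_R2.fq"
}
```

Example call, with hypothetical file names: sample_pairs sample_R1.fastq.gz sample_R2.fastq.gz 100000 100 sub. When given a read count rather than a fraction, seqtk keeps the sampled reads in memory, so for very large samples a fraction may be the lighter option.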

You do not need random sampling, as the reads in a FASTQ file are already in random order because DNA fragments attach to the flow cell at random positions. head does just fine.
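Whichever subsetting method is used, a quick check that the R1 and R2 subsets still contain the same reads in the same order can save debugging time later in the pipeline. This is a minimal sketch; the function names are hypothetical, and it assumes the read ID is the first whitespace-separated token of each header line, optionally ending in a /1 or /2 mate suffix (which it strips before comparing).

```shell
# Hypothetical helper: print the read ID from every header line
# (line 1 of each 4-line record), dropping a trailing /1 or /2.
fastq_ids() {
  awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: succeed only if both files list identical IDs in order.
check_pairs_in_sync() {
  local r1="$1" r2="$2" ids1 ids2 status
  ids1=$(mktemp) ids2=$(mktemp)
  fastq_ids "$r1" > "$ids1"
  fastq_ids "$r2" > "$ids2"
  if cmp -s "$ids1" "$ids2"; then
    echo "in sync"; status=0
  else
    echo "OUT OF SYNC" >&2; status=1
  fi
  rm -f "$ids1" "$ids2"
  return "$status"
}
```

It expects uncompressed input, such as the subset files produced by head or seqtk above.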