Small wheat samples (fastq or fastq.gz) for testing a pipeline
2
0
2.3 years ago

I am testing a pipeline that does variant calling on wheat fastq files. I have a couple of raw and trimmed (cutadapt) files, but they are too big for simple testing. Where can I get shorter files, or how can I shorten the ones I have, so that I can test faster?

test pipeline wheat • 681 views
2
2.3 years ago

Why can't you just take the first 4000 lines of your fastq?

0

Would that be OK? Should I take the headers and everything? How should I do this?

1

Fastq files have no file-level header; they are plain text files. Please google how to get subsets of text files with Unix tools such as head. If the file is compressed, decompress it first in a pipe, such as zcat your.fastq.gz | head -n 400000 > subset.fq. This gets you the first 100,000 reads (factor 4 because one read consists of 4 lines; check the fastq format specification for why that is).
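
The head approach described above can be wrapped in a small helper so that the "times 4" arithmetic is done for you. This is only a sketch: fastq_head is a made-up function name, the file names are placeholders, and it assumes the usual 4-lines-per-read fastq layout (no wrapped sequence lines). It uses gzip -dc instead of zcat because zcat behaves differently on some systems.

```shell
#!/usr/bin/env bash
# Sketch: print the first N reads of a fastq file (plain or gzipped) to stdout.
# fastq_head is a hypothetical helper name, not part of any existing tool.
# Assumes every read occupies exactly 4 lines.
fastq_head() {
    local infile="$1"
    local n_reads="$2"
    local n_lines=$(( n_reads * 4 ))   # one read = 4 lines

    case "$infile" in
        *.gz) gzip -dc "$infile" | head -n "$n_lines" ;;   # decompress in a pipe
        *)    head -n "$n_lines" "$infile" ;;
    esac
}

# Example usage (placeholder file names):
#   fastq_head your.fastq.gz 100000 > subset.fq
```

For paired-end data, running it with the same read count on both files should keep the mates in the same order, since both files list reads in matching order.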

0
2.3 years ago
MatthewP ★ 1.0k

Hello, use seqtk. The command seqtk sample can randomly sample a specific number of reads from a fastq file. If your sequencing is paired-end, remember to use the same seed for both fastq files.
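
As a rough illustration of the same-seed advice, the sketch below calls seqtk sample twice with one shared seed so that the same reads are picked from both mate files. It assumes seqtk is installed and on your PATH; sample_pair, the output naming, and the default seed of 100 are illustrative choices, not anything seqtk prescribes.

```shell
#!/usr/bin/env bash
# Sketch: subsample a paired-end fastq pair with seqtk, using one seed for both.
# sample_pair is a hypothetical helper name; seqtk must be installed separately.
# Usage: sample_pair R1.fq R2.fq N_READS OUT_PREFIX [SEED]
sample_pair() {
    local r1="$1" r2="$2" n_reads="$3" prefix="$4"
    local seed="${5:-100}"   # arbitrary default; any value works if shared

    # The same -s value on both calls keeps the mates in sync.
    seqtk sample -s"$seed" "$r1" "$n_reads" > "${prefix}_R1.fq"
    seqtk sample -s"$seed" "$r2" "$n_reads" > "${prefix}_R2.fq"
}

# Example usage (placeholder file names):
#   sample_pair wheat_R1.fastq.gz wheat_R2.fastq.gz 10000 wheat_sub 42
```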

0

You do not need random sampling, as reads in a fastq file are already randomized due to undirected loading of DNA onto the flow cell. head does just fine.