Question

Small wheat samples (fastq or fastq.gz) for testing a pipeline

0

4.8 years ago

jorge.perezparedes • 0

I am testing a pipeline that does variant calling on wheat fastq files. I have a couple of raw and trimmed (cutadapt) files, but they are too big for simple testing. Where can I get shorter files, or how can I shorten the ones I have, so that I can test faster?

test pipeline wheat • 1.4k views

updated 4.8 years ago by MatthewP ★ 1.4k • written 4.8 years ago by jorge.perezparedes • 0

score 2 · Answer 1 · 2019-07-18

2

4.8 years ago

swbarnes2 14k

Why can't you just take the first 4000 lines of your fastq?

0

Would that be OK? Should I take the headers and everything? How should I do this?

Reply • 4.8 years ago by jorge.perezparedes • 0

1

Fastq files have no file-level header; they are plain text made up of 4-line records. Please look up how to get subsets of text files with Unix tools such as head. If the file is compressed, decompress it first in a pipe, such as zcat your.fastq.gz | head -n 400000 > subset.fq. This gets you the first 100,000 reads (the factor of 4 is because one read consists of 4 lines; check the fastq format specification for why that is).
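The 4-lines-per-read arithmetic can be sketched end to end as follows. This is only an illustration: the five reads are synthetic, `n_reads` is a made-up variable name, and `gzip -dc` is used in place of zcat because it behaves the same on Linux and macOS.

```shell
# Build a tiny gzipped fastq with 5 reads (synthetic data, illustration only).
for i in 1 2 3 4 5; do
  printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done | gzip > your.fastq.gz

# Keep the first N reads: one fastq record is 4 lines, so take 4*N lines.
n_reads=2   # hypothetical variable name; set to e.g. 100000 for real data
gzip -dc your.fastq.gz | head -n $((n_reads * 4)) > subset.fq

# Sanity check: number of reads in the subset (line count divided by 4).
echo "reads in subset: $(( $(wc -l < subset.fq) / 4 ))"
```

For paired-end data, running the same head command with the same line count on both files should keep mates together, provided both files list the reads in the same order.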

Reply • 4.8 years ago by ATpoint 81k

score 0 · Answer 2 · 2019-07-18

0

4.8 years ago

MatthewP ★ 1.4k

Hello, use seqtk. The command seqtk sample can randomly sample a specific number of reads from a fastq file. If your sequencing is paired-end, remember to use the same seed for both fastq files.
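A minimal sketch of paired-end sampling with a shared seed, assuming seqtk is installed and on the PATH. The read files are synthetic, and the file names (read1.fq, read2.fq, sub1.fq, sub2.fq), the seed value 100, and the sample size 5 are placeholders rather than recommendations.

```shell
# Synthetic paired-end data: 20 read pairs with matching names (illustration only).
for i in $(seq 1 20); do
  printf '@pair%s/1\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > read1.fq
for i in $(seq 1 20); do
  printf '@pair%s/2\nTGCATGCA\n+\nIIIIIIII\n' "$i"
done > read2.fq

if command -v seqtk >/dev/null 2>&1; then
  # Same seed (-s100) and same read count for both mates keeps the pairs in sync.
  seqtk sample -s100 read1.fq 5 > sub1.fq
  seqtk sample -s100 read2.fq 5 > sub2.fq

  # Compare read names with the /1 and /2 suffixes stripped.
  awk 'NR % 4 == 1' sub1.fq | sed 's/\/1$//' > names1.txt
  awk 'NR % 4 == 1' sub2.fq | sed 's/\/2$//' > names2.txt
  cmp -s names1.txt names2.txt && echo "pairs in sync"
else
  echo "seqtk not found; install it to run this example" >&2
fi
```

Using a different seed for the second file would draw a different set of reads, so the mates would no longer match and paired-end aligners would fail or misbehave.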

0

You do not need random sampling: reads in a fastq file are already in effectively random order with respect to the genome, because DNA fragments are loaded onto the flow cell without any ordering. head does just fine.

Reply • 4.7 years ago by ATpoint 81k