Question: FASTQ file exploration
ezraamustafa3 wrote 3 months ago:

How can I extract only the sequence identifiers from a FASTQ file, without the sequences or the quality scores, using Linux?

ngs fastq • 179 views
h.mon (Brazil) wrote 3 months ago:

Another option to print only read names is to print every 4th line, starting from the first line:

zcat file.fastq.gz | awk 'NR%4==1'
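
If you want the bare IDs (without the leading @ or the description after the first space), the one-liner above could be wrapped roughly as follows. This is only a sketch: fastq_ids is a made-up helper name, and it assumes strict 4-line records (no wrapped sequence or quality lines) and a gzipped input.

```shell
# Hypothetical helper (not part of any tool): print bare read IDs
# from a gzipped FASTQ file given as the first argument.
# Assumes every record is exactly 4 lines long.
fastq_ids() {
    # Keep lines 1, 5, 9, ...; strip the leading "@";
    # print only the first whitespace-separated field (the ID itself).
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/^@/, ""); print $1 }'
}
```

For example, fastq_ids file.fastq.gz > read_ids.txt would write one ID per line. Selecting by line position rather than grepping for a leading @ matters because quality lines can also start with @.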

swbarnes2 (United States) wrote 3 months ago:

Use zcat myfile.fastq.gz | head to see the first 10 lines, which should include a couple of read names. The first fields of each read name are usually the instrument ID and run ID, and if the FASTQ files come straight from the instrument, these are constant across all reads. Something like zgrep M012933 myfile.fastq.gz, with M012933 replaced by the instrument ID you see in your own file, should then return all the read names.
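
A slightly stricter variant of the zgrep idea, as a sketch only: fastq_headers_by_instrument is a made-up name, and it assumes headers of the form @<instrument>:<run>:... and strict 4-line records. It only accepts a match at the start of a header line, so a stray hit inside a sequence or quality line is ignored.

```shell
# Hypothetical helper (not part of any tool): print the header lines
# whose read name starts with a given instrument ID.
# $1 = gzipped FASTQ file, $2 = instrument ID (for example M012933)
fastq_headers_by_instrument() {
    # index($0, p) == 1 means the line begins with the literal prefix "@<ID>:";
    # NR % 4 == 1 restricts the check to header lines of 4-line records.
    gzip -dc "$1" | awk -v p="@$2:" 'NR % 4 == 1 && index($0, p) == 1'
}
```

Called as fastq_headers_by_instrument myfile.fastq.gz M012933, it prints the full header lines, including the leading @.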

ezraamustafa3 replied 3 months ago:

It worked for me, thank you very much!