Question: Extract the DNA sequence header!
3.6 years ago by fufuyou (United States)
fufuyou wrote:

I have a question about our DNA sequence data.
If the sequence data looks like this:
@HWI-ST1234:136:C5F6VACXX:6:1101:4121:2231 1:N:0:ACTTGA
TATGGGTTTCCACGGAGCACAGTGCCTAGTGCTCACTCCCCAGTTGTATCTTATTTTTCAGGTCAGCAGGTCGGGCCGGGAGTGTGACATGACGGAGCAGA
+
CCCFFFDDHHHHHJJGIJJJJJHIJJIJJHIJIJIJJJJJJIIIIJGIIBFHHFHJJJG>FHIJIGIIEHAHBBAB@BDBDD<?ACA>CDDDDDD5<BBD?. 
I can extract the sequence identifier, @HWI-ST1234:136:C5F6VACXX:6:1101:4121:2231 1:N:0:ACTTGA, using the code.
But if the sequence data looks like this:
@HWI-ST1234:136:C5F6VACXX:6:1101:4295:2242 1:N:0:ACTTGA
AATACTTGTACGAGGGTGTTTTGCCACACCATATCTCATAAGGTGTGTTGGGTACATCTTTACTTGTCATTCTATTCAAAATATGTGTTGTTGTTTC
+
@@@ADD?DH8FH1CGG2A<F@FH?@?FC1DFGEDB9?BFHHIF?8?DBC=FB5@CDA;@)=.))..).;;B@B?@>>BDCCCCCD>B;?=5??<?CC
I cannot extract the sequence identifier.

So I think the problem is in the sequence data. In the second record, the quality line starts with @, and the identifier line also starts with @, so the code cannot pick out the correct sequence identifier from our data.
I want to extract the sequence identifiers from my sequence data. The identifier format is @HWI-ST1234:136:C5F6VACXX:6:1101:4295:2242 1:N:0:ACTTGA. Could you help me do it?
Thanks,
Fuyou

tool genome • 1.1k views

cat Input.fastq | paste - - - - | cut -f1 > ReadIDs.txt

Goutham's solution that purely uses awk should be much faster. 

modified 3.6 years ago • written 3.6 years ago by Ashutosh Pandey
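
For anyone wondering why the paste version works: `paste - - - -` reads four lines at a time and joins them into one tab-separated row, so `cut -f1` keeps only the first of each four lines. A minimal sketch on a made-up one-record file (the file name `demo_paste.fastq` and the short read are assumptions for illustration only):

```shell
# One hypothetical FASTQ record whose quality line starts with '@'.
printf '%s\n' '@read1 1:N:0:ACTTGA' 'ACGT' '+' '@@@A' > demo_paste.fastq

# Join every 4 lines into one tab-separated row, then keep column 1.
# cut splits on tabs by default, so the space inside the header is preserved.
paste - - - - < demo_paste.fastq | cut -f1
```

Reading the file with `<` instead of `cat` saves one process, but the result is the same.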

Goutham and Ashutosh,

Thanks,

It is working.

written 3.6 years ago by fufuyou
3.6 years ago by geek_y (Barcelona/CRG/London/Imperial)
geek_y wrote:

You just need to print the read name, which is the first of every four lines in FASTQ format. Something like:

awk '{if (NR%4==1) print}'

or just

awk 'NR%4==1'
written 3.6 years ago by Pierre Lindenbaum
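
To see the difference on data like that in the question, here is a small sketch. It assumes the original code picked headers by a leading `@` (the question does not show the code, so this is a guess), and it uses shortened reads in a hypothetical file `demo.fastq`:

```shell
# Two records; the second record's quality line starts with '@'.
# Reads and qualities are shortened for illustration.
cat > demo.fastq <<'EOF'
@HWI-ST1234:136:C5F6VACXX:6:1101:4121:2231 1:N:0:ACTTGA
TATGGGTTTC
+
CCCFFFDDHH
@HWI-ST1234:136:C5F6VACXX:6:1101:4295:2242 1:N:0:ACTTGA
AATACTTGTA
+
@@@ADD?DH8
EOF

# Matching on a leading '@' also counts the second quality line: prints 3.
grep -c '^@' demo.fastq

# Selecting by position (line 1 of every 4) keeps only the two headers.
awk 'NR % 4 == 1' demo.fastq > ReadIDs.txt
wc -l < ReadIDs.txt
```

The positional approach relies on every record being exactly four lines, which is the usual layout for Illumina FASTQ output; it would not hold for files with wrapped sequence lines.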