Question: Altering fastq sequence identifier
19 months ago, fiona.newberry wrote:

I am trying to determine the false positives and false negatives of various alignments, and I want to add a unique identifier to the sequence identifiers in each fastq file.

I have ten genomes which I have synthetically sequenced (so 20 fq files). The current sequence identifiers look like this:

@simulated.2618103/1

I want to change it so that it looks like this:

@simulated.2618103/1.1

Each of the ten genomes will get its own suffix, 1 to 10. I have tried reading about how to do this with awk, but I don't seem to understand the program.

Thanks


This should help; try to extend the answers in these links

Comment written 19 months ago by venu
19 months ago, geek_y (Barcelona/CRG/London/Imperial) wrote:

It's a bit tricky with fastq because you need to alter only the first line of every record (each record spans 4 lines).

So, what you can do is:

awk '{ if (NR%4==1) gsub("$",".1",$1); print }' in.fq > renamed_in.fq

Change the gsub() call according to your needs, for example ".2" for the second genome.
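
For the ten-genome case in the question, the same idea can be wrapped in a loop that passes the suffix in as an awk variable. This is only a sketch: the file names (genome<N>_<mate>.fq) are a made-up naming scheme, and the tiny demo file is invented so the snippet runs on its own. It assigns to $1 directly instead of calling gsub(), which has the same effect here.

```shell
# Demo input: two made-up records. Note the second quality line starts with "@".
printf '%s\n' \
  '@simulated.1/1' 'ACGT' '+' 'IIII' \
  '@simulated.2/1' 'TTGA' '+' '@III' > genome3_1.fq

# Hypothetical naming scheme: genome<N>_<mate>.fq for N = 1..10 and mate = 1 or 2.
for n in 1 2 3 4 5 6 7 8 9 10; do
  for mate in 1 2; do
    fq="genome${n}_${mate}.fq"
    [ -f "$fq" ] || continue   # skip files that do not exist
    # Append ".<n>" to the first field of every header line (lines 1, 5, 9, ...).
    awk -v suffix=".$n" 'NR % 4 == 1 { $1 = $1 suffix } { print }' \
      "$fq" > "renamed_$fq"
  done
done

cat renamed_genome3_1.fq
```

The demo's last quality line begins with "@", which is legal in fastq. That is why the header lines are selected by position (NR % 4 == 1) rather than by matching a leading "@".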


THANK YOU!

Would you mind explaining the parts of your awk script? I am really struggling to learn this. Do you know of any good learning material?

Reply written 19 months ago by fiona.newberry

Any basic awk tutorial will explain the awk syntax and the built-in variables.
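
As a rough breakdown of the one-liner from the answer, here it is spread over several lines with a comment on each part. The two sample records are invented so that it can be run as is; in.fq and renamed_in.fq are the file names used in the answer.

```shell
# Two made-up records so the command can be tried directly.
printf '%s\n' '@simulated.2618103/1' 'ACGT' '+' 'IIII' \
              '@simulated.2618104/1' 'GGCA' '+' 'IIII' > in.fq

awk '
{                            # a block with no pattern runs for every input line
    if (NR % 4 == 1)         # NR is the current line number; remainder 1 means lines 1, 5, 9, ...
        gsub("$", ".1", $1)  # "$" matches the end of the string, so ".1" is appended to field 1
    print                    # print the line, modified or not
}' in.fq > renamed_in.fq

cat renamed_in.fq
```

One thing to be aware of: changing $1 makes awk rebuild the line with single spaces between fields. That makes no difference for headers like the ones in the question, which contain no spaces.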

Reply written 19 months ago by geek_y