Question: Altering fastq sequence identifier
2.1 years ago, fiona.newberry wrote:

I am trying to determine the false positives and false negatives of various alignments, and I want to add a unique identifier to the sequence identifiers in each fastq file.

I have ten genomes which I have synthetically sequenced (so 20 fq files). The current sequence identifiers look like this:

@simulated.2618103/1

I want to change it so that it looks like this:

@simulated.2618103/1.1

Each of the ten genomes will get its own identifier, from 1 to 10. I have tried reading about how to do this with awk, but I don't understand the program.

Thanks


This would help; try to extend the answers in these links.

written 2.1 years ago by venu
2.1 years ago, geek_y (Barcelona) wrote:

It's a bit tricky with fastq because you need to alter only the first line of every record (each record spans four lines).

So, what you can do is:

awk '{ if (NR%4==1) gsub("$",".1",$1); print }' in.fq > renamed_in.fq

Change the gsub() call to suit your needs, e.g. use ".2" as the replacement string for the second genome.
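To cover all ten genomes, the suffix could be passed to awk as a variable instead of being hard-coded. This is only a sketch: the file names (genome1_R1.fq through genome10_R2.fq) and the helper name add_suffix are made up for illustration, so adjust them to your own layout.

```shell
# add_suffix is a hypothetical helper, not part of any tool.
# $1 = genome number (1-10), $2 = input fastq; renamed records go to stdout.
add_suffix() {
    # NR % 4 == 1 selects the header line of each 4-line record;
    # the suffix is appended to the first whitespace-separated field.
    awk -v suffix=".$1" 'NR % 4 == 1 { $1 = $1 suffix } { print }' "$2"
}

# Assumed naming scheme: genome<N>_R1.fq and genome<N>_R2.fq.
for g in 1 2 3 4 5 6 7 8 9 10; do
    for r in R1 R2; do
        fq="genome${g}_${r}.fq"
        [ -f "$fq" ] || continue        # skip files that are not present
        add_suffix "$g" "$fq" > "renamed_${fq}"
    done
done
```

Selecting header lines by line number rather than by a leading "@" matters, because quality lines can also start with "@". Note that this assumes every record is exactly four lines long (no wrapped sequences).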


THANK YOU!

Do you mind explaining the parts of your awk script? I am really struggling to learn this. Do you know of any good learning material?

written 2.1 years ago by fiona.newberry

You can read any basic awk tutorial to understand the awk syntax and built-in variables.

written 2.1 years ago by geek_y
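For reference, here is the one-liner from the answer spread over several lines with a comment on each part. The one-record sample fed in through printf is made up so that the command can be run without a real fastq file, and the variable name annotated is only for illustration.

```shell
# Feed a single made-up fastq record into the same awk program as the answer.
annotated=$(printf '@simulated.2618103/1\nACGT\n+\nIIII\n' |
awk '
{
    # NR is a built-in variable: the number of the current input line.
    # A fastq record spans 4 lines, so NR % 4 == 1 is true only for
    # lines 1, 5, 9, ... which are the header lines.
    if (NR % 4 == 1)
        # $1 is the first whitespace-separated field of the line.
        # The regex "$" matches the empty string at the end of $1,
        # so replacing that match with ".1" appends ".1" to the field.
        gsub("$", ".1", $1)
    # print with no arguments writes the current (possibly modified) line.
    print
}')
printf '%s\n' "$annotated"
```

Only the header line changes; the sequence, "+" and quality lines pass through untouched.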