rename reads in fastq files
5.5 years ago

Hi all, I have a FASTQ file and I need to rename the reads in it using a sed command. Below is the explanation:

The read names in my files are:

@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_1
@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_2

And I want to transform them into the format:

@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12/1
@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12/2

Thank you very much in advance for your help.

Tags: reads fastq sed linux sequencing

What have you tried so far?


Based on some forum posts, I tried sed -i 's/_///g' myfile, but I'm not a Linux pro and I don't know how to do it.


Avoid sed -i when you are not sure that your command is the right one: -i edits the file in place, so if you do something incorrectly, it will corrupt your original file. When trying different sed commands, you may want to run

sed 's/from/to/g' <input> | head (to only look at the first lines)

or

sed 's/from/to/g' <input> | head | less -S (in the case of long lines)
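A minimal sketch of that workflow, using a hypothetical two-record file sample.fq built from the two read names in the question: preview the substitution first, and only edit in place once the output looks right. The -i.bak form keeps a backup copy of the original and is accepted by both GNU sed and BSD/macOS sed.

```shell
# Build a tiny two-record example file (hypothetical data, for illustration only)
printf '%s\n' \
  '@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_1' 'ACGT' '+' 'IIII' \
  '@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_2' 'TGCA' '+' 'IIII' > sample.fq

# 1. Preview: the substituted text goes to the terminal, sample.fq is not modified
sed 's/_/\//g' sample.fq | head -n 4

# 2. Once the preview looks right, edit in place but keep the original
#    as sample.fq.bak, so a wrong command can still be undone
sed -i.bak 's/_/\//g' sample.fq
```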

5.5 years ago
iraun ★ 3.8k

Well, you're very close to the solution. You only need to escape the '/' character: sed -i 's/_/\//g' myfile should work.

Just a little advice: try calling sed in the following way:

sed 's/_/\//g' file.fq > reformat.fq

This way your original input file is left untouched, so you can go back to it if something has gone wrong. In my opinion it is good practice.
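If the backslash escaping feels awkward, sed's s command accepts other delimiter characters, so the same substitution can be written with | and no escape at all. A minimal sketch on one of the example headers (file.fq in the comment is a placeholder name):

```shell
# The s command may use any delimiter character; with | the / needs no escaping
echo '@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_1' | sed 's|_|/|g'
# prints @HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12/1

# The same idea on a whole file, writing to a new file:
# sed 's|_|/|g' file.fq > reformat.fq
```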


Just as an alternative: tr '_' '/' < file.fq > new_file.fq


Thank you, I'll try this.

5.5 years ago
michael.ante ★ 3.7k

Check your FASTQ quality encoding. If you have Phred+64 (Illumina 1.3 or 1.5), you can run into an encoding problem: in Phred+64, '_' (ASCII 95) is a valid quality character, encoding Q31, while '/' (ASCII 47) is not. Thus, you'll need to check whether you are on the header line before substituting (e.g. using awk: awk '{if(NR%4==1){gsub(/_/,"/")}; print}' file.fq, which assumes every record is exactly four lines).
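A slightly stricter variant of the awk idea, sketched under two assumptions: every record is exactly four lines (no wrapped sequences), and the mate number is always a trailing _1 or _2 at the end of the header. Only that suffix is rewritten, so underscores anywhere else, including Phred+64 quality characters, are left alone. sample.fq is a hypothetical example file.

```shell
# Hypothetical example record whose quality line contains '_' (Q31 in Phred+64)
printf '%s\n' '@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_1' 'ACGT' '+' 'hh__' > sample.fq

# On header lines only (line 1 of every 4-line record), rewrite a trailing
# _1 or _2 into /1 or /2; every line is then printed unchanged otherwise
awk 'NR % 4 == 1 { sub(/_1$/, "/1"); sub(/_2$/, "/2") } { print }' sample.fq > reformat.fq

cat reformat.fq
```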


I agree that you should verify there is no _ in your quality lines before simply going for sed 's/_/\//g', because if there is, every quality score encoded as _ will be changed into a different score, /.
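One way to run that check, sketched under the assumption of 4-line records: count the quality lines (line 4 of each record) that contain '_'. A non-zero count means a whole-file sed or tr would alter quality scores, and a header-only approach is needed. sample.fq is a hypothetical example file in which one of the two quality lines contains '_'.

```shell
# Hypothetical example file: the second record has '_' in its quality line
printf '%s\n' \
  '@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_1' 'ACGT' '+' 'hhhh' \
  '@HWI-ST365:251:D0RP0ACXX:5:1101:4471:2213#12_2' 'TGCA' '+' 'hh__' > sample.fq

# Count quality lines (every 4th line) containing '_'; n + 0 prints 0 when none match
awk 'NR % 4 == 0 && /_/ { n++ } END { print n + 0 }' sample.fq
# prints 1
```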
