How can I keep the changes in my FASTQ file when I use grep and sed to edit it?
2
0
Entering edit mode
5.4 years ago
Zeason ▴ 10

I want to modify the read IDs in my FASTQ file. I use grep to select the IDs I want to edit, then sed to make the change, but my original FASTQ file is not changed. Here is my command:

cat test.fastq |grep '^@.*/1'| sed 's/@/@ILUMINA/g'

How can I solve this? Thanks a lot!

genome RNA-Seq gene sequence software error • 3.7k views
ADD COMMENT
1
Entering edit mode

Never use '@' as the signal that a line is a header, because '@' is also a valid character in the FASTQ quality string.

ADD REPLY
0
Entering edit mode

Simply run the sed command on your original file to modify it, omitting the grep part.

Keep in mind, though, that once you have changed the original it is gone. A better approach is to redirect the output to a new file:

cat test.fastq | sed 's/@/@ILUMINA/g' > some-new_file

This might not be restrictive enough, though, as it will also change every other occurrence of '@'.
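A minimal sketch of the redirect approach, using a throwaway one-record file (the file names and the record are made up for illustration). It shows that sed writes to standard output and leaves its input alone unless told otherwise. The substitution here is anchored with ^ so only a leading '@' is touched; it would still hit a quality line that begins with '@', which the answers further down deal with.

```shell
# Build a tiny demo FASTQ in a scratch directory (hypothetical data).
workdir=$(mktemp -d)
printf '%s\n' '@307/1' 'ACGT' '+' 'IIII' > "$workdir/test.fastq"
cp "$workdir/test.fastq" "$workdir/original_copy.fastq"

# sed prints the edited text to stdout; redirect it into a new file.
sed 's/^@/@ILLUMINA/' "$workdir/test.fastq" > "$workdir/edited.fastq"

# The input is byte-for-byte unchanged; the new file carries the edit.
cmp -s "$workdir/test.fastq" "$workdir/original_copy.fastq" && echo "input unchanged"
head -n 1 "$workdir/edited.fastq"
```

Without the redirect, the edited text would only scroll past on the terminal, which is why the original command appeared to do nothing.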

ADD REPLY
0
Entering edit mode

re: "simply run the sed command" - note: you must pass -i to modify it in place (assuming GNU sed)

ADD REPLY
3
Entering edit mode

That's not really something I would advise to novice users. Great way to lose your input data.

ADD REPLY
0
Entering edit mode

Agreed. Don't use the -i switch unless you're really sure what the sed command does and you're sure you don't need the unmodified content later.

ADD REPLY
0
Entering edit mode

I just want to change the ID of each of my reads. I think the way you recommend will also change the quality lines. The grep '^@.*/1' in my command is there to restrict the change to the ID lines of my FASTQ file. Anyway, thanks a lot.

ADD REPLY
0
Entering edit mode

Your grep command wouldn’t have solved that issue anyway, as it would still match a quality line that begins with @
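To make this concrete, here is a small sketch with a made-up record whose quality string happens to start with '@' and contain "/1" (both '/' and '1' are ordinary Phred+33 quality characters). The file name and the record are assumptions for illustration only.

```shell
# One hypothetical FASTQ record; the quality line begins with '@' and contains '/1'.
demo_dir=$(mktemp -d)
printf '%s\n' '@307/1' 'ACGTACGT' '+' '@III/1II' > "$demo_dir/demo.fastq"

# The pattern from the question matches the header AND the quality line.
pattern_hits=$(grep -c '^@.*/1' "$demo_dir/demo.fastq")

# Counting by position (every 4th line, starting at line 1) finds the one real header.
header_count=$(awk 'NR % 4 == 1 { n++ } END { print n + 0 }' "$demo_dir/demo.fastq")

echo "pattern matches: $pattern_hits, real headers: $header_count"
# prints: pattern matches: 2, real headers: 1
```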

ADD REPLY
0
Entering edit mode

Out of curiosity: why do you want to add "ILLUMINA" to every header?

ADD REPLY
0
Entering edit mode

It was just an example; I only want to prefix the ID. The stupid sequencing company gave me paired-end FASTQ files whose IDs look like this: @307/1. With those IDs I can't run MarkDuplicates in GATK, which really makes me mad :(

ADD REPLY
0
Entering edit mode

And adding "ILLUMINA" to the headers will make markduplicate work? Are you referring to Picard MarkDuplicates? I thought it was supposed to work on bam files, not on fastq files.

Did you ask the sequencing company why the headers are like this? Illumina headers follow a different naming convention.

ADD REPLY
0
Entering edit mode

Because Picard just told me "Value was put into PairInfoMap more than once", and when I looked for a solution online, I found someone saying that this error occurs when some lane IDs in the FASTQ file are repeated. So I just want to edit the read IDs to solve it. This really does solve the problem, at least for now. Maybe the way you suggest works well, but I don't know how to do it. :(
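If repeated IDs are the suspected cause, one way to check is to list header lines that occur more than once. This is only a sketch on made-up data; it assumes a well-formed four-line-per-record file, and whether duplicated IDs are actually behind the Picard error is the poster's assumption, not something this check proves.

```shell
# Three hypothetical records; the ID @307/1 appears twice.
check_dir=$(mktemp -d)
printf '%s\n' \
  '@307/1' 'ACGT' '+' 'IIII' \
  '@308/1' 'TTTT' '+' 'IIII' \
  '@307/1' 'GGGG' '+' 'IIII' > "$check_dir/reads.fastq"

# Take every 4th line starting at line 1 (the headers), then report repeated ones.
dup_ids=$(awk 'NR % 4 == 1' "$check_dir/reads.fastq" | sort | uniq -d)
echo "$dup_ids"
# prints: @307/1
```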

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

ADD REPLY
5
Entering edit mode
5.4 years ago
Joe 21k

To avoid issues with @ in the quality line as Pierre points out:

 sed '1~4s/^@/@ILUMINA/' file.fastq > edited_file.fastq

And lieven's advice about leaving your original file unmodified is good advice, so redirect to a new file.

ADD COMMENT
0
Entering edit mode

I don't know if this is of any particular consequence for what you want to do, but you've missed an L out in ILLUMINA. You may also want to consider changing the substitution to:

s/^@/@ILLUMINA:/ since all the fields in the header lines are ':'-delimited, and this might make it easier to separate out the string later on.

Use at own risk though, as messing with the FASTQ headers is liable to break other programs.

ADD REPLY
0
Entering edit mode

Can you explain the meaning of "1~4" to me? Thanks a lot.

ADD REPLY
0
Entering edit mode

first~step is a sed 'address' (a GNU sed extension, so it may not work in other sed implementations). 1~4 basically says: starting on the 1st line, and on every 4th line thereafter (~4), make the substitution defined in the s/.../.../. This way sed ignores the quality line even if it finds an @ at the start.
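A quick way to see which lines an address selects is to print them from a numbered stream. The sketch below assumes GNU sed for the first~step form and shows an awk equivalent for systems without it; the FASTQ record is made up.

```shell
# Lines 1, 5, 9 of a 12-line stream, selected with the GNU sed first~step address.
gnu_lines=$(seq 1 12 | sed -n '1~4p' | tr '\n' ' ')

# Portable equivalent using awk's line counter NR.
awk_lines=$(seq 1 12 | awk 'NR % 4 == 1' | tr '\n' ' ')

echo "sed: $gnu_lines"
echo "awk: $awk_lines"

# Applied to a hypothetical record whose quality line starts with '@':
# only line 1 is rewritten, line 4 is left alone.
edited=$(printf '%s\n' '@307/1' 'ACGTACGT' '+' '@III/1II' | sed '1~4s/^@/@ILLUMINA/')
echo "$edited"
```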

ADD REPLY
0
Entering edit mode

You are such a nice person, thank you very much! I think I should buy a more advanced book, rather than a basic one, to study Linux commands. Thanks a lot again!

ADD REPLY
2
Entering edit mode

You don’t even really need a book, all you need is Google, and:

a well formulated question.

Take this question, for example. At first it looks as if you need to process all lines starting with “@”, right? Well, no: as Pierre and others mention, we can’t rely on @. So we need to think about the problem another way.

What else do we know about FASTQ format? Well, every entry is always 4 lines (assuming the file isn’t malformed, but if it is you have other, bigger, problems). So, all we really need to do is “edit every nth line of a file (with sed)”. And this right here is your google search phrase.

The first result that search returns is:

https://superuser.com/questions/396536/how-to-keep-only-every-nth-line-of-a-file

Now the title of that thread might not seem immediately relevant, but it is. You’ve just found out the magic of how to edit every nth line, now you need only combine that with what you already know about how sed works (i.e. the substitution part) and you’re done!

ADD REPLY
0
Entering edit mode

OK, I got it :) I will try the way you recommend. Thanks a lot.

ADD REPLY
0
Entering edit mode

Always start with the basics... there is a reason why they call it 'basic' ;) Once you've got the hang of that, you can move on to the 'advanced' stuff.

ADD REPLY
0
Entering edit mode

Thanks, I will take it step by step.

ADD REPLY
0
Entering edit mode

I will try it, thanks a lot :) Maybe my question is really stupid, but I really struggled with it, because I am new to Linux.

ADD REPLY
3
Entering edit mode
5.4 years ago
Malcolm.Cook ★ 1.5k

I understand "keep the change in my fastq file " to mean precisely the opposite of "Leaving your original file unmodified", to wit:

GNU sed provides the -i option to apply the edit in place

sed -i '1~4s/^@/@ILLUMINA/' test.fastq

Perl too, allowing

perl -p -i -e 's/^@/@ILLUMINA/ unless $i++ % 4' test.fastq
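Given the concerns raised elsewhere in this thread about losing the input, a hedged middle ground is to give -i a backup suffix, so the untouched file is kept next to the edited one. The sketch below assumes GNU sed (for the 1~4 address) and uses a made-up record; Perl's -i accepts a suffix in the same way (for example -i.bak).

```shell
# Hypothetical one-record FASTQ whose quality line starts with '@'.
inplace_dir=$(mktemp -d)
printf '%s\n' '@307/1' 'ACGTACGT' '+' '@III/1II' > "$inplace_dir/test.fastq"

# -i.bak edits test.fastq in place and saves the original as test.fastq.bak.
sed -i.bak '1~4s/^@/@ILLUMINA/' "$inplace_dir/test.fastq"

head -n 1 "$inplace_dir/test.fastq"      # edited header
head -n 1 "$inplace_dir/test.fastq.bak"  # original header, preserved
```

Once the edited file has been checked, the .bak copy can be deleted by hand.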

TIP:

Useful to know, but not needed here, is the sponge command from moreutils which can be used to perform in-place edits using any command even if it does not support -i for in-place edits. Example:

anyCommand test.fastq | sponge test.fastq

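in which sponge soaks up all of its input before it opens test.fastq for writing, so the file is not truncated while anyCommand is still reading it. Note that sponge does not check anyCommand's exit status: if the command fails partway, whatever output it produced is still written to test.fastq.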
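A portable alternative that does not require moreutils is to write to a temporary file and replace the original only if the command succeeded. This is a sketch under assumptions: the record is made up, and awk stands in for "anyCommand".

```shell
# Hypothetical one-record FASTQ to edit.
safe_dir=$(mktemp -d)
fastq="$safe_dir/test.fastq"
printf '%s\n' '@307/1' 'ACGTACGT' '+' '@III/1II' > "$fastq"

# Write the edited copy next to the original so the final mv stays on one filesystem.
tmp_out=$(mktemp "$safe_dir/edit.XXXXXX")

# Prefix only header lines (line 1 of each 4-line record); print every line.
if awk 'NR % 4 == 1 { sub(/^@/, "@ILLUMINA") } { print }' "$fastq" > "$tmp_out"; then
    mv "$tmp_out" "$fastq"      # command succeeded: swap in the edited file
else
    rm -f "$tmp_out"            # command failed: keep the original untouched
    echo "edit failed; $fastq left unmodified" >&2
fi

head -n 1 "$fastq"
```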


ADD COMMENT
0
Entering edit mode

That is what OP asked, but I specifically didn’t offer up the -i flag because I think OP should be told that it is a bad idea (generally).

ADD REPLY
0
Entering edit mode

Thank you very much, I will try it.

ADD REPLY
