Question: How can I keep the changes in my FASTQ file when I use grep and sed to edit it?
Zeason0 wrote:

I just want to modify the read IDs in my FASTQ file. I use grep to select the IDs I want to edit, then sed to make the change. But I find that my original FASTQ file is not changed. Here is my command:

cat test.fastq |grep '^@.*/1'| sed 's/@/@ILUMINA/g'

How can I solve this? Thanks a lot!

ADD COMMENTlink modified 27 days ago by Malcolm.Cook900 • written 28 days ago by Zeason0

Never use '@' as a signal that the line is the header, because '@' is also a valid character in the FASTQ quality string.
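A small sketch of the problem, using made-up reads (the file contents and the temporary directory are assumptions for illustration only): the second record's quality string happens to start with '@', so counting lines by their first character overcounts the headers, while counting by position does not.

```shell
# Build a tiny two-record FASTQ with hypothetical data.
# The quality string of the second record starts with '@'.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@307/1' 'ACGT'  '+' 'IIII' \
  '@308/1' 'ACGTA' '+' '@II/1' > "$tmpdir/test.fastq"

# Lines that start with '@' (headers plus one quality line).
at_lines=$(grep -c '^@' "$tmpdir/test.fastq")

# Header lines selected by position instead: lines 1, 5, 9, ...
header_lines=$(awk 'NR % 4 == 1 {n++} END {print n+0}' "$tmpdir/test.fastq")

echo "lines starting with @: $at_lines"
echo "header lines by position: $header_lines"
```

The two counts differ whenever at least one quality line begins with '@'.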

ADD REPLYlink written 28 days ago by Pierre Lindenbaum115k

simply run the sed command on your original file to modify it, omitting the grep part

Keep in mind, though, that the original will then no longer exist (as you will have changed it). A better approach might be to redirect the output to a new file:

cat test.fastq | sed 's/@/@ILUMINA/g' > some-new_file

This might not be restrictive enough, though, as it will also change every other occurrence of '@', including any in the quality lines.
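To see why the original command appeared to do nothing, here is a minimal sketch with a one-record file (the data and file names are made up): sed writes the edited text to standard output, so the change only persists if that output is saved somewhere.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '@307/1' 'ACGT' '+' 'IIII' > "$tmpdir/test.fastq"
cp "$tmpdir/test.fastq" "$tmpdir/original_copy.fastq"

# sed prints the edited text to standard output; the redirect saves it.
sed 's/@/@ILLUMINA/g' "$tmpdir/test.fastq" > "$tmpdir/new.fastq"

# The input file is byte-for-byte unchanged ...
cmp -s "$tmpdir/test.fastq" "$tmpdir/original_copy.fastq" && echo "original untouched"

# ... and the change lives only in the new file.
head -n 1 "$tmpdir/new.fastq"
```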

ADD REPLYlink modified 28 days ago • written 28 days ago by lieven.sterck3.3k

re: "simply run the sed command" - note: you must pass -i to modify it in place (assuming GNU sed)

ADD REPLYlink modified 27 days ago • written 27 days ago by Malcolm.Cook900

That's not really something I would advise to novice users. Great way to lose your input data.

ADD REPLYlink written 27 days ago by WouterDeCoster35k

Agreed. Don't use the -i switch unless you're really sure what the sed does and you're sure you don't need the unmodified content later.
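If an in-place edit is still wanted, one way to lower the risk is to give -i a backup suffix, which GNU sed accepts directly after the option. A minimal sketch, assuming GNU sed and a made-up one-record file:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '@307/1' 'ACGT' '+' 'IIII' > "$tmpdir/test.fastq"

# -i.bak edits test.fastq in place and keeps the original as test.fastq.bak
sed -i.bak '1~4s/^@/@ILLUMINA/' "$tmpdir/test.fastq"

head -n 1 "$tmpdir/test.fastq"      # edited header
head -n 1 "$tmpdir/test.fastq.bak"  # original header
```

The backup can be deleted once the edited file has been checked.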

ADD REPLYlink written 27 days ago by RamRS19k

I just want to change the ID of each of my reads. I think the way you recommend will change the quality lines as well. The "grep '^@.*/1'" in my command is there to restrict the change to the ID lines in my FASTQ file. Anyway, thanks a lot.

ADD REPLYlink written 27 days ago by Zeason0

Your grep command wouldn't have solved that issue anyway, as it would still match any quality line that begins with @ and contains /1 (both / and 1 are valid quality characters).

ADD REPLYlink written 27 days ago by jrj.healey9.2k

Out of curiosity: why do you want to add "ILLUMINA" to every header?

ADD REPLYlink written 27 days ago by h.mon22k

It's just an example; I just want to prefix the ID. The stupid sequencing company gave me paired-end FASTQ files whose IDs look like this: @307/1. With those IDs I can't run MarkDuplicates in GATK, which really makes me mad :(

ADD REPLYlink written 27 days ago by Zeason0

And adding "ILLUMINA" to the headers will make markduplicate work? Are you referring to Picard MarkDuplicates? I thought it was supposed to work on bam files, not on fastq files.

Did you ask the sequencing company why the headers are like this? Illumina headers follow a different naming convention.

ADD REPLYlink modified 27 days ago • written 27 days ago by h.mon22k

Because Picard just told me "Value was put into PairInfoMap more than once", and when I looked for a solution on the net, I found someone saying this error results from some IDs in the FASTQ file being repeated. So I want to edit the read IDs to solve it. This way really solves the problem, at least for now. Maybe the way you suggest works well, but I don't know how to do it. :(
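Whether repeated IDs are really behind that Picard message is only what the thread reports, but checking a FASTQ file for repeated read IDs is straightforward. A minimal sketch with made-up reads, where the first and third records share an ID:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@307/1' 'ACGT' '+' 'IIII' \
  '@308/1' 'TTGA' '+' 'IIII' \
  '@307/1' 'GGCC' '+' 'IIII' > "$tmpdir/test.fastq"

# Take every 4th line starting at line 1 (the headers), sort them,
# and list each header that occurs more than once.
dups=$(awk 'NR % 4 == 1' "$tmpdir/test.fastq" | sort | uniq -d)
echo "$dups"
```

An empty result means every header in the file is unique.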

ADD REPLYlink modified 27 days ago • written 27 days ago by Zeason0

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

ADD REPLYlink written 27 days ago by WouterDeCoster35k
jrj.healey9.2k wrote:

To avoid issues with @ in the quality line as Pierre points out:

 sed '1~4s/^@/@ILUMINA/' file.fastq > edited_file.fastq

And lieven's advice about leaving your original file unmodified is good advice, so redirect to a new file.

ADD COMMENTlink modified 28 days ago • written 28 days ago by jrj.healey9.2k

I don't know if this is of any particular consequence for what you want to do, but you've missed an L out in ILLUMINA. You may also want to consider changing the substitution to:

/^@/@ILUMINA:/ since all the fields in the header lines are : delimited, and this might make it easier to separate out the string later on.

Use at own risk though, as messing with the FASTQ headers is liable to break other programs.

ADD REPLYlink written 28 days ago by jrj.healey9.2k

Can you explain the meaning of "1~4" to me? Thanks a lot.

ADD REPLYlink written 27 days ago by Zeason0

first~step is an 'address' form in sed (a GNU sed extension) that basically says: starting on the 1st line, and on every 4th line thereafter (~4), make the substitution defined in s/.../.../. This way sed ignores a quality line even if it starts with @.
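A quick way to see what the address selects is to print only the matching lines of a numbered input. This sketch assumes GNU sed; the second command uses a made-up record whose quality line starts with '@'.

```shell
# Which line numbers does 1~4 pick out of 12 lines?
selected=$(seq 1 12 | sed -n '1~4p' | paste -sd ' ' -)
echo "$selected"

# Applied to one FASTQ record: only line 1 is edited,
# even though line 4 (the quality line) also starts with '@'.
edited=$(printf '%s\n' '@308/1' 'ACGTA' '+' '@II/1' | sed '1~4s/^@/@ILLUMINA:/')
echo "$edited"
```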

ADD REPLYlink written 27 days ago by jrj.healey9.2k

You are such a nice person, thank you very much! I think I should buy a more advanced book rather than a basic one to study Linux commands. Thanks a lot again!

ADD REPLYlink written 27 days ago by Zeason0

You don’t even really need a book, all you need is Google, and:

a well formulated question.

Take this question as an example. Once you really think about what needs to happen, you need to process all lines starting with "@", right? Well, no: as Pierre and others mention, we can't rely on @. So we need to think about the problem another way.

What else do we know about FASTQ format? Well, every entry is always 4 lines (assuming the file isn’t malformed, but if it is you have other, bigger, problems). So, all we really need to do is “edit every nth line of a file (with sed)”. And this right here is your google search phrase.

The first result that search returns is:

https://superuser.com/questions/396536/how-to-keep-only-every-nth-line-of-a-file

Now the title of that thread might not seem immediately relevant, but it is. You’ve just found out the magic of how to edit every nth line, now you need only combine that with what you already know about how sed works (i.e. the substitution part) and you’re done!
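Since the every-4th-line approach relies on the file being well formed, a rough sanity check before editing can help. This is only a sketch with made-up data; it checks that the line count is a multiple of 4 and that every third line of a record starts with '+', which does not catch every kind of malformed file.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@307/1' 'ACGT' '+' 'IIII' \
  '@308/1' 'TTGA' '+' 'IIII' > "$tmpdir/test.fastq"

# Total number of lines in the file.
n_lines=$(awk 'END {print NR}' "$tmpdir/test.fastq")
if [ $((n_lines % 4)) -eq 0 ]; then
  echo "$n_lines lines: consistent with 4-line records"
else
  echo "$n_lines lines: not a multiple of 4, check the file" >&2
fi

# Lines 3, 7, 11, ... should all start with '+'; count those that do not.
bad_plus=$(awk 'NR % 4 == 3 && !/^\+/ {n++} END {print n+0}' "$tmpdir/test.fastq")
echo "separator lines not starting with +: $bad_plus"
```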

ADD REPLYlink modified 27 days ago • written 27 days ago by jrj.healey9.2k

OK, I got it :) I will try the way you recommend. Thanks a lot.

ADD REPLYlink written 27 days ago by Zeason0

always start with the basics ....there is a reason why they call it 'basic' ;) once you got the hang of that, you can move on to 'advanced' stuff

ADD REPLYlink written 27 days ago by lieven.sterck3.3k

thanks , i will take it step by step

ADD REPLYlink written 27 days ago by Zeason0

I will try it, and thanks a lot :) Maybe my question is really stupid, but I really struggled with it because I am new to Linux.

ADD REPLYlink written 27 days ago by Zeason0
Malcolm.Cook900 wrote:

I understand "keep the change in my fastq file " to mean precisely the opposite of "Leaving your original file unmodified", to wit:

GNU sed provides the -i option to apply the edit in place

sed -i '1~4s/^@/@ILLUMINA/' test.fastq

Perl offers -i too, allowing:

perl -p -i -e  's/^@/@ILLUMINA/ unless $i++%4 ' test.fastq

TIP:

Useful to know, but not needed here, is the sponge command from moreutils which can be used to perform in-place edits using any command even if it does not support -i for in-place edits. Example:

anyCommand test.fastq | sponge test.fastq

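in which test.fastq is not overwritten until sponge has read all of anyCommand's output, so the file is never truncated while it is still being read. Note that sponge itself does not check whether anyCommand completed without error.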
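Where moreutils is not installed, a similar effect can be had with a temporary file that replaces the original only if the command exits successfully. A minimal sketch, assuming GNU sed and made-up file names:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '@307/1' 'ACGT' '+' 'IIII' > "$tmpdir/test.fastq"

# Write to a temporary file first; the && ensures that mv replaces
# the original only if sed exited with status 0.
sed '1~4s/^@/@ILLUMINA/' "$tmpdir/test.fastq" > "$tmpdir/test.fastq.tmp" \
  && mv "$tmpdir/test.fastq.tmp" "$tmpdir/test.fastq"

head -n 1 "$tmpdir/test.fastq"
```

If the command fails, the original file is left as it was and only the temporary file needs cleaning up.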
ADD COMMENTlink modified 27 days ago • written 27 days ago by Malcolm.Cook900

That is what OP asked, but I specifically didn’t offer up the -i flag because I think OP should be told that it is a bad idea (generally).

ADD REPLYlink written 27 days ago by jrj.healey9.2k

Thank you very much, I will try it.

ADD REPLYlink written 27 days ago by Zeason0