I can't find an easy way to extract the last N bases of every read in a FASTQ file. Is there an easy solution with prinseq-lite, cutadapt or the FASTX-Toolkit? Or another tool? I know I can do it myself with a Java or Perl script, but it seems so obvious that one of those programs should be able to do it in a few seconds, and I can't find out how.
Thanks a lot for saving me time, and thanks for this very resourceful forum.
Regions are defined with 1-based coordinates, plus one custom extension: negative indices count back from the end of the sequence.
 1-based index    1 2 3 4 5 6 7 8 9 10
negative index    0-9-8-7-6-5-4-3-2-1
           seq    A C G T N a c g t n
           2:4      C G T
         -4:-2                c g t
         -4:-1                c g t n
          2:-2      C G T N a c g t
          1:-1    A C G T N a c g t n
          1:12    A C G T N a c g t n
        -12:-1    A C G T N a c g t n
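In case the table is hard to read, here is a minimal Python sketch of how I understand the convention. The function name `region` is made up for illustration and this is not the tool's own code. Clamping out-of-range coordinates is my reading of the 1:12 and -12:-1 rows, and rejecting 0 as a coordinate is my own assumption.

```python
def region(seq, start, end):
    """Return seq[start:end] using 1-based, inclusive coordinates.

    Negative coordinates count from the end of the sequence (-1 is the
    last base). Out-of-range coordinates are clamped to the sequence.
    """
    if start == 0 or end == 0:
        # Assumption: 0 is neither a valid 1-based nor a valid negative index.
        raise ValueError("coordinates are 1-based; 0 is not allowed")
    n = len(seq)
    # Convert negative coordinates to their 1-based equivalents.
    if start < 0:
        start = n + start + 1
    if end < 0:
        end = n + end + 1
    # Clamp to the sequence, as the 1:12 and -12:-1 rows suggest.
    start = max(start, 1)
    end = min(end, n)
    if start > end:
        return ""
    return seq[start - 1:end]


print(region("ACGTNacgtn", -4, -1))  # the last 4 bases
```

So for the original question, the last N bases of a read correspond to the region -N:-1.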
You mean remove them?
Do you want to remove the last N bases from a FASTQ file, or extract them into a new file?
Answers below cover both possibilities.
I was puzzled by the same thing.
Hi all, thanks for the answers, but they do not work the way I want. First of all, the sed command does not output a FASTQ file, so it is useless for me. Concerning the bbduk options: I had not used bbduk before, but it looks really interesting and may be able to do this; however, this command line does not work. Let me explain better what I want. Here are the first reads of my input.fastq file:
What I want as output is the last 4 bases, in a FASTQ-format file:
NB: cat input.fastq | /share/apps/local/bbmap/bbduk.sh in=stdin.fastq forcetrimright=3 out=output.fq minlength=3
--> output.fq contains the first 4 bases, not the last ones
Note that reads can have different lengths. What I really want is the last 4 bases of every read in my FASTQ file, in FASTQ format (including the quality values). Thanks for helping me! Kevin.
In future, use the "Add comment"/"Add reply" options against the respective answers/comments to provide additional information. Do not add a "new answer".
As for the BBMap solution I suggest that you use reformat.sh like so
Replace the [seq_length - bases you want to keep] part with a real number.
Edit: I see that reads can have different lengths, so this solution would not work.
It would help if the original question gave the exact details of what you want. The way it is worded makes it seem like all you wanted was the last four bases of each read simply printed out. Truncating fastq entries down to the last 4bp is a different problem.
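For the truncation case with reads of different lengths, a small script may be the simplest route. Below is a minimal Python sketch, not a tested replacement for any of the tools mentioned in this thread. The function name `last_n_fastq` is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). Reads shorter than N are kept whole, which is my own choice.

```python
def last_n_fastq(lines, n):
    """Yield FASTQ lines with sequence and quality cut to their last n positions.

    Assumes each record is exactly 4 lines: header, sequence, '+', quality.
    """
    if n <= 0:
        # seq[-0:] would return the whole read, so reject n <= 0 explicitly.
        raise ValueError("n must be a positive integer")
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            # Slice sequence and quality identically so they stay in sync.
            yield from (header, seq[-n:], plus, qual[-n:])
            record = []


def trim_file(in_path, out_path, n):
    """Write the last n bases of every read in in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for out_line in last_n_fastq(src, n):
            dst.write(out_line + "\n")
```

Calling `trim_file("input.fastq", "output.fastq", 4)` would then write the last 4 bases and quality values of each read, whatever the read length.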