Cut The Reads Of A Paired End Fastq File
10.4 years ago
Empyrean ▴ 160

Hi, I have the following 150 bp reads. I would like to trim them to 100 bp. I would also like to be able to trim bases from the beginning of the read. Please point me to any script or program that can do this.

@HWI-DO4456:7:000000000-Z07CL:1:1:15052:1479 1:N:0:GGCTAC
TCCTCAGATTTTTTAGAAAGAGGAGTCTGCTTATAAGATAATGGCATCATTTTGATAGAATCTCCTCGCATTGTTGTAAAACTAATAACAAAGAAGGTTGGTTTTTGTGGTTTTGGTCTCCCGGCCTGAATCCAAGCTTGATGAATACGAA
+
@CCFFFFFHHHHHJJIJJJJJJIJJJJJJJJJIJJJIJJJIJJJJJJJJJJJJIJJJJJJIJGJJJJJJFHHFFFFFEEEEDEDDDDDDDCDDCBDD>@CB?BDDDDDDDCBDDDBBCDDDDDDDBDDDDDDDDDCBCDCBCDDDDEDDDD
@HWI-D04456:7:000000000-Z07CL:1:1:17590:1511 1:N:0:GGCTAC
TTAATTATACTTGTTGGTTTTGGTGGCGGATTAACATGGGGAGCAGTCGCTCTTCGTTGGGGTAAATAAGGACTGAGAGAAAAAAAGGAGTGTATTTTGTGAAGGTAGGGGCACAGTACCGTTGAAGCGTCTAATGAACGTGGAGGGATGG
+

illumina paired • 10k views

See Rule 4 in the document linked from the very top of the page. Or put the terms 'trim fastq' into any search engine. Answer is on the first page.


Out of curiosity, why would you want to cut from the beginning of the reads? Aren't those bases supposed to be of high quality?


Several library construction methods add linker/adapter sequences that sit inside the sequencing adapters (e.g. RNA-seq libraries made using a Nugen Ovation cDNA synthesis kit). These adapter bases end up at the beginning of the read and, if they are not trimmed, may result in an unalignable read (at least with common next-gen aligners).

10.4 years ago

Many tools can trim reads in FASTQ format; the FASTX-Toolkit works nicely. For example, to trim the first 5 bases and keep the next 100, run fastx_trimmer and set -f (first base to keep) and -l (last base to keep) as described in its help output:

FASTA/Q Trimmer

$ fastx_trimmer -h
usage: fastx_trimmer [-h] [-f N] [-l N] [-z] [-v] [-i INFILE] [-o OUTFILE]

version 0.0.6
[-h]         = This helpful help screen.
[-f N]       = First base to keep. Default is 1 (=first base).
[-l N]       = Last base to keep. Default is entire read.
[-z]         = Compress output with GZIP.
[-i INFILE]  = FASTA/Q input file. default is STDIN.
[-o OUTFILE] = FASTA/Q output file. default is STDOUT.
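
Going by the -f and -l options in the help text above, the "skip 5, keep 100" case could be sketched as below. The file names in.fq and out.fq are placeholders, and the variable names are only for illustration; the command itself runs only if fastx_trimmer and the input file are present.

```shell
# Sketch: drop the first 5 bases and keep the next 100 (bases 6..105).
skip=5                    # bases to remove from the start of each read
keep=100                  # bases to retain after that
first=$((skip + 1))       # -f: first base to keep (1-based)
last=$((skip + keep))     # -l: last base to keep (1-based, inclusive)

# Show the command that would be run (in.fq / out.fq are placeholder names).
echo "fastx_trimmer -f $first -l $last -i in.fq -o out.fq"

# Run it only when the tool and the input file actually exist.
if command -v fastx_trimmer >/dev/null 2>&1 && [ -f in.fq ]; then
    fastx_trimmer -f "$first" -l "$last" -i in.fq -o out.fq
fi
```

To keep the last 100 bases of a 150 bp read instead, -f 51 on its own should be enough, since -l defaults to the end of the read. Some FASTX-Toolkit builds may also need -Q33 for Phred+33 quality strings; check that against your installed version.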

10.4 years ago
lexnederbragt ★ 1.3k

With the FASTQ format, it is even possible to use the Unix cut command, but only if you want to keep the first X bases and X is at least the length of the header, because cut truncates every line, header included:

cut -c 1-100 in.fq >out.fq


Might be faster than the other methods suggested, but I haven't tried...
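
To see the header restriction in action, here is a small made-up record (not from the question) whose header is longer than the 8 characters being kept:

```shell
# cut applies to every line, so the header is truncated along with the read.
trimmed=$(printf '@read1 lane=7 index=GGCTAC\nACGTACGTACGT\n+\nIIIIHHHHGGGG\n' | cut -c 1-8)
printf '%s\n' "$trimmed"
# Output:
# @read1 l
# ACGTACGT
# +
# IIIIHHHH
```

The sequence and quality lines come out as intended, but the read name is damaged, which is exactly the case the condition on X rules out.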


Cool idea. Maybe one can make it work without those restrictions.
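
One possible way to lift those restrictions, offered as a sketch rather than a tested pipeline, is to let awk touch only the sequence and quality lines. It assumes strict 4-line FASTQ records (no wrapped sequences); trim_fq is a hypothetical helper name.

```shell
# trim_fq START LENGTH < in.fq > out.fq
# Keeps LENGTH bases beginning at 1-based position START.
# Assumes every record is exactly 4 lines: header, sequence, '+', quality.
trim_fq() {
    awk -v start="$1" -v len="$2" '
        # Lines 2 and 4 of each record are the sequence and the quality string.
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, start, len); next }
        # Header and "+" lines pass through untouched.
        { print }
    '
}

# Tiny demo record: keep 4 bases starting at base 3.
printf '@read1 a header longer than the read\nACGTACGTAC\n+\nIIIIHHHHGG\n' | trim_fq 3 4
```

For the case in the question, something like `trim_fq 51 100 < in.fq > out.fq` would keep the last 100 bases of 150 bp reads. Because no reads are dropped, running it with the same arguments on both files of a pair leaves the mates in the same order.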

10.4 years ago

With Biopieces you can do:

read_fastq -i in.fq | extract_seq -b 10 -e 100 | write_fastq -o out.fq -x


Trimming sequence is covered here.

10.4 years ago
ALchEmiXt ★ 1.9k

Visit usegalaxy.org, where all of these tools are available in one comprehensive framework (either on the public instance or, after some configuration, on a local install). It also lets you trim sequences based on quality values and similar criteria.