Question

trimming reads in fastq file

0

7.0 years ago

alirezamomeni707 • 0

I have a FASTQ file in which every read begins with an "N". How can I get rid of that N using the command line? Here is an example:

@SRR2163140.1 HISEQ:148:C670LANXX:3:1101:1302:1947 length=50
NGCGACCTCAGATCAGACGTGGCGACCTGGAATTCTCGGGTGCCAAGGAA
+SRR2163140.1 HISEQ:148:C670LANXX:3:1101:1302:1947 length=50
#<<ABGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGG
@SRR2163140.2 HISEQ:148:C670LANXX:3:1101:1440:1963 length=50
NAGGCCTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTC
+SRR2163140.2 HISEQ:148:C670LANXX:3:1101:1440:1963 length=50
#=<BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@SRR2163140.3 HISEQ:148:C670LANXX:3:1101:1381:1997 length=50
NGCCGACATCGAAGGATCAATGGAATTCTCGGGTGCCAAGGAACTCCAGT
+SRR2163140.3 HISEQ:148:C670LANXX:3:1101:1381:1997 length=50
#<<ABFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFF
@SRR2163140.4 HISEQ:148:C670LANXX:3:1101:1705:1940 length=50
NACAAACCCTTGTGTCGAGGGCTGGAATTCTCGGGTGCCAAGGAACTCCA

RNA-Seq • 3.9k views

ADD COMMENT • link updated 7.0 years ago by Charles Plessy ★ 2.9k • written 7.0 years ago by alirezamomeni707 • 0

0

You should be doing some QC on the file anyway, so run it through FastQC and Trim Galore. As far as I know, Trim Galore's default settings only do quality and adapter trimming from the 3' end, so the leading N will not be removed automatically; its --clip_R1 option clips a fixed number of bases from the 5' end of each read.

ADD REPLY • link 7.0 years ago by BioinfGuru ★ 1.7k

score 1 · Answer 1 · 2017-04-20

1

7.0 years ago

Buffo ★ 2.4k

Trim them with prinseq-lite; you can trim from the 5' end, from the 3' end, by maximum number of Ns, etc. What kind of data is that? miRNA-seq?

http://prinseq.sourceforge.net/manual.html

ADD COMMENT • link 7.0 years ago by Buffo ★ 2.4k

score 1 · Answer 2 · 2017-04-20

1

7.0 years ago

Matteo Schiavinato ★ 3.6k

Use Trimmomatic with the HEADCROP:1 step, which removes the first base from the start of every read.

http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
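A minimal sketch of what that call could look like for single-end reads. The jar file name, the `TRIMMOMATIC_JAR` variable and the `headcrop_first_base` helper are assumptions for illustration, not part of Trimmomatic; `-phred33` is chosen because the quality strings in the question look like Sanger/Phred+33 encoding.

```shell
# Assumption: location of the Trimmomatic jar; adjust to your installation.
TRIMMOMATIC_JAR="${TRIMMOMATIC_JAR:-trimmomatic-0.32.jar}"

# Hypothetical helper: drop the first base (and its quality score) of every read.
# Usage: headcrop_first_base input.fastq output.fastq
headcrop_first_base() {
    # SE = single-end mode; HEADCROP:1 cuts one base from the start of each read
    java -jar "$TRIMMOMATIC_JAR" SE -phred33 "$1" "$2" HEADCROP:1
}
```

For example, `headcrop_first_base reads.fastq reads.trimmed.fastq` would write the trimmed reads to a new file (both file names are placeholders).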

ADD COMMENT • link 7.0 years ago by Matteo Schiavinato ★ 3.6k

score 0 · Answer 3 · 2017-04-20

0

7.0 years ago

Charles Plessy ★ 2.9k

You can use EMBOSS to trim the first base of sequences in many formats, including FASTQ. In the example below, I saved your sequences in a file named toto.fq. As you can see, EMBOSS discards the sequence name on the "+" lines, which makes the file noticeably smaller.

$ seqret fastq-sanger::toto.fq[2:] fastq-sanger::stdout
Read and write (return) sequences
@SRR2163140.1 HISEQ:148:C670LANXX:3:1101:1302:1947 length=50
GCGACCTCAGATCAGACGTGGCGACCTGGAATTCTCGGGTGCCAAGGAA
+
<<ABGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGG
@SRR2163140.2 HISEQ:148:C670LANXX:3:1101:1440:1963 length=50
AGGCCTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTC
+
=<BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@SRR2163140.3 HISEQ:148:C670LANXX:3:1101:1381:1997 length=50
GCCGACATCGAAGGATCAATGGAATTCTCGGGTGCCAAGGAACTCCAGT
+
<<ABFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFF
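For comparison, the same trimming can be sketched with plain awk when EMBOSS is not available. This assumes every record is exactly four lines (no wrapped sequences); `trim_first_base` is a hypothetical helper name. Unlike seqret, it leaves the "+" lines untouched.

```shell
# Hypothetical helper: remove the first character of the sequence and quality
# lines of every FASTQ record. Assumes strict 4-line records.
# Usage: trim_first_base input.fastq > output.fastq   (or pipe via stdin)
trim_first_base() {
    # In a 4-line record, the sequence (line 2) and the quality string (line 4)
    # always fall on even line numbers; header and "+" lines are printed as-is.
    awk 'NR % 2 == 0 { print substr($0, 2); next } { print }' "$@"
}
```

Because the sequence and quality strings are shortened together, their lengths stay in sync, although any "length=50" annotation in the headers is not updated.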

ADD COMMENT • link 7.0 years ago by Charles Plessy ★ 2.9k

0

That tool seems nice!

Charles, I have a slightly related question about trimming the first base. I was looking at modENCODE CAGE data, and I see that a very high percentage of reads, but not all of them, have an extra first base added to the FASTQ sequence. That first base is generally "G", as is known. I mapped the FASTQ files and I see that the TSS is shifted by 1 base. I tried both local and end-to-end mapping with Bowtie, yet the TSS shift persists. A mismatch on the first base gives the wrong TSS. What do you think is the best way to map these reads accurately?