Question: trimming reads in fastq file
0
2.1 years ago by
alirezamomeni7070 wrote:

I have a FASTQ file, and every read begins with an "N". How can I get rid of that N from the command line? Here is an example:

@SRR2163140.1 HISEQ:148:C670LANXX:3:1101:1302:1947 length=50
NGCGACCTCAGATCAGACGTGGCGACCTGGAATTCTCGGGTGCCAAGGAA
+SRR2163140.1 HISEQ:148:C670LANXX:3:1101:1302:1947 length=50
#<<ABGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGG
@SRR2163140.2 HISEQ:148:C670LANXX:3:1101:1440:1963 length=50
NAGGCCTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTC
+SRR2163140.2 HISEQ:148:C670LANXX:3:1101:1440:1963 length=50
#=<BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@SRR2163140.3 HISEQ:148:C670LANXX:3:1101:1381:1997 length=50
NGCCGACATCGAAGGATCAATGGAATTCTCGGGTGCCAAGGAACTCCAGT
+SRR2163140.3 HISEQ:148:C670LANXX:3:1101:1381:1997 length=50
#<<ABFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFF
@SRR2163140.4 HISEQ:148:C670LANXX:3:1101:1705:1940 length=50
NACAAACCCTTGTGTCGAGGGCTGGAATTCTCGGGTGCCAAGGAACTCCA
rna-seq • 1.3k views
ADD COMMENTlink modified 2.1 years ago by Charles Plessy2.7k • written 2.1 years ago by alirezamomeni7070

You should be doing some QC on the file anyway, so just run it through FastQC and Trim Galore with default settings and this will happen automatically (I think Trim Galore removes the first 3 nucleotides of each read by default).

ADD REPLYlink written 2.1 years ago by YaGalbi1.4k
1
2.1 years ago by
Buffo1.6k
Buffo1.6k wrote:

Trim them with prinseq-lite; you can trim from the 5' end, from the 3' end, by maximum number of Ns, etc. What is that? miRNA-seq?

http://prinseq.sourceforge.net/manual.html
ADD COMMENTlink written 2.1 years ago by Buffo1.6k
1
2.1 years ago by
Macspider2.8k
Vienna - BOKU
Macspider2.8k wrote:

Trimmomatic with HEADCROP:1

http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
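
If installing a trimmer is not an option, the same kind of head crop can be sketched with plain awk. This is only an illustration, not part of Trimmomatic: it assumes strict four-line FASTQ records (no wrapped sequence lines), and the function name `trim_first_base` and the file names are made up here.

```shell
# Drop the first base and its quality score from every read.
# Assumes each record is exactly 4 lines: header, sequence, "+" line, quality.
trim_first_base() {
    awk 'NR % 2 == 0 { print substr($0, 2); next }  # lines 2 and 4 of each record
         { print }                                   # header and "+" lines pass through
        ' "$1"
}

# Usage (hypothetical file names):
# trim_first_base reads.fq > reads.trimmed.fq
```

Because the sequence and quality lines are cropped together, the two strings stay the same length, which downstream tools require.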

ADD COMMENTlink written 2.1 years ago by Macspider2.8k
0
2.1 years ago by
Charles Plessy2.7k
Japan
Charles Plessy2.7k wrote:

You can use EMBOSS to trim the first base of sequences in many formats, including FASTQ. In the example below, I saved your sequences in a file named toto.fq. As you can see, EMBOSS discards the sequence name on the "+" lines, which makes the file considerably smaller.

$ seqret fastq-sanger::toto.fq[2:] fastq-sanger::stdout
Read and write (return) sequences
@SRR2163140.1 HISEQ:148:C670LANXX:3:1101:1302:1947 length=50
GCGACCTCAGATCAGACGTGGCGACCTGGAATTCTCGGGTGCCAAGGAA
+
<<ABGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGG
@SRR2163140.2 HISEQ:148:C670LANXX:3:1101:1440:1963 length=50
AGGCCTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTC
+
=<BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@SRR2163140.3 HISEQ:148:C670LANXX:3:1101:1381:1997 length=50
GCCGACATCGAAGGATCAATGGAATTCTCGGGTGCCAAGGAACTCCAGT
+
<<ABFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFF
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Charles Plessy2.7k
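
Whichever tool does the trimming, a quick sanity check on the result can catch reads that still start with N, or records where sequence and quality ended up with different lengths. The sketch below assumes strict four-line FASTQ records; `check_trimmed` is a hypothetical helper name, not an EMBOSS command.

```shell
# Report any record whose sequence still starts with N, or whose sequence
# and quality strings differ in length. Prints nothing if the file looks fine.
# Assumes each record is exactly 4 lines: header, sequence, "+" line, quality.
check_trimmed() {
    awk 'NR % 4 == 1 { name = $1 }   # read ID, e.g. @SRR2163140.1
         NR % 4 == 2 { seq = $0 }
         NR % 4 == 0 {
             if (substr(seq, 1, 1) == "N")  print name ": still starts with N"
             if (length(seq) != length($0)) print name ": sequence/quality length mismatch"
         }' "$1"
}

# Usage (hypothetical file name):
# check_trimmed reads.trimmed.fq
```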

That tool seems nice !!

Charles, I have a slightly related question regarding trimming of the first base. I was looking at modENCODE CAGE data, and I see that a very high percentage of reads, but not all of them, have an extra first base added to the FASTQ sequence. The first base is generally "G", as is known. I mapped the FASTQ files and I see that the TSS is shifted by 1 base. I tried local vs. end-to-end mapping with bowtie, yet the TSS shift persists. A mismatch on the first base gives the wrong TSS. What do you think is the best way to map these reads accurately?
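
To put a number on "a very high percentage, but not all", the first base of every read can be tallied before deciding how to handle it. This sketch only counts; it does not address the mapping question. It assumes strict four-line FASTQ records, and `first_base_counts` is a made-up name.

```shell
# Count how often each nucleotide occurs as the first base of a read.
# Output: one "base<TAB>count" line per base, sorted by base.
# Assumes each record is exactly 4 lines: header, sequence, "+" line, quality.
first_base_counts() {
    awk 'NR % 4 == 2 { n[substr($0, 1, 1)]++ }
         END { for (b in n) print b "\t" n[b] }' "$1" | sort
}

# Usage (hypothetical file name):
# first_base_counts cage_reads.fq
```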

ADD REPLYlink written 2.1 years ago by Chirag Nepal2.2k

(I just answered in the post that you linked)

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Charles Plessy2.7k