How to remove the first two nucleotides from a FASTQ file (single-end)
2.9 years ago
17318598206 ▴ 20

"CAGE (cap analysis of gene expression; Table S1) was as described (Yang et al., 2011) and sequenced using a HiSeq 2000 (100 nt reads). After removing adaptor sequences and checking read quality using Flexbar 2.2 with the parameters of “-at 3 -ao 10 --min-readlength 20 --max-uncalled 70 --phred-pre-trim 10”, we retained only reads beginning with NG or GG (the last two nucleotides on the 5′ adaptor). We then removed the first two nucleotides and mapped the sequences to the mouse genome using TopHat 2.0.4." This is how the paper describes the procedure. How do I write code to remove the first two nucleotides?

CGAE-seq fastq • 1.7k views

There are multiple ways (note that the sed one-liner relies on GNU sed's first~step addressing):

$ cutadapt -u 2 -o new.fastq input.fastq
$ seqkit subseq -r 3:-1 input.fastq -o new.fastq
$ sed -r '0~2 s/^.{2}//' input.fastq > new.fastq
$ awk '{print (NR%2 == 0 ? substr($0,3): $0)}' input.fastq > new.fastq
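
A quick way to convince yourself that the even-line approach trims both the sequence and the quality string is to run it on a toy record. This is only a minimal sketch: the file names toy.fastq and toy_trimmed.fastq and the read itself are made up, and it assumes standard 4-line FASTQ records (no wrapped sequences).

```shell
# Build a one-read toy FASTQ (hypothetical data, not from the paper).
printf '@read1\nGGACGTAC\n+\nIIIIHHHH\n' > toy.fastq

# Trim the first 2 characters of every even line (sequence and quality).
awk '{print (NR%2 == 0 ? substr($0,3) : $0)}' toy.fastq > toy_trimmed.fastq

# Sanity check: sequence and quality must have equal lengths in every record.
awk 'NR%4 == 2 {n = length($0)}
     NR%4 == 0 && length($0) != n {bad++}
     END {exit (bad > 0)}' toy_trimmed.fastq && echo "lengths consistent"
```

If the check fails on real data, the input probably has wrapped or otherwise non-standard records, and a FASTQ-aware tool (cutadapt, seqkit) is the safer choice.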

You can use the HEADCROP option in Trimmomatic (HEADCROP:2 removes the first two bases of each read).

2.9 years ago
5heikki 11k
awk '{if(NR%2){print $0}else{print substr($0,3)}}' in.fq > out.fq
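
The quoted methods also keep only reads beginning with NG or GG before trimming. A rough sketch of doing both steps in one awk pass follows. It is not the paper's actual code. It assumes 4-line records and reads "NG" literally (an uncalled N followed by G); if N is meant as "any base", the pattern would be /^[ACGTN]G/ instead. The toy reads and the file names in.fq and out.fq are hypothetical.

```shell
# Toy input (hypothetical reads): r1 and r3 pass the NG/GG filter, r2 does not.
printf '@r1\nGGACGT\n+\nIIIIII\n@r2\nTTACGT\n+\nIIIIII\n@r3\nNGCCAA\n+\nHHHHHH\n' > in.fq

# Buffer each 4-line record; on the quality line, print the record only if
# the sequence starts with NG or GG, dropping the first two bases/qualities.
# ASSUMPTION: "N" is treated as a literal N, not as an IUPAC wildcard.
awk 'NR%4 == 1 {h = $0}
     NR%4 == 2 {s = $0}
     NR%4 == 3 {p = $0}
     NR%4 == 0 && s ~ /^[NG]G/ {
         print h; print substr(s, 3); print p; print substr($0, 3)
     }' in.fq > out.fq
```

Filtering and trimming in the same pass keeps the sequence and quality lines in sync, since both are cut only for records that are kept.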
2.9 years ago
GenoMax 141k

code to remove the first two nucleotides

You can use bbduk.sh from the BBMap suite like this:

bbduk.sh -Xmx2g in=your.fq out=trimmed.fq forcetrimleft=2

A guide to using BBDuk is available.
