trim leading T or A from fastq file
9.6 years ago

Hi, I am working with RNA-seq data. I want to trim any leading T or A bases from the reads in my FASTQ file and then perform mapping. Does anyone know how to do that? Thanks a lot!

Meisheng

RNA-Seq • 2.1k views
9.6 years ago

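The following awk script should do the job. It measures the length of the leading A/T run on the sequence line (the 2nd line of each 4-line record) and removes that many characters from both the sequence line and the matching quality line: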

awk '{if(NR%4==2) {L=0;if(match($0,/^[ATat]+/)>0) { L=RLENGTH;} $0=substr($0,L+1);} else if(NR%4==0) {$0=substr($0,L+1);} print;}' in.fastq > out.fastq
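
For readers who find the one-liner hard to follow, here is a rough Python sketch of the same logic. It is not part of the original answer: the function name `trim_leading_at` is made up for illustration, and it assumes a strict 4-line-per-record FASTQ layout (no wrapped sequences), just as the awk version does.

```python
import re
from typing import Iterable, Iterator

# Leading run of A/T in either case, mirroring /^[ATat]+/ in the awk script.
LEADING_AT = re.compile(r"^[ATat]+")


def trim_leading_at(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with the leading A/T run removed from each read.

    Hypothetical helper; assumes strict 4-line records
    (header, sequence, '+', quality).
    """
    trim = 0  # number of bases removed from the current record
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 1:  # sequence line (awk: NR%4==2)
            m = LEADING_AT.match(line)
            trim = m.end() if m else 0
            line = line[trim:]
        elif i % 4 == 3:  # quality line (awk: NR%4==0)
            line = line[trim:]  # keep quality the same length as the sequence
        yield line


# Small in-memory demo record.
record = ["@read1", "TTAACGGT", "+", "IIIIHHHH"]
print("\n".join(trim_leading_at(record)))
```

On the demo record the sequence becomes CGGT and the quality string becomes HHHH. To process a real file, pass an open file handle (for example `open("in.fastq")`) instead of the list and write the yielded lines to the output file.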

Hi Pierre, thanks very much for your reply! I tried the code you supplied, but it does not seem to work. Here is the error message:

awk: {if(NR%4==2) {L=0;if(match(-bash,/^[Tt]+/)>0) { L=RLENGTH;} -bash=substr(-bash,L+1);} else if(NR%4==0) {-bash=substr(-bash,L+1);} print;}
awk:                                                                  ^ syntax error
awk: {if(NR%4==2) {L=0;if(match(-bash,/^[Tt]+/)>0) { L=RLENGTH;} -bash=substr(-bash,L+1);} else if(NR%4==0) {-bash=substr(-bash,L+1);} print;}
awk:                                                                                                              ^ syntax error

Do you have any idea what is causing this? Thanks!

Meisheng


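Make sure to copy and paste the code correctly; it does work. In your error message every $0 has been replaced by -bash, which means the shell expanded $0 before awk ever saw it. The most likely cause is that the awk program was wrapped in double quotes, so keep the single quotes around it exactly as posted.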
