Problem with FASTQ headers and Trinity
4.3 years ago
User 4014 ▴ 40

Hi all,

I have a problem with header formats. Trinity needs each read header to be stitched together into a single string without whitespace, but some of my headers contain spaces and look like this:

@A00700:80:HHHNGDRXX:1:2101:30481:11663#GTGCACCAGGAATCAC 0:N: 00 /1

Can anyone help me with some tips on how to fix these with a bash script or something?

Thank you very much in advance.

rna-seq RNA-Seq sequence • 1.1k views

That looks like a hybrid of old and new Illumina fastq headers. How did you end up with those?


Actually, it is from a NovaSeq. The original headers end at #GTGCACCAGGAATCAC; I guess the 0:N: 00 was added by STAR and the /1 by reformat.sh.


Neither makes sense. Can you tell us what you used reformat.sh for?


Yes, I used it to add the /1 and /2 flags. I have mixed RNA-Seq data, and somehow the flags were lost during rRNA clean-up (with both bowtie2 and bbduk) and during binning with STAR, which I used to separate the plant reads from the fungal reads. Do you have a suggestion?
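For anyone landing here with the same problem, one way to normalize such headers is sketched below, in Python rather than bash. It assumes strict four-line FASTQ records and that the desired header is the read ID up to the first whitespace followed by /1 or /2. The function names `clean_header` and `clean_fastq` are made up for this illustration and are not part of Trinity, STAR, or BBTools.

```python
import re


def clean_header(header: str, mate: int) -> str:
    """Keep the read ID up to the first whitespace and append /1 or /2.

    Assumption: everything after the first whitespace (e.g. "0:N: 00 /1")
    can be discarded and replaced by a single "/<mate>" suffix.
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    read_id = header.split()[0]               # drop everything after the first whitespace
    read_id = re.sub(r"/[12]$", "", read_id)  # avoid doubling an existing /1 or /2
    return f"{read_id}/{mate}"


def clean_fastq(lines, mate: int):
    """Yield FASTQ lines with the header of every 4-line record cleaned.

    Assumption: records are exactly four lines each (no wrapped sequences).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield clean_header(line, mate) if i % 4 == 0 else line


if __name__ == "__main__":
    example = "@A00700:80:HHHNGDRXX:1:2101:30481:11663#GTGCACCAGGAATCAC 0:N: 00 /1"
    print(clean_header(example, 1))
    # prints "@A00700:80:HHHNGDRXX:1:2101:30481:11663#GTGCACCAGGAATCAC/1"
```

Run it once per mate file (mate 1 for the forward reads, mate 2 for the reverse reads). Whether dropping the `0:N: 00` field is acceptable depends on the downstream tools, so check a few records before running Trinity.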


Thanks for the link!
