Is There A Fastq Alternative To Fastx_Collapser (Outputs Fasta)?
12.2 years ago
Yannick Wurm ★ 2.4k

fastx_collapser seems to convert my fastq files to fasta. That's not cool.

cat a
@HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA
+HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
d^`d`dddeccce\eedddac^JW\`X````Z`L``L]\\TYHNVZQ`__L\P_^a_^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
+HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
c`cacTccbcccYbU^YM^\L^\\Z^\P]]YLUJ]VOaQ_U]^aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


fastx_collapser -i a
>1-1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
>2-1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA

Is there an alternative collapser?

next-gen sequencing short • 7.4k views

What quality score would you want to see in cases with multiple identical sequences? That's probably the hard problem Assaf was trying to avoid by outputting as FASTA.


Yes, that's a pain. I just write a quick throw-away script to do it. You could try emailing the author of the fastx toolkit and asking, but I'd like to see a solution with awk/sed. :-)


I'd be happy with anything :) Be it a random choice, or the quality scores of the sequence with the highest overall quality...


Why do you need the output to be fastq? I'd be wary of using random (or at least not entirely correct) quality scores in downstream processing... If you're planning on aligning next, I think most aligners will take fasta input (I know bowtie and novoalign do).

12.2 years ago
brentp 24k

Since this doesn't have an answer yet, check reads-utils,

which when run as:

./fastq filter --adjust 64 --unique /path/to/your.fasta > unique.fasta

will keep the records with the highest average quality.
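For anyone who wants the idea spelled out, here is a minimal Python sketch of "collapse identical sequences, keep the record with the highest average quality". It is not the reads-utils code. The function names (`read_fastq`, `mean_quality`, `collapse`, `write_fastq`) are made up for illustration, and it assumes plain 4-line FASTQ records and a quality offset of 64, matching the `--adjust 64` above.

```python
def read_fastq(handle):
    """Yield (name, seq, qual) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual, offset=64):
    """Average quality of a quality string; offset=64 is an assumption."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def collapse(records, offset=64):
    """Keep one record per distinct sequence: the one with the best mean quality."""
    best = {}  # seq -> (score, name, qual)
    for name, seq, qual in records:
        score = mean_quality(qual, offset)
        if seq not in best or score > best[seq][0]:
            best[seq] = (score, name, qual)
    return [(name, seq, qual) for seq, (_, name, qual) in best.items()]


def write_fastq(records, handle):
    """Write (name, seq, qual) tuples back out as FASTQ."""
    for name, seq, qual in records:
        handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
```

Usage would be something like `write_fastq(collapse(read_fastq(open("a"))), sys.stdout)` after `import sys`. Everything is held in memory, so this is only a sketch for modest file sizes. Note that the offset shifts every average by the same constant, so it does not change which record wins.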


This tool is great and really quick. I've looked at the code but couldn't find a good way to also print the name of the read with the highest average quality in the output (my C knowledge is fairly rusty). What I'm trying to do is to unique paired-end reads, so I need to know where one read ends and another starts to be able to separate them and use them for downstream analyses. Any ideas? Thanks.


Win - it's very fast too! I wish more things were written in C! Thanks


(although for "unique" alone, adjust should be unnecessary and has no effect) :)


I guess it cannot collapse paired-end reads, can it?
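
For paired-end data, one possible approach (a rough Python sketch, not a feature of any tool mentioned in this thread) is to treat the two mates as a single key, keep the pair with the highest combined average quality, and carry the read names along so the mates can be written back to separate files. The names `read_fastq` and `collapse_pairs` are hypothetical, and the sketch assumes both files list mates in the same order, use 4-line records, and have a quality offset of 64.

```python
def read_fastq(handle):
    """Yield (name, seq, qual) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def collapse_pairs(reads1, reads2, offset=64):
    """Collapse pairs whose two mate sequences are both identical.

    Returns a list of (mate1, mate2) tuples, each mate being (name, seq, qual),
    keeping the pair with the highest mean quality over both mates.
    """
    best = {}  # (seq1, seq2) -> (score, mate1, mate2)
    for (n1, s1, q1), (n2, s2, q2) in zip(reads1, reads2):
        quals = q1 + q2
        score = sum(ord(c) - offset for c in quals) / len(quals) if quals else 0.0
        key = (s1, s2)
        if key not in best or score > best[key][0]:
            best[key] = (score, (n1, s1, q1), (n2, s2, q2))
    return [(m1, m2) for _, m1, m2 in best.values()]
```

Because each mate keeps its own name, sequence and quality string, there is no need to guess where one read ends and the other starts: write the first element of each pair to one output file and the second to another.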
