Is There A Fastq Alternative To Fastx_Collapser (Outputs Fasta)?
12.2 years ago
Yannick Wurm ★ 2.4k

fastx_collapser seems to convert my fastq files to fasta. That's not cool.

cat a
@HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA
+HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
d^ddddeccce\eedddac^JW\XZLL]\\TYHNVZQ__L\P_^a_^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
+HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
ccacTccbcccYbU^YM^\L^\\Z^\P]]YLUJ]VOaQ_U]^aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

fastx_collapser -i a
>1-1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
>2-1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA


Is there an alternative collapser?

next-gen sequencing short • 7.4k views

What quality score would you want to see in cases with multiple identical sequences? That's probably the hard problem Assaf was trying to avoid by outputting as FASTA.


Yes, that's a pain. I just write a quick throw-away script to do it. You could try emailing the author of the FASTX-Toolkit and asking, but I'd like to see a solution with awk/sed. :-)


I'd be happy with anything :) Be it a random choice, or the quality scores of the sequence with the highest overall quality...


Why do you need the output to be fastq? I'd be wary of using random (or at least not entirely correct) quality scores in downstream processing... If you're planning on aligning next, I think most aligners will take fasta input (I know bowtie and novoalign do).

12.2 years ago
brentp 24k

which when run as:

./fastq filter --adjust 64 --unique /path/to/your.fastq > unique.fastq


will keep the records with the highest average quality.
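
For anyone who wants to see the idea spelled out, here is a minimal Python sketch of the same strategy: collapse identical sequences and keep the record with the highest mean quality. This is not the code of the tool above, just an illustration. The function names (read_fastq, mean_quality, collapse) are made up for this example, and it assumes plain 4-line FASTQ records with Phred+64 qualities (hence offset=64, matching the --adjust 64 in the command).

```python
def read_fastq(handle):
    """Yield (name, seq, qual) tuples from a 4-line-per-record FASTQ stream.

    Assumes no wrapped sequence lines; stops at EOF or the first blank line.
    """
    while True:
        name = handle.readline().rstrip("\n")
        if not name:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield name, seq, qual


def mean_quality(qual, offset=64):
    """Mean Phred score of a quality string (offset=64 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def collapse(records, offset=64):
    """Keep one record per distinct sequence: the one with the best mean quality.

    Output order follows the first appearance of each sequence.
    """
    best = {}  # seq -> (score, name, qual)
    for name, seq, qual in records:
        score = mean_quality(qual, offset)
        if seq not in best or score > best[seq][0]:
            best[seq] = (score, name, qual)
    return [(name, seq, qual) for seq, (_, name, qual) in best.items()]
```

It could be used along the lines of collapse(read_fastq(open("reads.fastq"))), writing each returned tuple back out as name, sequence, "+", quality. Everything is held in memory, so for very large files a compiled tool will be the better choice.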


This tool is great and really quick. I've looked at the code but couldn't find a good way to also print the name of the read with the highest average quality in the output (my C knowledge is fairly rusty). What I'm trying to do is to unique paired-end reads, so I need to know where one read ends and the next starts to be able to separate them and use them for downstream analyses. Any ideas? Thanks.
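
One possible way to approach the paired-end case, sketched in Python rather than as a change to the C tool: treat each pair as a unit, key it on both mate sequences, and keep the whole pair (names included) with the best combined mean quality. The function names here are hypothetical, the inputs are assumed to be two equally ordered lists of (name, seq, qual) tuples, and Phred+64 is assumed as above.

```python
def mean_quality(qual, offset=64):
    """Mean Phred score of a quality string (offset=64 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def collapse_pairs(mates1, mates2, offset=64):
    """Collapse read pairs that are identical in BOTH mates.

    mates1 and mates2 are iterables of (name, seq, qual) in the same order.
    For each distinct (seq1, seq2) combination, the pair with the highest
    mean quality over both mates is kept, with its read names intact.
    """
    best = {}  # (seq1, seq2) -> (score, record1, record2)
    for rec1, rec2 in zip(mates1, mates2):
        key = (rec1[1], rec2[1])
        score = mean_quality(rec1[2] + rec2[2], offset)
        if key not in best or score > best[key][0]:
            best[key] = (score, rec1, rec2)
    return [(rec1, rec2) for _, rec1, rec2 in best.values()]
```

Because the mates stay separate tuples, there is no need to work out afterwards where one read ends and the other starts.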


Win - it's very fast too! I wish more things were written in C! Thanks


(although for "unique", the adjust option should be unnecessary and has no effect) :)


I guess it cannot collapse paired-end reads, can it?
