Clustering Illumina reads with different lengths
4.1 years ago
usr2 ▴ 10

Hi,

I have a set of fastq reads that I would like to cluster, independent of read length.

Given this initial data:

AAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAA

AAAAAAA

BBBBBBBBBBBBBBBBBBBBBBBBB

BBBBBBBBBBBBB

BBBBBB

I would like the output to be:

AAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBBBBBBBBBBBBBBB

Do you know what would be the best way to implement this?

thanks

clustering sequencing • 422 views

You can try clumpify.sh from the BBMap suite with the containment=t option to remove duplicates. Read more about Clumpify here: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. There is also a guide available.

If this does not keep the longest representative of each group of identical reads, you could filter your reads with reformat.sh or bbduk.sh (both from the BBMap suite) using the minlength= option.
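To make the idea of containment concrete, here is a minimal Python sketch that keeps only the reads that are not an exact substring of a longer read. It is not how clumpify.sh works internally; the function name collapse_contained is made up for illustration, and the sketch assumes exact matches only (no mismatches, no reverse complements) and a read set small enough for a quadratic comparison.

```python
def collapse_contained(reads):
    """Return the reads that are not contained in a longer (or equal) kept read.

    Illustrative helper, not part of BBMap: exact substring matching only,
    O(n^2) comparisons, so suitable for small inputs.
    """
    kept = []
    # Drop exact duplicates (keeping first-seen order), then visit the
    # longest reads first so that a shorter read can only be contained
    # in a read that has already been kept.
    for read in sorted(dict.fromkeys(reads), key=len, reverse=True):
        if not any(read in longer for longer in kept):
            kept.append(read)
    return kept


# The example data from the question.
reads = [
    "AAAAAAAAAAAAAAAAAAAAAAAAA",
    "AAAAAAAAAAAAAAA",
    "AAAAAAA",
    "BBBBBBBBBBBBBBBBBBBBBBBBB",
    "BBBBBBBBBBBBB",
    "BBBBBB",
]

for seq in collapse_contained(reads):
    print(seq)
```

On the example from the question this prints the two 25-character reads. For a real FASTQ file you would read the sequence lines in first, and for millions of reads a dedicated tool is the more realistic choice.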
