Question: Clumpify : issues with consensus mode (reverse complement of R1)
2.5 years ago by
Belgium, Brussels
Nicolas Rosewick wrote:

Hi,

I tried Clumpify to extract a consensus sequence from a fastq file. To test it, I first ran it on a small fastq file containing two reads with identical sequences:

R1:

@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/1
CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT
+
FFFFFFFGGGGGGGGGGGGHGGGGGGHHHHHHHHHGGGFHGGGGGGGGGHGHHHHHHHGGGGGGGGHHHHHHHHHHHHHHHHHGHGGGGGGGGGGGGGGGGG
@M00991:78:000000000-AP8FW:1:1107:22337:16138_CAGAAGTA/1
CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT
+
BFFFFFFGGGGGGGGGGGGHGGGGGGHHHHHHHHHGGGGHGGGGGGGGGHHHHHHHHHGGGGGGGGHHHHHEHHHHHHHHHHHGHGGGGGGGGGGGGGGGGG

R2:

@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGFGGGGGHHHHHGHHHHHHHHHHHGHHGGGGEEHHHHHHHFHGGGGGGGGGGGGGGHHHFHHHHHGGGGGGHHHHHHHHGGGGGHH
@M00991:78:000000000-AP8FW:1:1107:22337:16138_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGEGGGGHHHHHHGHHHHHHHHHHHHHHGGGGEGFHHHHGHHHGGGGGGGGGGGGGGHHHHHHHHHGGGGGHHGHHHHHHGGGCGHH

Then I ran Clumpify on it:

clumpify.sh qin=33 in=R1.fastq in2=R2.fastq out=R1.dedup.fastq out2=R2.dedup.fastq dedupe=t dupesubs=0 consensus=t

Here are the results:

R1:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/1
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
GGGGGGGGGGGGGGGGGHGHHHHHHHHHHHHHHHHHGGGGGGGGHHHHHHHGHGGGGGGGGGHFGGGHHHHHHHHHGGGGGGHGGGGGGGGGGGGFFFFFFF

R2:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGFGGGGGHHHHHGHHHHHHHHHHHGHHGGGGEEHHHHHHHFHGGGGGGGGGGGGGGHHHFHHHHHGGGGGGHHHHHHHHGGGGGHH

So R2 is OK, but R1 seems to be reverse complemented. Why? Is there a parameter that I missed?

Thanks


You seem to have found a bug. Regardless of whether rcomp= is set to t or f, I get the same result as you. I tried both separate R1/R2 files and interleaved reads. It only seems to happen in consensus=t mode.

Comment written 2.5 years ago by genomax

Tagging: Brian Bushnell

Comment written 2.5 years ago by genomax
2.5 years ago by
Walnut Creek, USA
Brian Bushnell wrote:

Clumpify can currently only produce a consensus of clumps, not of sets of duplicate reads or read pairs. Thus, it is currently impossible to do what many people want to do, which is "deduplicate my reads, but instead of keeping the best representative pair of each set of duplicates, create a consensus from each set of duplicates".

The consensus operation was written before deduplication, and for a different goal: genome assembly. Once reads are formed into clumps (assuming unpaired reads), each clump is flattened into a single consensus sequence that spans multiple overlapping reads (and thus is usually longer than any single read). Adding the "consensus" flag automatically sets rcomp to true to minimize the number of clumps (which fit my original goal, but I will look into changing that). After the consensus operation, the original orientation is lost, because presumably reads of different orientations went into the clump.
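
The forced rcomp setting described here can be checked against the data in the question. The snippet below is only an illustrative check in plain Python, not Clumpify code; the helper name `reverse_complement` is an assumption made for this sketch. It compares the reverse complement of the R1 input sequence with the R1 sequence that Clumpify wrote out.

```python
# Illustrative check only; this is not part of Clumpify.
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# R1 sequence from the input file in the question.
r1_input = (
    "CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGG"
    "AACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT"
)
# R1 sequence reported in the Clumpify output in the question.
r1_output = (
    "AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGA"
    "CGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG"
)

# Compare the two; a match means the output R1 is the flipped input R1.
print(reverse_complement(r1_input) == r1_output)
```

If the comparison holds, the output R1 is simply the input R1 in the opposite orientation, which fits the loss of orientation described above.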

So, unfortunately, you can't currently use Clumpify the way you want to use it. I may put something in to catch the combination of consensus+dedupe or consensus+paired reads and exit with a warning, because those are not really supported right now. But I do plan to add the ability to produce consensus output from sets of duplicate reads at some point.


Thanks, Brian, for the answer. I'll try to write my own function. It should not be too complicated, as the reads in each group I want to collapse into a consensus have the same length and should be very similar (at most 3 mismatches). I'll post my answer when it's ready.
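
A minimal sketch of what such a function could look like, assuming all reads in a group have the same length and Phred+33 qualities. The function name `consensus`, the quality-weighted vote, and the alphabetical tie-break are assumptions made for illustration; none of this is Clumpify behaviour or the solution that was eventually written.

```python
from collections import defaultdict


def consensus(reads):
    """Collapse a group of equal-length reads into one consensus read.

    reads: list of (sequence, quality_string) tuples, Phred+33 encoded.
    Returns (consensus_sequence, consensus_quality_string).
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0][0])
    if any(len(seq) != length or len(qual) != length for seq, qual in reads):
        raise ValueError("all reads and qualities must have the same length")

    cons_seq, cons_qual = [], []
    for i in range(length):
        weight = defaultdict(int)  # base -> summed Phred score at column i
        best_q = defaultdict(int)  # base -> highest Phred score at column i
        for seq, qual in reads:
            base = seq[i]
            q = ord(qual[i]) - 33
            weight[base] += q
            best_q[base] = max(best_q[base], q)
        # Pick the base with the largest summed quality;
        # ties are broken alphabetically so the result is deterministic.
        winner = max(sorted(weight), key=lambda b: weight[b])
        cons_seq.append(winner)
        # Report the best quality observed for the winning base.
        cons_qual.append(chr(best_q[winner] + 33))
    return "".join(cons_seq), "".join(cons_qual)


if __name__ == "__main__":
    group = [("ACGT", "IIII"), ("ACGA", "III#"), ("ACGT", "IIII")]
    print(consensus(group))
```

For paired data, R1 and R2 groups would be collapsed separately with the same function, so each mate keeps its original orientation.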

Maybe you should document this behaviour (consensus of clumps, not of reads) in Clumpify's help. BTW nice tool ;)

Comment written 2.5 years ago by Nicolas Rosewick