EDIT: I just saw that the --read-length parameter answers my question. My bad...
Hi,
I ran UMI-tools (https://github.com/CGATOxford/UMI-tools) on my dataset using this command:
python $umitools/group.py -I sorted.bam --paired --edit-distance-threshold=1 --group-out=groups.tsv -L stats.txt
After that I have a groups.tsv file containing the groups for my reads. Here is a subset of my groups.tsv:
read_id contig position umi umi_count final_umi final_umi_count unique_id
NS500186:291:HTYKYAFXX:3:11511:4602:8498_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 1 CTCTTCCGATCT 1 0
NS500186:291:HTYKYAFXX:3:21407:18031:16780_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 1 CTCTTCCGATCT 1 1
NS500186:291:HTYKYAFXX:1:11103:22901:14851_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
NS500186:291:HTYKYAFXX:1:11104:5947:6879_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
NS500186:291:HTYKYAFXX:1:11105:22350:7050_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
NS500186:291:HTYKYAFXX:1:11207:26170:12155_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
NS500186:291:HTYKYAFXX:1:11209:5982:16287_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
NS500186:291:HTYKYAFXX:1:11304:14850:18144_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
NS500186:291:HTYKYAFXX:1:11307:16408:8652_CTCTTCCGATCT chr1 504 CTCTTCCGATCT 24 CTCTTCCGATCT 24 2
So all reads that share the same alignment position and the same UMI (allowing at most 1 bp difference between UMIs) are assigned to the same group.
But looking at the reads, I have a lot of cases where the read sequences within a group do not have the same length. The alignment start is the same, but the alignment end differs because the input reads have different lengths.
For example, take UMI group 13, which contains 8 reads:
read_seq position umi umi_count final_umi final_umi_count unique_id
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAGCTGGCTCTTATAC 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTAATAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG 8693 CTCTTCCTGGAG 8 CTCTTCCTGGAG 8 13
So they share the same alignment position and the same UMI, which is fine. But as you can see, one read has a different length (read 3 is 75 bp, compared to 62 bp for the others). To me, read 3 should not be part of this group. Is it possible to force UMI-tools to group only reads of the same length?
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAGCTGGCTCTTATAC
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTAATAG-------------
********************************************************* ***
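To check how a group breaks down by read length, here is a minimal sketch in Python. It is not UMI-tools code: the function name `split_group_by_length` is hypothetical, and the sequences are copied by hand from the group 13 table above rather than read from the BAM file.

```python
from collections import defaultdict

# Sequences of UMI group 13, copied from the table above (illustration only).
SHORT = "CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG"
group_13 = [
    SHORT,
    SHORT,
    SHORT + "CTGGCTCTTATAC",  # read 3: the longer read
    SHORT,
    SHORT,
    "CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTAATAG",
    SHORT,
    SHORT,
]


def split_group_by_length(seqs):
    """Map each sequence length to the 1-based indices of the reads having it."""
    by_length = defaultdict(list)
    for index, seq in enumerate(seqs, start=1):
        by_length[len(seq)].append(index)
    return dict(by_length)


# Print one line per distinct length with the reads that share it.
for length, reads in sorted(split_group_by_length(group_13).items()):
    print(f"{length} bp: reads {reads}")
```

If a group yields more than one length, those reads would be candidates for separate groups when read length is taken into account.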
Thanks
Add that mistake as an answer and then accept it. That way this thread will not remain open.