UMI-tools: reads with different sizes in the same UMI group
4.1 years ago

EDIT: I just saw that the --read-length parameter answers my question! My bad...

Hi,

I ran UMI-tools (https://github.com/CGATOxford/UMI-tools) on my dataset using this command:

python $umitools/group.py -I sorted.bam --paired --edit-distance-threshold=1 --group-out=groups.tsv -L stats.txt

After that, I have a groups.tsv file containing the groups for my reads. Here's a subset of it:

read_id contig  position    umi umi_count   final_umi   final_umi_count unique_id
NS500186:291:HTYKYAFXX:3:11511:4602:8498_CTCTTCCGATCT   chr1    504 CTCTTCCGATCT    1   CTCTTCCGATCT    1   0
NS500186:291:HTYKYAFXX:3:21407:18031:16780_CTCTTCCGATCT chr1    504 CTCTTCCGATCT    1   CTCTTCCGATCT    1   1
NS500186:291:HTYKYAFXX:1:11103:22901:14851_CTCTTCCGATCT chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
NS500186:291:HTYKYAFXX:1:11104:5947:6879_CTCTTCCGATCT   chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
NS500186:291:HTYKYAFXX:1:11105:22350:7050_CTCTTCCGATCT  chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
NS500186:291:HTYKYAFXX:1:11207:26170:12155_CTCTTCCGATCT chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
NS500186:291:HTYKYAFXX:1:11209:5982:16287_CTCTTCCGATCT  chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
NS500186:291:HTYKYAFXX:1:11304:14850:18144_CTCTTCCGATCT chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
NS500186:291:HTYKYAFXX:1:11307:16408:8652_CTCTTCCGATCT  chr1    504 CTCTTCCGATCT    24  CTCTTCCGATCT    24  2
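To see how many reads ended up in each group, groups.tsv can be tallied per unique_id. This is a minimal sketch using only the Python standard library. It assumes the file is tab-separated with the header shown above; the read IDs in the inline excerpt (r1 to r4) are made up, and reads_per_group is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def reads_per_group(handle):
    """Count how many rows (reads) share each unique_id in a groups.tsv-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["unique_id"] for row in reader)


# Tiny tab-separated excerpt modelled on the rows above (read IDs are hypothetical).
example = io.StringIO(
    "read_id\tcontig\tposition\tumi\tumi_count\tfinal_umi\tfinal_umi_count\tunique_id\n"
    "r1\tchr1\t504\tCTCTTCCGATCT\t1\tCTCTTCCGATCT\t1\t0\n"
    "r2\tchr1\t504\tCTCTTCCGATCT\t1\tCTCTTCCGATCT\t1\t1\n"
    "r3\tchr1\t504\tCTCTTCCGATCT\t24\tCTCTTCCGATCT\t24\t2\n"
    "r4\tchr1\t504\tCTCTTCCGATCT\t24\tCTCTTCCGATCT\t24\t2\n"
)

counts = reads_per_group(example)
print(counts.most_common())
```

On a real file, pass an open file handle instead, e.g. reads_per_group(open("groups.tsv", newline="")).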

So reads that share the same alignment position and the same UMI (allowing an edit distance of at most 1 between UMIs) are assigned to the same group.
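For the UMIs observed at a single alignment position, that grouping step can be sketched as follows: two UMIs are linked when their Hamming distance is within the threshold (1 here, matching --edit-distance-threshold=1), and linked UMIs form one group. This is a simplified stand-in, closest in spirit to UMI-tools' "cluster" method; the default "directional" method also compares UMI counts before linking two UMIs, which is omitted here. The function names and the second and third example UMIs are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umis, threshold=1):
    """Connected components of UMIs linked when hamming(a, b) <= threshold.

    Simplified sketch: no count-based (directional) filtering is applied.
    """
    umis = sorted(set(umis))
    parent = {u: u for u in umis}

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in combinations(umis, 2):
        if hamming(a, b) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for u in umis:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


# Two UMIs one mismatch apart merge; the unrelated one stays separate.
result = cluster_umis(["CTCTTCCGATCT", "CTCTTCCGATCA", "GGGGGGGGGGGG"])
print(result)
```

Note that nothing in this key looks at read length, which is why reads of different lengths can land in the same group.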

But looking at the reads, I see many cases where the read sequences in a group differ in length. The alignment start is the same, but the alignment end differs because the input reads have different lengths.

For example, take UMI group 13, which contains 8 reads:

read_seq    position    umi umi_count   final_umi   final_umi_count unique_id
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAGCTGGCTCTTATAC 8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTAATAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG  8693    CTCTTCCTGGAG    8   CTCTTCCTGGAG    8   13

So: same alignment position and same UMI, fine. But as you can see, one read has a different length (read 3 is 75 bp, compared with 62 bp for the others). To me, read 3 should not be part of this group. Is it possible to force UMI-tools to group only reads with the same length?

CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAGCTGGCTCTTATAC
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG-------------
CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTAATAG-------------
*********************************************************  ***
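The length difference within group 13 can be confirmed by comparing the sequences directly. This sketch rebuilds the eight reads from the table (the variable names are illustrative only), counts their lengths, extracts the extra tail of the long read, and lists the 1-based positions where the read ending in TTTAATAG differs from the majority sequence; those are the two gaps in the asterisk line above.

```python
from collections import Counter

# The eight read sequences of UMI group 13, in table order.
SHORT = "CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTTCTAG"
VARIANT = "CTAGCGGCCAGGAGAGACCGCAGGCAGACCGCTTCCCTCCAGGAAGAGCGCCAGTTTAATAG"
LONG = SHORT + "CTGGCTCTTATAC"
group_13 = [SHORT, SHORT, LONG, SHORT, SHORT, VARIANT, SHORT, SHORT]

# Read length -> number of reads with that length.
lengths = Counter(len(s) for s in group_13)
print(lengths)

# Bases the long read carries beyond the shortest read length.
shortest = min(len(s) for s in group_13)
extra_tail = LONG[shortest:]
print(extra_tail)

# 1-based positions where the variant read disagrees with the majority sequence.
mismatches = [i + 1 for i, (a, b) in enumerate(zip(SHORT, VARIANT)) if a != b]
print(mismatches)
```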

Thanks


Add that finding as an answer and then accept it. That way this thread will not remain open.

4.1 years ago

I just saw that the --read-length parameter answers my question! My bad...

--read-length         use read length in addition to position and UMI to
                      identify possible duplicates [default=False]