Question: umi_tools extract error: IndexError: string index out of range
21 months ago by
Rituriya30 wrote:

Hi All,

Background: I have completed adapter trimming and QC on Illumina NextSeq miRNA single-end reads of length 75 bp. I want to run umi_tools to extract the UMI information before I align the reads to the reference, but umi_tools extract fails with the error below.

Command used:

umi_tools extract --stdin=XYZ_R1-trim.fastq.gz --bc-pattern=NNNNNNNNNNNN -L XYZ-extract.log --stdout=XYZ-UMIextracted.fastq.gz

I have a 12 bp UMI barcode here, and I think the pattern could be the culprit. I am new to UMI analysis. Could anyone please share their insight into what the mistake is? Error message seen on screen:

Traceback (most recent call last):
  File "/home/xyz/.local/bin/umi_tools", line 11, in <module>
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/", line 57, in main
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/", line 330, in main
    new_read = ReadExtractor(read)
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/", line 971, in __call__
    umi_values = self.getBarcodes(read1, read2)
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/", line 726, in _getBarcodesString
    umi_quals = [bc_qual1[x] for x in self.umi_bases]
IndexError: string index out of range

Also, am I supposed to run the whitelist command before extract? This is not single-cell RNA data, so I omitted that step.

Thank you.

Tags: qiaseq, umi_tools, mirnaseq
21 months ago by
michael.ante3.6k wrote:

Hi Rituriya,

Your input file name contains "trim". After adapter trimming, are all reads still long enough to extract the 12 bp sequence? Any read shorter than 12 bp cannot supply a full UMI. Try running umi_tools extract before trimming.
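A quick way to test this is to count how many trimmed reads are shorter than the UMI. The following is only a rough sketch, assuming a standard four-line-per-record FASTQ; the function names `count_short_reads` and `count_short_reads_in_file` are made up for illustration and are not part of umi_tools.

```python
import gzip
from itertools import islice

UMI_LENGTH = 12  # matches --bc-pattern=NNNNNNNNNNNN from the question


def count_short_reads(fastq_lines, min_length=UMI_LENGTH):
    """Return (short, total): reads shorter than min_length, and all reads.

    Assumes 4 lines per FASTQ record, with the sequence on the 2nd line.
    """
    short = total = 0
    # Starting at index 1, every 4th line is a sequence line.
    for seq in islice(fastq_lines, 1, None, 4):
        total += 1
        if len(seq.strip()) < min_length:
            short += 1
    return short, total


def count_short_reads_in_file(path, min_length=UMI_LENGTH):
    """Run the same check on a gzipped FASTQ file (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return count_short_reads(handle, min_length)


# Tiny in-memory example: one 16 bp read and one 7 bp read.
example = [
    "@read1", "ACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIII",
    "@read2", "ACGTACG", "+", "IIIIIII",
]
short, total = count_short_reads(example)
print(short, "of", total, "reads are shorter than", UMI_LENGTH, "bp")
```

On real data you would call something like `count_short_reads_in_file("XYZ_R1-trim.fastq.gz")`. A non-zero count of short reads would be consistent with the IndexError in the traceback.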




Thank you so much, Michael! You were right: it went through once I did the adapter trimming after UMI extraction.

Sometimes a break from the usual routine is what works for a dataset!

Thanks again, Rituriya.

10 months ago by
markaldo0 wrote:

In Python, a string is a sequence of characters. "String index out of range" means that the index you are trying to access does not exist: you are asking for a character at a position beyond the end of the string. Indexes in Python start at 0, so the maximum valid index for any non-empty string is always its length minus 1. There are several ways to account for this. Checking the length of the string with len() before indexing helps you avoid going past the end.
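To make this concrete, here is a small sketch that mimics the failing line from the traceback (`[bc_qual1[x] for x in self.umi_bases]`) on a quality string shorter than 12 characters. It is a simplification for illustration, not the actual umi_tools code; `extract_quals`, `safe_extract_quals`, and `umi_positions` are hypothetical names.

```python
umi_positions = range(12)  # positions 0..11, i.e. a 12 bp UMI


def extract_quals(qualities, positions):
    """Index the quality string at every UMI position, with no length check."""
    return [qualities[x] for x in positions]


def safe_extract_quals(qualities, positions):
    """Return None instead of raising when the string is too short."""
    if len(qualities) <= max(positions):
        return None
    return [qualities[x] for x in positions]


full_read = "IIIIIIIIIIIIIIII"  # 16 quality characters: long enough
short_read = "IIIIIII"          # 7 quality characters: shorter than the UMI

print(len(extract_quals(full_read, umi_positions)))

try:
    extract_quals(short_read, umi_positions)
    message = None
except IndexError as err:
    # Index 7 does not exist in a 7-character string (valid indexes are 0..6).
    message = str(err)
print(message)

print(safe_extract_quals(short_read, umi_positions))
```

The length check in `safe_extract_quals` is the `len()` guard described above. In this thread the practical fix was different: extract the UMI before trimming, so that no read is shorter than the pattern.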
