Hi everyone,
I have FASTQ files for some miRNA libraries prepared with the QIAseq miRNA Library Kit. I need to extract the UMIs, but the problem is that the UMI comes after a common sequence that is present in all the reads, like this:
NNNNNNNNNNNNNNNNNNNAACTGTAGGCACCATCAAT*XXXXXXXXXXXX*NNNNNNNNN
Here the Ns are the miRNA sequence, AACTGTAGGCACCATCAAT is the common sequence present in all the reads, and the run of Xs is the UMI.
How can I remove the common sequence and append the UMI to the header of each read in the FASTQ file? A further problem is that around 3-5% of the reads don't contain the common sequence exactly. I suppose sequencing errors have changed part of it in those reads, but I don't know how to allow a one-letter change in the common sequence when matching.
Thank you very much!
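A minimal sketch of one way to do this in plain Python, in case it helps to see the logic spelled out. It is not the kit's official method, and it rests on a few assumptions that are not confirmed in the post: the miRNA insert is the part before the common sequence, the UMI is the 12 bases right after it (the number of Xs in the example), only substitutions are tolerated (no insertions or deletions), and the UMI is appended to the read name with an underscore. The function names and the command-line usage are hypothetical.

```python
import gzip
import sys

ADAPTER = "AACTGTAGGCACCATCAAT"  # common sequence from the question
UMI_LEN = 12                     # assumed: number of X positions in the example read
MAX_MISMATCHES = 1               # substitutions tolerated inside the common sequence


def find_adapter(seq, adapter=ADAPTER, max_mismatches=MAX_MISMATCHES):
    """Return the start of the common sequence in seq, or -1 if not found.

    An exact match is preferred; otherwise the leftmost window that differs
    from the adapter by at most max_mismatches substitutions is used.
    """
    exact = seq.find(adapter)
    if exact != -1:
        return exact
    n = len(adapter)
    for start in range(len(seq) - n + 1):
        mismatches = sum(a != b for a, b in zip(seq[start:start + n], adapter))
        if mismatches <= max_mismatches:
            return start
    return -1


def extract_umi(header, seq, qual):
    """Return (new_header, insert_seq, insert_qual), or None if the read is unusable."""
    start = find_adapter(seq)
    if start <= 0:  # adapter not found (-1) or no insert before it (0)
        return None
    umi_start = start + len(ADAPTER)
    umi = seq[umi_start:umi_start + UMI_LEN]
    if len(umi) < UMI_LEN:  # read ends before the full UMI
        return None
    # Append the UMI to the read name (first whitespace-delimited token).
    name, _, rest = header.partition(" ")
    new_header = f"{name}_{umi}" + (f" {rest}" if rest else "")
    return new_header, seq[:start], qual[:start]


def open_fastq(path, mode):
    """Open plain or gzip-compressed FASTQ in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def process_fastq(in_path, out_path):
    """Write trimmed reads with the UMI in the header; return (kept, dropped)."""
    kept = dropped = 0
    with open_fastq(in_path, "rt") as fin, open_fastq(out_path, "wt") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            fin.readline()  # '+' separator line
            qual = fin.readline().rstrip("\n")
            result = extract_umi(header, seq, qual)
            if result is None:
                dropped += 1
                continue
            new_header, new_seq, new_qual = result
            fout.write(f"{new_header}\n{new_seq}\n+\n{new_qual}\n")
            kept += 1
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = process_fastq(sys.argv[1], sys.argv[2])
    print(f"kept {kept} reads, dropped {dropped}", file=sys.stderr)
```

Saved under a hypothetical name such as extract_qiaseq_umi.py, it would be run as `python extract_qiaseq_umi.py in.fastq.gz out.fastq.gz`. Reads where the common sequence is not found within one mismatch are dropped and counted, so you can check whether the 3-5% of unmatched reads goes down.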
For future visitors: although this question has been solved, QIAGEN makes a set of web-based tools available (they appear to be free as of this writing) called GeneGlobe. If you are not able to use umi-tools on the command line, you can try GeneGlobe for analysis of QIAseq miRNA data. The handbook for the QIAseq library kit has information on how to use it.
You've got two sets of Ns here - one at the start and one at the end. Are they both miRNA sequences? If not, is it the 3' or the 5' Ns that are the miRNA sequence?