Question: error when reads.fa input file contains info for more than one sample
22 months ago by
fana0 wrote:


I am having trouble using the miRDeep2 package. It looks like I am running using a config.txt file which contains multiple samples correctly. However, when I try to run I get the following error. If I run it runs smoothly though. Any ideas?

Error: problem with processed_reads.fa
Use of uninitialized value in split at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/ line 179, <IN> line 11334728.
Use of uninitialized value in length at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/ line 185, <IN> line 11334728.
Error in line 5.667.364: The sequence
occures at least twice in your reads file.

At first it occured at line 
Please make sure that your reads file only contains unique sequences.
10 weeks ago by
h.mon23k wrote:

I didn't see the thread before, so I am posting a late answer: I had exactly the same error when using an incorrectly formatted "config.txt" file. I suspect miRDeep2 expects the three-letter codes to be unique per sample rather than shared by treatment. When I corrected the three-letter codes to unique ones (TR1, TR2, CT1, CT2 as opposed to TRT, TRT, CTL, CTL), it later worked fine.
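A quick way to check a config file for repeated codes before running anything is sketched below. This is only a minimal illustration, not part of miRDeep2: it assumes each non-empty line of config.txt holds a reads file name followed by its three-letter code, separated by whitespace, and the function name `duplicate_codes` and the file names in the example are made up.

```python
from collections import defaultdict


def duplicate_codes(config_lines):
    """Return {code: [files...]} for every sample code used more than once.

    Assumption: each non-empty line is '<reads file> <three-letter code>',
    separated by whitespace.
    """
    files_by_code = defaultdict(list)
    for line in config_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        path, code = fields[0], fields[1]
        files_by_code[code].append(path)
    # Keep only the codes that are shared by two or more files.
    return {code: paths for code, paths in files_by_code.items() if len(paths) > 1}


# Codes taken from the answer above; the file names are hypothetical.
bad = ["trt_rep1.fa TRT", "trt_rep2.fa TRT", "ctl_rep1.fa CTL", "ctl_rep2.fa CTL"]
good = ["trt_rep1.fa TR1", "trt_rep2.fa TR2", "ctl_rep1.fa CT1", "ctl_rep2.fa CT2"]

print(duplicate_codes(bad))   # codes that are reused, with their files
print(duplicate_codes(good))  # empty dict when every code is unique
```

To check a real file, pass an open file handle instead of a list, for example `duplicate_codes(open("config.txt"))`.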

22 months ago by
galina_ananina20 wrote:

If I remember correctly, we concatenated all samples into one file and processed that. Then we ran the next step using reads.fa and the other required files, and it did work.

22 months ago by
Chris Fields2.0k
University of Illinois Urbana-Champaign
Chris Fields2.0k wrote:

I've run this with the config.txt file before without problems, but I collapsed reads (-m option). The sanity check step that failed seems to indicate you have a duplicate read present, which means the reads haven't been collapsed.

One recent example run that worked fine (your mileage may vary):

$CONFIG -o $PBS_NUM_PPN \
    -d -e -q -j -l 17 \
    -m -h -u -n \
    -p $INDEX \
    -s reads_collapsed.fa \
    -t reads_collapsed_vs_genome.arf \
    -v 2> mapping.out

reads_collapsed.fa \
    $GENOME \
    reads_collapsed_vs_genome.arf \
    mature-species.fa \
    mature-other.fa \
    precursor.fa \
    -P 2> report.log

EDIT: of course, edit the relevant variables with your config, genome, genome index, etc.


Thank you for the example. Could you please have a quick look at my commands? I did collapse reads (see the top lines of the 'reads.fa' file below too). I expect that when you have multiple samples some sequences might be the same across them.

config.txt -d -e -p galGal4 -s processed_reads.fa -t mapped_reads.arf -h -m -i -j

Inspecting reads:

head processed_reads.fa
TTTGGCAATGGTAGAACTCACA

processed_reads.fa galGal4.fa mapped_reads.arf gga4_mirbase21_mature.fa none gga4_mirbase21_hairpin.fa

That's essentially correct, yes; reads are collapsed per sample. What this seems to indicate is that you have two reads with the same sequence from the same sample. Should be easy enough to see if you grep for the sequence and check the line before:

grep -B1 '^AACCCGTAGATCCGAACTTGT$' reads_collapsed.fa
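If you would rather list every repeated sequence at once instead of grepping for one, something like the following could work. It is a rough sketch rather than anything shipped with miRDeep2: the function name `find_duplicate_sequences` and the read IDs in the example are hypothetical, and it simply assumes an ordinary FASTA layout with a `>` header line followed by the sequence.

```python
from collections import defaultdict


def find_duplicate_sequences(fasta_lines):
    """Return {sequence: [headers...]} for sequences that occur under
    more than one header in a FASTA file."""
    headers_by_seq = defaultdict(list)
    header = None
    chunks = []
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if header is not None:
                headers_by_seq["".join(chunks)].append(header)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line.upper())
    # Store the last record in the file.
    if header is not None:
        headers_by_seq["".join(chunks)].append(header)
    return {seq: hdrs for seq, hdrs in headers_by_seq.items() if len(hdrs) > 1}


# Hypothetical read IDs; the first sequence is the one from the grep above.
example = [
    ">read_a",
    "AACCCGTAGATCCGAACTTGT",
    ">read_b",
    "TTTGGCAATGGTAGAACTCACA",
    ">read_c",
    "AACCCGTAGATCCGAACTTGT",
]

for seq, headers in find_duplicate_sequences(example).items():
    print(seq, headers)
```

On a real file you would pass an open handle, for example `find_duplicate_sequences(open("reads_collapsed.fa"))`. The headers it reports should show whether the repeats come from the same sample or from different ones.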