Question: error when reads.fa input file contains info for more than one sample
3.6 years ago by
fana30 wrote:


I am having trouble using the miRDeep2 package. It looks like I am running using a config.txt file which contains multiple samples correctly. However, when I try to run I get the following error. If I run it runs smoothly though. Any ideas?

Error: problem with processed_reads.fa
Use of uninitialized value in split at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/ line 179, <IN> line 11334728.
Use of uninitialized value in length at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/ line 185, <IN> line 11334728.
Error in line 5.667.364: The sequence
occures at least twice in your reads file.

At first it occured at line 
Please make sure that your reads file only contains unique sequences.
mirdeep2 mirna-seq next-gen • 1.8k views
modified 23 months ago by h.mon31k • written 3.6 years ago by fana30
23 months ago by
h.mon31k wrote:

I didn't see the thread before, so I'm posting a late answer: I had exactly the same error when using an incorrectly formatted "config.txt" file. I suspect miRDeep expects the three-letter codes to be unique per sample, not shared by treatment. When I corrected the three-letter codes to unique ones (TR1, TR2, CT1, CT2 as opposed to TRT, TRT, CTL, CTL), it worked fine.
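
A quick way to check for repeated codes before running anything is sketched below. It assumes the config file has one sample per line, with the reads file in the first whitespace-separated column and the three-letter code in the second; `duplicate_codes` is a hypothetical helper name, not part of miRDeep2.

```shell
# Hypothetical helper: print every sample code that appears more than once.
# Assumes column 1 = reads file, column 2 = three-letter code.
duplicate_codes() {
    awk 'NF >= 2 { print $2 }' "$1" | sort | uniq -d
}

# Usage: empty output means all codes are unique.
# duplicate_codes config.txt
```

With the codes from the example above (TRT, TRT, CTL, CTL), this would list TRT and CTL; with TR1, TR2, CT1, CT2 it would print nothing.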

written 23 months ago by h.mon31k
3.6 years ago by
galina_ananina20 wrote:

If I remember right, we concatenated all samples into one and ran Then, we applied using reads.fa and the other required files, and it did work.

written 3.6 years ago by galina_ananina20
3.6 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

I've run this with the config.txt file before w/o problems, but I collapsed reads (-m option). The sanity check step that failed seems to indicate you have a duplicate read present, which means the reads haven't been collapsed.

One recent example run that worked fine (your mileage may vary):

$CONFIG -o $PBS_NUM_PPN \
    -d -e -q -j -l 17 \
    -m -h -u -n \
    -p $INDEX \
    -s reads_collapsed.fa \
    -t reads_collapsed_vs_genome.arf \
    -v 2> mapping.out

reads_collapsed.fa \
    $GENOME \
    reads_collapsed_vs_genome.arf \
    mature-species.fa \
    mature-other.fa \
    precursor.fa \
    -P 2> report.log

EDIT: of course, edit the relevant variables with your config, genome, genome index, etc.

modified 3.6 years ago • written 3.6 years ago by Chris Fields2.1k

Thank you for the example. Could you please have a quick look at my commands? I did collapse reads (see the top lines of the 'reads.fa' file below too). I expect that when you have multiple samples some sequences might be the same across them.

config.txt -d -e -p galGal4 -s processed_reads.fa -t mapped_reads.arf -h -m -i -j

Inspecting reads:

head processed_reads.fa
TTTGGCAATGGTAGAACTCACA

processed_reads.fa galGal4.fa mapped_reads.arf gga4_mirbase21_mature.fa none gga4_mirbase21_hairpin.fa
modified 23 months ago by h.mon31k • written 3.6 years ago by fana30

That's essentially correct, yes; reads are collapsed per sample. What this seems to indicate is that you have two reads with the same sequence from the same sample. Should be easy enough to see if you grep for the sequence and check the line before:

-system-specific-4.1$ grep -B1 '^AACCCGTAGATCCGAACTTGT$' reads_collapsed.fa
written 3.6 years ago by Chris Fields2.1k
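
If the offending sequence is not known in advance, the grep check above can be generalized to scan the whole file. The sketch below assumes a FASTA file in which every record is one header line (starting with `>`) followed by exactly one sequence line; `find_duplicate_seqs` is a hypothetical helper name, not a miRDeep2 tool.

```shell
# Hypothetical helper: report each sequence that occurs more than once,
# together with the header of its first occurrence and of the repeat.
# Assumes single-line sequences (header line, then one sequence line).
find_duplicate_seqs() {
    awk '
        /^>/ { header = $0; next }          # remember the current header
        {
            if ($0 in first)                # sequence seen before: report it
                print $0 "\t" first[$0] "\t" header
            else
                first[$0] = header          # first time: store its header
        }
    ' "$1"
}

# Usage: empty output means every sequence in the file is unique.
# find_duplicate_seqs reads_collapsed.fa
```

Comparing the two reported headers shows whether the repeated sequence carries the same sample prefix twice, which would point at the config file codes rather than at the collapsing step.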