Question: error when reads.fa input file contains info for more than one sample
22 months ago
fana0 wrote:


I am having trouble using the miRDeep2 package. It looks like I am running using a config.txt file which contains multiple samples correctly. However, when I try to run I get the following error. If I run it runs smoothly though. Any ideas?

Error: problem with processed_reads.fa
Use of uninitialized value in split at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/ line 179, <IN> line 11334728.
Use of uninitialized value in length at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/ line 185, <IN> line 11334728.
Error in line 5.667.364: The sequence
occures at least twice in your reads file.

At first it occured at line 
Please make sure that your reads file only contains unique sequences.
10 weeks ago
h.mon23k wrote:

I didn't see the thread before, so I am posting a late answer: I had exactly the same error when using an incorrectly formatted "config.txt" file. I suspect miRDeep expects the three-letter codes to be unique per sample, not shared by treatment group. When I corrected the three-letter codes to unique ones (TR1, TR2, CT1, CT2 as opposed to TRT, TRT, CTL, CTL), it later worked fine.
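
For anyone who wants to check their config file for this before launching a long run, here is a minimal sketch. It assumes each line of config.txt is a reads file name followed by its three-letter code, separated by whitespace; the function name duplicate_codes is made up for illustration and is not part of miRDeep2.

```shell
# Hypothetical helper, not part of miRDeep2.
# Assumption: each config line is "<reads file> <three-letter code>".
# Prints every code that appears on more than one line (nothing if all are unique).
duplicate_codes() {
    awk 'NF >= 2 { count[$2]++ }
         END { for (c in count) if (count[c] > 1) print c }' "$1" | sort
}

# Usage: duplicate_codes config.txt
```

If this prints anything (for example TRT or CTL in my case), give each sample its own code and rerun.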

22 months ago
galina_ananina20 wrote:

If I remember it right, we concatenated all samples into one and ran Then, we applied using reads.fa and the other required files and it did work.

22 months ago
Chris Fields (University of Illinois Urbana-Champaign) wrote:

I've run this with the config.txt file before w/o problems, but I collapsed reads (-m option). The sanity check step that failed seems to indicate you have a duplicate read present, which means the reads haven't been collapsed.

One recent example run that worked fine (your mileage may vary):

$CONFIG -o $PBS_NUM_PPN \
    -d -e -q -j -l 17 \
    -m -h -u -n \
    -p $INDEX \
    -s reads_collapsed.fa \
    -t reads_collapsed_vs_genome.arf \
    -v 2> mapping.out

reads_collapsed.fa \
    $GENOME \
    reads_collapsed_vs_genome.arf \
    mature-species.fa \
    mature-other.fa \
    precursor.fa \
    -P 2> report.log

EDIT: of course, edit the relevant variables with your config, genome, genome index, etc.

Thank you for the example. Could you please have a quick look at my commands? I did collapse reads (see the top lines of the 'reads.fa' file below too). I expect that when you have multiple samples some sequences might be the same across them.

config.txt -d -e -p galGal4 -s processed_reads.fa -t mapped_reads.arf -h -m -i -j

Inspecting reads:

head processed_reads.fa
TTTGGCAATGGTAGAACTCACA

processed_reads.fa galGal4.fa mapped_reads.arf gga4_mirbase21_mature.fa none gga4_mirbase21_hairpin.fa

That's essentially correct, yes; reads are collapsed per sample. What this seems to indicate is that you have two reads with the same sequence from the same sample. Should be easy enough to see if you grep for the sequence and check the line before:

$ grep -B1 '^AACCCGTAGATCCGAACTTGT$' reads_collapsed.fa
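
The grep above checks one sequence at a time. To list every repeated sequence in one pass, something like the following sketch may help. It assumes the reads file has one header line (starting with ">") followed by the sequence on a single line; the function name duplicate_sequences is only illustrative and is not a miRDeep2 tool.

```shell
# Hypothetical helper, not part of miRDeep2.
# Assumption: each record is a ">" header line followed by one sequence line.
# Prints each sequence that occurs more than once, a tab, and the headers it occurs under.
duplicate_sequences() {
    awk '
        /^>/ { header = $0; next }          # remember the most recent header
        NF {                                # non-empty sequence line
            seen[$0]++
            headers[$0] = (seen[$0] == 1) ? header : headers[$0] " " header
        }
        END {
            for (s in seen) if (seen[s] > 1) print s "\t" headers[s]
        }
    ' "$1" | sort
}

# Usage: duplicate_sequences reads_collapsed.fa
```

If two headers listed for the same sequence carry the same sample prefix, the reads from that sample were not collapsed (or two samples share a code); headers with different prefixes are just the same sequence showing up in different samples.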
