Question

Problems With miRDeep2

1


11.3 years ago

pmuench ▴ 140

I want to find miRNAs in a FASTQ file with miRDeep2. After adapter clipping and alignment (with mapper.pl from miRDeep2) against human_g1k_v37 (indexed with bowtie), I get the following error message:

Error: Genome file /PathToFile/human_g1k_v37.fasta has not allowed whitespaces in its first identifier

The first line of the reference file looks like this:

>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1

If I delete all whitespace in the identifiers and rerun the mapping and miRDeep2, I get another error message:

The mapped reference id 1 from file exmaple_collapsed.arf is not an id of the genome file /PathToFile/human_g1k_v37.fasta

Any idea how I should convert the reference file? Thank you!

hg19 reference • 8.6k views

updated 7.6 years ago by gzl ▴ 20 • written 11.3 years ago by pmuench ▴ 140

Ram · Answer 1 · 2012-12-14

10


11.3 years ago

JC 13k

You need to rename the sequences in your reference FASTA so that each header contains only the ID. Just remove everything after the first space:

 perl -plane 's/\s+.+$//' < genome.fa > new_genome.fa

You will then need to build a new bowtie index from new_genome.fa and rerun the mapping against it.
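
For anyone who prefers not to use a Perl one-liner, the same header cleanup can be sketched in Python. This is only an illustration under stated assumptions: the function name strip_fasta_descriptions is made up for this example, and the sequence in the demo is invented; only the header line is taken from the question. It also shows why deleting the whitespace outright did not work: that turns the whole header into one long ID, which no longer matches the id 1 reported in the .arf file, whereas truncating at the first whitespace leaves exactly 1.

```python
import io


def strip_fasta_descriptions(src, dst):
    """Copy FASTA text from src to dst, keeping only the first
    whitespace-delimited token of every header line.

    Hypothetical helper for illustration; returns the IDs in file order.
    """
    ids = []
    seen = set()
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an ID")
            seq_id = fields[0]
            # Truncation could make two headers collide, so check for that.
            if seq_id in seen:
                raise ValueError("duplicate ID after truncation: " + seq_id)
            seen.add(seq_id)
            ids.append(seq_id)
            dst.write(">" + seq_id + "\n")
        else:
            # Sequence lines are copied unchanged.
            dst.write(line)
    return ids


# Demo with the header from the question; the sequence is invented.
demo_in = io.StringIO(
    ">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1\nACGTNNNN\n"
)
demo_out = io.StringIO()
print(strip_fasta_descriptions(demo_in, demo_out))
print(demo_out.getvalue(), end="")
```

For a real genome you would pass two open file handles instead of the in-memory demo, then rebuild the index from the cleaned file with bowtie-build before rerunning mapper.pl.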


0


Hi,

I know this post is old, but I'd appreciate some help. I tried several data sets with miRDeep2 and always got this error:

First line of FASTA reads file is not in accordance with the fasta format specifications
Please make sure your file is in accordance with the fasta format specifications and does not contain whitespace in IDs or sequences

***** Please check if the option you used (options c) designates the correct format of the supplied reads file earthwormShort1.fa *****

I have removed whitespace several times to make sure my FASTA file is OK, but the error keeps coming back. Any help would be appreciated.

updated 4.3 years ago by Ram 43k • written 8.3 years ago by Gabe Anderson ▴ 10
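
When the whitespace cannot be seen in an editor, a small diagnostic pass over the reads file may help narrow it down. The sketch below does not reproduce miRDeep2's own validation. It only looks for the things the error message names (whitespace in IDs or sequences), plus carriage returns from Windows line endings, which count as whitespace but are easy to miss. The function name find_fasta_problems, the allowed alphabet ACGTN, and the demo records are all assumptions made for illustration.

```python
import io
import re

# Assumption: reads are expected to contain only these characters.
VALID_SEQ = re.compile(r"^[ACGTNacgtn]+$")


def find_fasta_problems(handle):
    """Return (line_number, description) pairs for suspicious lines.

    Hypothetical helper. Open real files with newline="" so Python does
    not silently translate carriage returns away before they are checked.
    """
    problems = []
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\n")
        if lineno == 1 and not line.startswith(">"):
            problems.append((lineno, "first line does not start with '>'"))
        if "\r" in line:
            problems.append((lineno, "carriage return (Windows line ending?)"))
            line = line.replace("\r", "")
        if line.startswith(">"):
            if re.search(r"\s", line):
                problems.append((lineno, "whitespace in ID"))
        elif line == "":
            problems.append((lineno, "empty line"))
        elif not VALID_SEQ.match(line):
            problems.append((lineno, "unexpected character in sequence"))
    return problems


# Invented demo records, each with a deliberate problem.
demo = io.StringIO(">seq 1\r\nACGT\r\n>seq2\nAC GT\n")
for lineno, message in find_fasta_problems(demo):
    print(lineno, message)
```

For a file such as earthwormShort1.fa, the call would be find_fasta_problems(open("earthwormShort1.fa", newline="")). An empty result only means that none of these particular problems were found, not that the file is guaranteed to pass miRDeep2's checks.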