Question

miRDeep2 mapper.pl not generating *collapsed.fa and/or *.arf files correctly?

0

Entering edit mode

11 months ago

jammydodger123456 ▴ 40

We are looking for pathogen miRNAs in host tissues among reads that couldn’t map to the host genome. The relevant commands are below;

mapper.pl "$file"_HOST_UNMAPPED.fq -e -h -m -p $PathogenGenomeBasename -t "$file"_HOST_UNMAPPED_vs_PATHOGEN.arf -v -s "$file"_HOST_UNMAPPED_vs_PATHOGEN_COLLAPSED.fa

miRDeep2.pl "$file"_HOST_UNMAPPED_vs_PATHOGEN_COLLAPSED.fa $PathogenGenomeBasename.fa HOST_UNMAPPED_vs_PATHOGEN.arf $PathogenMatureMiRNA $PathogenMiRNA $PathogenPrecursorMiRNA -c -d -g -1

According to this analysis resulting from the above, we are getting miRNAs in our uninfected controls. When I look into the log file, I can see the following for the mapper.pl stage;

parsing fastq to fasta format
discarding short reads
collapsing reads
mapping reads to genome index
trimming unmapped nts in the 3' ends
Log file for this run is in mapper_logs and called mapper.log_9529
Mapping statistics

#desc   total   mapped  unmapped    %mapped %unmapped
total: 2290436  27708   2262728 1.210   98.790
seq: 2290436    27708   2262728 1.210   98.790

And this for the miRDeep2.pl stage;

Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics

#desc   total   mapped  unmapped    %mapped %unmapped
total: 2290420  24913   2265507 1.088   98.912
seq: 2290420    24913   2265507 1.088   98.912

Surely the total reads for miRDeep2.pl to start with should be 27,708, and not 2,290,420? If so, could anyone advise how to adjust the above commands to fix this? My understanding is that mirdeep2.pl takes the reads that mapper.pl maps, and tries to identify miRNAs from those mappings if they satisfy the various criteria. Is this correct?

My assumption is that with millions of reads going into the miRDeep2.pl stage (instead of the ~20,000), this is generating lots of false positives.

small RNA miRDeep2 sequencing • 681 views

ADD COMMENT • link updated 11 months ago by Trivas ★ 1.7k • written 11 months ago by jammydodger123456 ▴ 40

1

Entering edit mode

Your input into miRDeep2.pl is just the processed reads, e.g. those filtered for one reason or another. This does not include any information about whether they were properly mapped or not.

ADD REPLY • link 11 months ago by Trivas ★ 1.7k

0

Entering edit mode

Ah ok, I thought that the collapsed reads fasta only included reads that had mapped. What options can I use to generate a fasta of mapped reads following mapper.pl? Is it possible to extract mapped reads in a fasta format from the arf file?

ADD REPLY • link 11 months ago by jammydodger123456 ▴ 40

0

Entering edit mode

I'm not sure it matters, since it'll only likely change the %mapped. The information you get out of the miRDeep2 pipeline will be identical other than these QC metrics.

ADD REPLY • link 11 months ago by Trivas ★ 1.7k