Question

bedtools error: could not find indexes

0

Entering edit mode

7.6 years ago

Katie D • 0

I want to use bedtools to count reads mapping back to genes, ultimately to use for differential expression analysis in edgeR. I used trinity to assemble my metatranscriptomes de novo, and bowtie2 to map raw reads back to the assembly. I seemingly sorted and indexed the file without a problem (returned no errors at least). When I go to run the bedtools multicov command, I get the error: could not find indexes. The indexes are in the same file as the bam file. Used prodigal for ORF calling to create .gff file.

Commands used:

First I sorted

samtools sort bowtie_output.bam -o sorted_output.bam -T temp_prefix

Then I indexed

samtools index -b sorted_output.bam indexed_sample.bam.bai

Then I try to get read counts

bedtools multicov -bams bowtie_output.bam -bed prodigal_output.gff

I thought maybe I was supposed to use the bam.bai file instead of bowtie output bam file, so I tried

bedtools multicov -bams indexed_file.bam.bai -bed prodigal_output.gff

But still get 'could not find indexes'

Any idea what I am doing wrong? Disclaimer: very much a beginner!

bedtools RNA-Seq • 5.9k views

ADD COMMENT • link updated 6.4 years ago by alouyakis • 0 • written 7.6 years ago by Katie D • 0

0

Entering edit mode

Don't supply output names to samtools index command, only input name.

ADD REPLY • link 6.7 years ago by cpad0112 21k

score 0 · Answer 1 · 2016-09-21

0

Entering edit mode

7.6 years ago

harold.smith.tarheel ★ 4.9k

1) The samtools command for indexing is (outputs .bai by default):

samtools index sorted_output.bam

2) Since you used Trinity to assemble your transcriptome, why did you use Prodigal to define ORFs? Do you need to distinguish non-coding RNAs (or 5' and 3' UTRs) for some reason?

3) Since you're mapping to the assembled transcriptome, wouldn't #_aligned_reads/total_reads provide the statistic you want?

4) If you want to use Bedtools for this purpose, I believe the correct command is 'coverage' with the '-counts' flag.

ADD COMMENT • link 7.6 years ago by harold.smith.tarheel ★ 4.9k

0

Entering edit mode

2) I used Prodigal in the meta mode, because I am dealing with metatranscriptomes (bacterial community). I don't need to distinguish ncRNAs or 5' and 3' UTRs, I don't think. But, I thought I should define ORFs if I am interested in particular transcripts. Is this not so? Are you asking because there is a better ORF caller for my purposes, or because it is unecessary?

I based the decision to use Prodigal off of this Davids et al 2016 paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0146423

3) I should be more specific. I am ultimately interested in transcript counts of particular genes, because I want to see if there are genes or types of genes being expressed more in particular treatments compared to control treatments. So, I don't think what you described is the statistic I want.

ADD REPLY • link 7.6 years ago by Katie D • 0

score 0 · Answer 2 · 2017-12-06

0

Entering edit mode

6.4 years ago

alouyakis • 0

I'm late to the game, but it looks like you indexed "sorted_output.bam", but then used your original file, "bowtie_output.bam", instead of the indexed bam, "sorted_output.bam", in your multicov.

ADD COMMENT • link 6.4 years ago by alouyakis • 0