Question: bedtools error: could not find indexes
gravatar for Katie D
14 months ago by
Katie D0
Katie D0 wrote:

I want to use bedtools to count reads mapping back to genes, ultimately to use for differential expression analysis in edgeR. I used trinity to assemble my metatranscriptomes de novo, and bowtie2 to map raw reads back to the assembly. I seemingly sorted and indexed the file without a problem (returned no errors at least). When I go to run the bedtools multicov command, I get the error: could not find indexes. The indexes are in the same file as the bam file. Used prodigal for ORF calling to create .gff file.

Commands used:

First I sorted

samtools sort bowtie_output.bam -o sorted_output.bam -T temp_prefix

Then I indexed

samtools index -b sorted_output.bam indexed_sample.bam.bai

Then I try to get read counts

bedtools multicov -bams bowtie_output.bam -bed prodigal_output.gff

I thought maybe I was supposed to use the bam.bai file instead of bowtie output bam file, so I tried

bedtools multicov -bams indexed_file.bam.bai -bed prodigal_output.gff

But still get 'could not find indexes'

Any idea what I am doing wrong? Disclaimer: very much a beginner!

rna-seq bedtools • 940 views
ADD COMMENTlink modified 5 days ago by alouyakis0 • written 14 months ago by Katie D0

Don't supply output names to samtools index command, only input name.

ADD REPLYlink written 4 months ago by cpad01123.6k
gravatar for harold.smith.tarheel
14 months ago by
United States
harold.smith.tarheel3.9k wrote:

1) The samtools command for indexing is (outputs .bai by default):

samtools index sorted_output.bam

2) Since you used Trinity to assemble your transcriptome, why did you use Prodigal to define ORFs? Do you need to distinguish non-coding RNAs (or 5' and 3' UTRs) for some reason?

3) Since you're mapping to the assembled transcriptome, wouldn't #_aligned_reads/total_reads provide the statistic you want?

4) If you want to use Bedtools for this purpose, I believe the correct command is 'coverage' with the '-counts' flag.

ADD COMMENTlink modified 14 months ago • written 14 months ago by harold.smith.tarheel3.9k

2) I used Prodigal in the meta mode, because I am dealing with metatranscriptomes (bacterial community). I don't need to distinguish ncRNAs or 5' and 3' UTRs, I don't think. But, I thought I should define ORFs if I am interested in particular transcripts. Is this not so? Are you asking because there is a better ORF caller for my purposes, or because it is unecessary?

I based the decision to use Prodigal off of this Davids et al 2016 paper:

3) I should be more specific. I am ultimately interested in transcript counts of particular genes, because I want to see if there are genes or types of genes being expressed more in particular treatments compared to control treatments. So, I don't think what you described is the statistic I want.

ADD REPLYlink written 14 months ago by Katie D0
gravatar for alouyakis
5 days ago by
alouyakis0 wrote:

I'm late to the game, but it looks like you indexed "sorted_output.bam", but then used your original file, "bowtie_output.bam", instead of the indexed bam, "sorted_output.bam", in your multicov.

ADD COMMENTlink written 5 days ago by alouyakis0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 974 users visited in the last hour