Question: bedtools error: could not find indexes
gravatar for Katie D
4.3 years ago by
Katie D0
Katie D0 wrote:

I want to use bedtools to count reads mapping back to genes, ultimately to use for differential expression analysis in edgeR. I used trinity to assemble my metatranscriptomes de novo, and bowtie2 to map raw reads back to the assembly. I seemingly sorted and indexed the file without a problem (returned no errors at least). When I go to run the bedtools multicov command, I get the error: could not find indexes. The indexes are in the same file as the bam file. Used prodigal for ORF calling to create .gff file.

Commands used:

First I sorted

samtools sort bowtie_output.bam -o sorted_output.bam -T temp_prefix

Then I indexed

samtools index -b sorted_output.bam indexed_sample.bam.bai

Then I try to get read counts

bedtools multicov -bams bowtie_output.bam -bed prodigal_output.gff

I thought maybe I was supposed to use the bam.bai file instead of bowtie output bam file, so I tried

bedtools multicov -bams indexed_file.bam.bai -bed prodigal_output.gff

But still get 'could not find indexes'

Any idea what I am doing wrong? Disclaimer: very much a beginner!

rna-seq bedtools • 3.5k views
ADD COMMENTlink modified 3.1 years ago by alouyakis0 • written 4.3 years ago by Katie D0

Don't supply output names to samtools index command, only input name.

ADD REPLYlink written 3.4 years ago by cpad011214k
gravatar for harold.smith.tarheel
4.3 years ago by
United States
harold.smith.tarheel4.6k wrote:

1) The samtools command for indexing is (outputs .bai by default):

samtools index sorted_output.bam

2) Since you used Trinity to assemble your transcriptome, why did you use Prodigal to define ORFs? Do you need to distinguish non-coding RNAs (or 5' and 3' UTRs) for some reason?

3) Since you're mapping to the assembled transcriptome, wouldn't #_aligned_reads/total_reads provide the statistic you want?

4) If you want to use Bedtools for this purpose, I believe the correct command is 'coverage' with the '-counts' flag.

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by harold.smith.tarheel4.6k

2) I used Prodigal in the meta mode, because I am dealing with metatranscriptomes (bacterial community). I don't need to distinguish ncRNAs or 5' and 3' UTRs, I don't think. But, I thought I should define ORFs if I am interested in particular transcripts. Is this not so? Are you asking because there is a better ORF caller for my purposes, or because it is unecessary?

I based the decision to use Prodigal off of this Davids et al 2016 paper:

3) I should be more specific. I am ultimately interested in transcript counts of particular genes, because I want to see if there are genes or types of genes being expressed more in particular treatments compared to control treatments. So, I don't think what you described is the statistic I want.

ADD REPLYlink written 4.3 years ago by Katie D0
gravatar for alouyakis
3.1 years ago by
alouyakis0 wrote:

I'm late to the game, but it looks like you indexed "sorted_output.bam", but then used your original file, "bowtie_output.bam", instead of the indexed bam, "sorted_output.bam", in your multicov.

ADD COMMENTlink written 3.1 years ago by alouyakis0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2746 users visited in the last hour