Identify genes in a tophat aligned bam/sam file
0
0
Entering edit mode
7.8 years ago
gchaves • 0

Hello all. I have sam/bam files with reads mapped to human genome from a RNA-seq experiment. I already tried bedtools bedtools multicov [OPTIONS] -bams aln.1.bam aln.2.bam ... aln.n.bam -bed <bed/gff/vcf>, but it gave me the error 'Could not find indexes.'. Also, if the bedtools work, what shall be the output and how can I get a list with counts of reads mapping to genes? Any help is more than appreciated. Thank you.

blast alignment rna-seq gene • 3.0k views
0
Entering edit mode

Do you have the bai files that contain the index for these bam files in the same directory? One issue could be that bedtools requires .bam.bai instead of .bai or vice-versa. For your second question, I would use RSeQC or HTSeq to generate gene counts. I am not against bedtools but I haven't ever used it for counts or RPKM generation.

0
Entering edit mode

Thank you very much. No. I dont have bai files and I dont know how I can generate them.

0
Entering edit mode

Download samtools and run samtools index aln.1.bam. I have updated my older comment.

0
Entering edit mode

Ok. I now have the accepted_hits.bam.bai file.

0
Entering edit mode

Good. Now run your original command again. You need not to mention the .bai files in your command. They will be automatically detected.

0
Entering edit mode

Thank you!!

I am run it as

bedtools  multicov -bams accepted_hits.bam -bed genes.bed


It gave me

Error: The requested file (genes.bed) could not be opened. Error message:
(No such file or directory). Exiting!


I appreciate your help very much!!

0
Entering edit mode

You may have a permission issue. Not sure though. Try chmod 777 genes.bed and rerun the original command.

0
Entering edit mode

I just followed someone else's script. I dont have this genes.bed file though. Do you know what it is? Thank you!!

0
Entering edit mode

You need to provide a genes.bed file for your organism of interest. Search biostars for posts that tell you how to download bed files from UCSC genome browser for your organism of interest. You can also provide gff/gtf file to Bedtools. So Ensembl Biomart is another place to where you can look for gtf file. Do some reading about different bedtools and different file formats (bed, gtf, vcf) required to run these tools. Also make sure that the chromosome names in your bed file is compatible with chromosome names in the bed or gtf file. You can run "samtools view -H aln.1.bam" to get the names of chromosomes in the reference genome that was used to align the reads.

0
Entering edit mode

User lacks the file, I reckon. One gets Permission denied error for the lack of permission rights plus 755 is better then 777 (it just scares me with fully open settings).

0
Entering edit mode

My bad. It was hard for me to believe that "genes.bed" is not a file but the string that was part of the command he was trying to run. So I just guessed about things and of course I was wrong :-).

0
Entering edit mode

I downloaded a bed file from the UCSC genome browser for the known genes of the human genome. It was sent to my galaxy account and I uploaded it to the server I am using. Unfortunately when I run this:

bedtools multicov [OPTIONS] -bams aln.1.bam aln.2.bam ... aln.n.bam -bed <bed gff="" vcf="">

the output does not seem to me to be what I would call a gene count.

0
Entering edit mode

Please type the exact command you used with the file names.

0
Entering edit mode

I typed

bedtools multicov -bams my_bam_file.bam -bed human_genome_bed_file.bed

0
Entering edit mode

Just a humble advice. Please be more elaborative when you reply. You said the output does not seem what you would call a gene count. Tell us what you got as the output. First question 1) Were you able to run the bedtools successfully without any error 2) If yes, did it produce any output file 3) if yes, show us first few lines of the output. You can got to the bedtools page and check what kind of output will be produced by this function.