Entering edit mode
7.9 years ago
dec986
▴
370
Hello,
I have a strange technical problem where many different (and unique) alignment files from STAR somehow get identical output from bedtools multicov. I used the same procedure with Tophat and it worked fine:
I use the following command in STAR:
STAR --outSAMattributes All --runThreadN 8 --genomeDir /home/con/GENE_DATA/mm10 --readFilesIn $fastq1 $fastq2
the aligned bam files are different, sorted and indexed by samtools:
con@ubuntu:~/Embryo$ ll Epiblast/Male/HFD/?/4/sorted.bam
-rw-rwxr-- 1 con david 384127141 May 4 04:53 Epiblast/Male/HFD/1/4/sorted.bam\
-rw-rwxr-- 1 con david 333413416 May 4 00:43 Epiblast/Male/HFD/2/4/sorted.bam
-rw-rwxr-- 1 con david 575693742 May 4 02:28 Epiblast/Male/HFD/3/4/sorted.bam
-rw-rwxr-- 1 con david 464273054 May 4 05:57 Epiblast/Male/HFD/4/4/sorted.bam
and processing the aligned bam files from bedtools:
con@ubuntu:~/Embryo$ bedtools multicov -split -D -bams Epiblast/Male/HFD/1/4/sorted.bam -bed ~/GENE_DATA/mm10/mm10.bed > tmp1
con@ubuntu:~/Embryo$ bedtools multicov -split -D -bams Epiblast/Male/HFD/2/4/sorted.bam -bed ~/GENE_DATA/mm10/mm10.bed > tmp2
con@ubuntu:~/Embryo$ bedtools multicov -split -D -bams Epiblast/Male/HFD/3/4/sorted.bam -bed ~/GENE_DATA/mm10/mm10.bed > tmp3
con@ubuntu:~/Embryo$ bedtools multicov -split -D -bams Epiblast/Male/HFD/4/4/sorted.bam -bed ~/GENE_DATA/mm10/mm10.bed > tmp4
gives four identical files:
con@ubuntu:~/Embryo$ ll tmp?
-rw-r--r-- 1 con david 5270039 May 5 08:18 tmp1
-rw-r--r-- 1 con david 5270039 May 5 08:18 tmp2
-rw-r--r-- 1 con david 5270039 May 5 08:18 tmp3
-rw-r--r-- 1 con david 5270039 May 5 08:18 tmp4
con@ubuntu:~/Embryo$ diff tmp1 tmp2
con@ubuntu:~/Embryo$ diff tmp1 tmp3
con@ubuntu:~/Embryo$ diff tmp1 tmp4
Why is the output identical? How can I fix this?
I guess they are identical in size because they originate from the same bed file, but the numbers/counts are different?
is it because of the format of the bedfile?
what is the correct format of mm10.bed?
Could you check that files are different in content, but equal in size?
You could just check using less or head or tail or use
hi WouterDeCoster,
I did that, the files are exactly equal:
con@ubuntu:~/Embryo$ diff tmp1 tmp2 con@ubuntu:~/Embryo$ diff tmp1 tmp3 con@ubuntu:~/Embryo$ diff tmp1 tmp4
Every single gene is showing a count = 0, if this helps.
Why would bedtools multicov show a count of 0 for everything?
Common problem is that your bed file has 'chr' notation, while your bam file (and genome index) has integer notation, or vice versa. Could you check that?
Hi WouterDeCoster,
my genome reference file looks like this:
and the header of the bam file sorted.bam is
That's something we can fix easily ;-) Run the following on your gtf file:
This will change the "chr1" to "1" etc.
I tried this, I'm still getting the empty bedtools file :(
Hate to ask but did you use "newfile.gtf" for the last step? Are there any errors or just an empty file?
Wonder if this is happening because you are not providing all your BAM's with a single multicov command.
Multicov expects multiple BAM input.
Use genomecov/coverage if you need independent coverage values.
Hi genomax2,
on the bedtools manual page for multicov, http://bedtools.readthedocs.io/en/latest/content/tools/multicov.html
multicov works for single bam files.
Also, as you suggested, I tried the coverage option, but this produces output like
which I can't use because it doesn't have the transcript information contained in mm10.bed.
I just want counts from bam file, I don't know what's wrong :(
You want counts for what (genes or the features in your BED file)? You could use featureCounts (use a GTF file for MM10 if you want gene level counts or modify your BED file to a minimal GTF/SAF format) to get the counts.