I want to ask another question related to my previous post here: C: compute depth of coverage for certain regions of the genome using samtools depth
I have downloaded bam files of 1000 genomes project phase 3, "not the entire genome but only for certain regions" by using samtools view
. I want to compute depth of coverage for these specific regions which can be done by using samtools depth -r
.
According to my previous post, samtools depth -r
will not work and first I need to reindex my bam files. I re-index them by using
wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" | cut -f 1 | grep '.bam$' | grep -Ev -i ".chrom20.|.chrom11." | parallel --dry-run /data/programs/samtools-1.3.1/samtools view -b -o {/} {} "16:32297737-33456560"
however, when I try samtools depth -r after re-indexing,
/data/programs/samtools-1.3.1/samtools depth -r 16:32297737-33456560 NA21137.mapped.ILLUMINA.bwa.GIH.low_coverage.20120522.bam > bb
my output file is empty, I also get an error message that
Warning: The index file is older than the data file: NA21137.mapped.ILLUMINA.bwa.GIH.low_coverage.20120522.bam.bai
I wonder what is wrong here? dos anyone have any idea? thanks
thanks @cpad0112, unfortunately I really do not understand your response and what do you mean by indexing bam files and trying the code again! I downloaded the bam files by this command
the outputs are bam and bam.bai files (bam.bai are index files as far as I know). later on I re-index the files by this
Can you explain it bit more for me, I am totally new in human genome project, a bit confused by what you mean exactly.
wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" | cut -f 1 | grep '.bam$' | grep -Ev -i ".chrom20.|.chrom11." | parallel --dry-run /data/programs/samtools-1.3.1/samtools view -b -o {/} {} "16:32297737-33456560"
is not for indexing the bam files. This a different way to download the bam files from internet.samtools index input.bam
is the right command to index bam files.Following is a test run for indexing bam and calculating depth by samtools on a single file, in a temporary directory. If these steps are successful, then you can apply them to all the bams in the directory.
Index the bam file with following command:
samtools index NA21137.mapped.ILLUMINA.bwa.GIH.low_coverage.20120522.bam
Now you should see two files with .bam and .bam.bai extensions
Run samtools depth (example given below):
samtools depth -r 16:32297737-32297837 NA21137.mapped.ILLUMINA.bwa.GIH.low_coverage.20120522.bam
samtools depth -r 16:32297737-32297837 NA21137.mapped.ILLUMINA.bwa.GIH.low_coverage.20120522.bam > NA21137.mapped.ILLUMINA.bwa.GIH.low_coverage.20120522. text.
Note: I edited post mentioning parallel that parallel is used for downloading bams, as alternative to bash script provided in that post.
If above steps are successful, navigate to wherever bam files are. Try running following command:
This would index all the bam files. Try running samtools depth again on bam files.
thanks so much, excellent help