I am just trying to create a bam file that contains only intergenic regions of genome. I have complete genome bam file but i couldn't find out any filtering way for this purpose. Do you have any suggestions about it ?
If you can put your intergenic regions into a bed file then samtools view -L your_bed your.bam > intergenic.bam will subset the file.
samtools view -L your_bed your.bam > intergenic.bam
And you can generate the bedfile of intergenic regions using bedtools complement.
Thank you very much for your answer but I am very new for bedtools so i would like to ask about genome file. How can I get this genome file ?
What genome are you working with?
It is zebrafish genome.
Bedtools will accept a GFF file so if you have one then you could try it with complement.
Edit: I should clarify that just using the GFF file as is may not be enough since it may have different kinds of annotations. If used as is then the resulting complemented version may not strictly have just intergenic regions. Some additional work will needed before running the bedtools complement command.
I think the "genome" file in question is the one supplied to -g in bedtools complement
Its a tab-delimited file, which contig names in the first column, and their lengths, in base-pairs in the second. One way to find this info is from the header of your bam file or the BAM index using samtools idxstats.
samtools idxstats my_bam.bam | cut -f1,2 | grep -v '*' > genome.tsv
I don't think this command is appropriate (unless I am missing something). It is not going to generate a file with the genic intervals which would be the input needed for bedtools complement.
bedtools complement needs two inputs
1) An interval file of some sort (which in this case should contain the genic regoins - a GTF or GFF will be fine for this). I was assuming the OP would have this, but if not it can be downloaded from here)
2) A "genome" file (the OP asked about the "genome file"). As describe above, it tells bedtools how long the contigs are. That was the file I was creating in my above comment.
Ah yes. Thanks for the clarification.
[bam_index_load] fail to load BAM index.
[bam_idxstats] fail to load the index.
that is error that i received everytime i try your code with my bam file :(
I also tried to create my own genome file from chrom.size link in UCSC however it gives every time same error with different chromosome number: chromosome ... does not exist in the genome file. I still cannot get result that will come from complement function of bedtools.
Is your bam file (sorted) indexed? If not you can index it by doing samtools index your_file.bam. Then the command will work.
samtools index your_file.bam
Yes i already indexed my bam file.
Is the .bai file in the same folder as the .bam file?
Sorry for late answer. Should I remove .bam file from folder?
No. Otherwise which file will you run the idxstats command on that @i.sudbery provided?
If you create a bed file with all intergenic regions of your genome of interest then it is just a simple bedtools intersect command
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy