How can I generate a list of 100 genomic intervals from bam files?
9 months ago
kai_bio ▴ 50

I want to generate a list of 100 genomic intervals, each 100 kb long, from BAM files (mapping data). 50 of these intervals should overlap genes of interest (such as GAPDH). I want to check where certain genes are mapped and review the alignment summary of the mapping.

alignment mapping samtools bam genomics • 1.4k views

Do you basically want to generate 100 intervals, half of them containing your genes of interest and the other half not? I don't see how the purpose you describe has anything to do with BAMs, unless you have left out something important (e.g. coverage). What you describe is easier to do in Excel than with dedicated tools or packages.

9 months ago

I think bedtools makewindows will be helpful for you.

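Start either from a genome file (chromosome names and sizes, passed with -g; it can be derived from the reference FASTA index) or from a BED file (passed with -b). If you start from BED, first use bedtools bamtobed to convert your BAMs to BED format.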

Examples from the bedtools makewindows help output:

Examples:
 # Divide the human genome into windows of 1MB:
 $ bedtools makewindows -g hg19.txt -w 1000000
 chr1 0 1000000
 chr1 1000000 2000000
 chr1 2000000 3000000
 chr1 3000000 4000000
 chr1 4000000 5000000
 ...

 # Divide the human genome into sliding (=overlapping) windows of 1MB, with 500KB overlap:
 $ bedtools makewindows -g hg19.txt -w 1000000 -s 500000
 chr1 0 1000000
 chr1 500000 1500000
 chr1 1000000 2000000
 chr1 1500000 2500000
 chr1 2000000 3000000
 ...

 # Divide each chromosome in human genome to 1000 windows of equal size:
 $ bedtools makewindows -g hg19.txt -n 1000
 chr1 0 249251
 chr1 249251 498502
 chr1 498502 747753
 chr1 747753 997004
 chr1 997004 1246255
 ...

 # Divide each interval in the given BED file into 10 equal-sized windows:
 $ cat input.bed
 chr5 60000 70000
 chr5 73000 90000
 chr5 100000 101000
 $ bedtools makewindows -b input.bed -n 10
 chr5 60000 61000
 chr5 61000 62000
 chr5 62000 63000
 chr5 63000 64000
 chr5 64000 65000
 ...
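
For the 100 kb intervals asked about, the fixed-width mode (-w) shown above could also be mirrored in a small script. The Python sketch below is only an illustration and not bedtools' own implementation: the function names `sizes_from_sam_header` and `make_windows` are made up for this example, the header text is assumed to come from something like `samtools view -H sample.bam`, and the toy header in the usage lines is invented.

```python
def sizes_from_sam_header(header_text):
    """Collect {chromosome: length} from the @SQ lines of a SAM/BAM header."""
    sizes = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:chr1 and LN:248956422
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        sizes[fields["SN"]] = int(fields["LN"])
    return sizes


def make_windows(chrom_sizes, width=100_000, step=None):
    """Tile each chromosome into BED-style (chrom, start, end) windows.

    Coordinates are 0-based and half-open, as in BED. With step=None the
    windows do not overlap; a smaller step gives sliding windows.
    """
    step = step or width
    windows = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, step):
            # The last window of a chromosome is clipped to the chromosome end
            windows.append((chrom, start, min(start + width, size)))
    return windows


# Toy header (invented); a real one would be the output of `samtools view -H`
toy_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chrA\tLN:250000\n@SQ\tSN:chrB\tLN:100000\n"
for chrom, start, end in make_windows(sizes_from_sam_header(toy_header)):
    print(chrom, start, end, sep="\t")
```

Reading the chromosome sizes from the BAM header keeps the windows consistent with the reference the reads were actually mapped to.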

Thank you, I have generated the file.


Please accept this answer (green check mark) to provide closure to this thread.

9 months ago
LauferVA 4.2k

Perhaps you are looking for a tool such as SNPsnap?

https://academic.oup.com/bioinformatics/article/31/3/418/2365926


Thank you, but this tool is for GWAS data. I want to analyze BAM files.


Yes, but the tool will generate the kind of intervals you are seeking, right? I may not understand the question, but once you have the loci, the file type should be irrelevant, correct?

It seems to me the (only) difficult part of this task is generating matched loci, which is what SNPsnap is good at. If you don't need them to be matched, simply write a script to generate stretches of 100 kb; it's easy to do.
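
A minimal sketch of such a script for the unmatched case, in Python. It assumes you already have 100 kb windows and the coordinates of your genes of interest as BED-style (chrom, start, end) tuples, for example taken from an annotation file. The names `overlaps` and `pick_intervals` are hypothetical, and the toy chromosome and gene coordinates are invented (they are not real GAPDH coordinates).

```python
import random


def overlaps(a, b):
    """True if two 0-based, half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pick_intervals(windows, genes, n_gene=50, n_other=50, seed=0):
    """Randomly pick n_gene windows that overlap a gene and n_other that do not."""
    with_gene, without_gene = [], []
    for w in windows:
        if any(overlaps(w, g) for g in genes):
            with_gene.append(w)
        else:
            without_gene.append(w)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    # random.sample raises ValueError if a group has fewer windows than requested
    return rng.sample(with_gene, n_gene), rng.sample(without_gene, n_other)


# Toy data (invented): one 20 Mb chromosome tiled into 100 kb windows, 60 genes
toy_windows = [("chrT", s, s + 100_000) for s in range(0, 20_000_000, 100_000)]
toy_genes = [("chrT", i * 200_000 + 10_000, i * 200_000 + 30_000) for i in range(60)]

hits, background = pick_intervals(toy_windows, toy_genes)
print(len(hits), len(background))  # prints "50 50"
```

Note that with only a handful of genes there may be fewer than 50 non-overlapping windows that contain one, in which case the gene list (or the window scheme) would need to be extended.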
