I have RNA-seq data and would like to get the reads from specific region for example -/+ 50 reads before and after stop site. do you know how I can get it from bam file
You may find some info in this post: C: alignment containing sequences from position a to b
Easiest way is SAMtools view, you can specify a single region:
samtools view -h <your.bam> chr1:1000-2000
or you can give a BED file (I think in tab-delimited format):
samtools view -h -L <your.bed> <your.bam>
Make sure to use same chromosomal naming (either with 'chr' or without) as in the BAM. You can check with
samtools view -H <your.bam>
Hey Bruce, not sure why this wasn't upvoted (I upvoted just now), as it's the best and most concise answer I've seen on this particular question.
Thanks, I find it particularly useful for IGV or other viewers, instead of trying to load a whole BAM locally.
will this include only the reads the fully lie in the position? i.e. 900-1200 no, and 1100-1400 yes?
No, any that overlap the interval. You can, nevertheless, still use the above command and then pipe the output to, e.g., awk, in order to further filter for reads only in each desired interval.
could please give me an example how to do this? Or a link to something similar?
There is likely some 'random' script out there that can do this. Perhaps even BEDTools has, these days, some way to do this.
What you need to do is start with the POS field in the BAM/SAM file, which represents the left-most co-ordinate to which your read aligns / maps. Then, you have to parse the CIGAR string to understand how many more bases are involved in the alignment.
...or, if all of your reads are 50bp and your interval of interest is 100bp, then you just need to keep any read whose POS value starts between base 1 and 50.
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy