I am new to the genomics field and I have recently started working with SAMTOOLS. My question is the following:
- I have my SAM file with 2,5 M aligned reads.
- I convert it to BAM file with SAMTOOLS
- I make the index of the BAM and the reference file
What I want to do is use the tview command to study a region of interest (say position 150-160). So I write this:
samtools tview -d T -p ref_cons:145 BAM_FILE --reference REFERENCE_FILE
It works fine, BUT the problem is that it returns only ~8,000 lines (reads). My question then is if there is some limit of the number of alignments that are outputted and, if so, how can I make it so that it prints all of them? I attach a screen-shot of the output I am getting.
ok, let's see this in a different way: WHY would you need to visualize more than 8K rows ?
Maybe it is because of hard-coded depth limit in samtools. Please see here: https://biostar.usegalaxy.org/p/6650/
The thing is that I want to grep all this "sub-regions" in all my aligned reads. These are barcodes from an amplicon-seq experiment that have been mapped to my reference construct (which I use as reference "genome" in this case), so I need to know all the different barcodes that were generated...
so what about reading this information from the bam directly. How do you know which section of the reads is assigned to a barcode ?
BTW: it's a typical xy problem http://xyproblem.info/ :-)
Thank you for the kind help. Unfortunately I just changed fields from studying proteins to genomics and all these are new to me - sorry if my question was not very clear. I don't know how to do as you suggest from the BAM file actually. Regarding the second question, I actually know it because the reads were mapped to the NNNNNNN on my reference construct (I used the STAR aligner that allows for mapping to NNNNN). Therefore I am confident that if, in theory, I could print all the 2,5M reads, I would be able to grep the respective region (say pos 150-160 that I have my NNNNs) in each of them and using that to get all the different possible barcodes. Does that make more sense know (at least what I was planning to do)?
If there is an inline barcode in your reads you could separate them first before aligning them to anything.
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=212153
This is typically not recommended as it runs the risk of annoying people in both communities.
Hi all and sorry for the cross-posting. The way I see it, this is not doable then. What genomax kindly suggested I am not sure I understand. The barcodes are random, i.e. I have to prior information about them, that is why I used STAR to align my sequencing reads against my reference construct, and then, based on the position of the NNNNNs in it, grep the respective regions of the aligned reads. The approach would work perfectly, if not for the (apparent) limitation of 8,000 reads to be outputted...
Many thanks for your help! Yes, this is what I want to do..
I installed the bioalcidaejdk and I stored the code you wrote as script.js.
Then I try:
But I get lots of errors. Is this at least the correct way to do this or I misread your post?
How can I know ?
ADD COMMENT/ADD REPLYwhen responding to existing posts to keep threads logically organized.
This response should have gone under @Pierre's answer.
Hi Pierre and, again, many many thanks for the help!
Ok, here comes my script.js:
ref_cons is the name of my construct (as I said, my "reference genome" is only 1 sequence), 145 start 165 end.
what is your version of java/javac ?
openjdk version "1.8.0_131" OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11) OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Is this a problem?
Hi Pierre and everyone!
I just wanted to let you know that it works! I tested it on another machine because I noticed you advise against using OpenJDK and it seems to be fetching the regions properly!
Many thanks for your help, time and code, much appreciated :)
cool. please mark my answer as "accepted" (green mark on the left)
I also wanted to load more reads for
samtools tviewand then I came across this post.
"maxcnt = 8000" to "maxcnt = 80000"in samtools-1.x.x/htslib-1.x.x/sam.c and it works for me.