Is there a limit for SAMTOOLS tview option?
1
0
Entering edit mode
5.4 years ago
bioplanet ▴ 60

Hi all,

I am new to the genomics field and I have recently started working with SAMTOOLS. My question is the following:

1. I have my SAM file with 2,5 M aligned reads.
2. I convert it to BAM file with SAMTOOLS
3. I make the index of the BAM and the reference file

What I want to do is use the tview command to study a region of interest (say position 150-160). So I write this:

samtools tview -d T -p ref_cons:145 BAM_FILE --reference REFERENCE_FILE

It works fine, BUT the problem is that it returns only ~8,000 lines (reads). My question then is if there is some limit of the number of alignments that are outputted and, if so, how can I make it so that it prints all of them? I attach a screen-shot of the output I am getting.

Thank you!

alignment • 4.2k views
1
Entering edit mode

ok, let's see this in a different way: WHY would you need to visualize more than 8K rows ?

0
Entering edit mode

Maybe it is because of hard-coded depth limit in samtools. Please see here: https://biostar.usegalaxy.org/p/6650/

0
Entering edit mode

The thing is that I want to grep all this "sub-regions" in all my aligned reads. These are barcodes from an amplicon-seq experiment that have been mapped to my reference construct (which I use as reference "genome" in this case), so I need to know all the different barcodes that were generated...

0
Entering edit mode

so what about reading this information from the bam directly. How do you know which section of the reads is assigned to a barcode ?

BTW: it's a typical xy problem http://xyproblem.info/ :-)

1
Entering edit mode

Thank you for the kind help. Unfortunately I just changed fields from studying proteins to genomics and all these are new to me - sorry if my question was not very clear. I don't know how to do as you suggest from the BAM file actually. Regarding the second question, I actually know it because the reads were mapped to the NNNNNNN on my reference construct (I used the STAR aligner that allows for mapping to NNNNN). Therefore I am confident that if, in theory, I could print all the 2,5M reads, I would be able to grep the respective region (say pos 150-160 that I have my NNNNs) in each of them and using that to get all the different possible barcodes. Does that make more sense know (at least what I was planning to do)?

0
Entering edit mode

If there is an inline barcode in your reads you could separate them first before aligning them to anything.

0
Entering edit mode

Hello bioplanet!

This is typically not recommended as it runs the risk of annoying people in both communities.

0
Entering edit mode

Hi all and sorry for the cross-posting. The way I see it, this is not doable then. What genomax kindly suggested I am not sure I understand. The barcodes are random, i.e. I have to prior information about them, that is why I used STAR to align my sequencing reads against my reference construct, and then, based on the position of the NNNNNs in it, grep the respective regions of the aligned reads. The approach would work perfectly, if not for the (apparent) limitation of 8,000 reads to be outputted...

0
Entering edit mode

Hi Pierre!

Many thanks for your help! Yes, this is what I want to do..

I installed the bioalcidaejdk and I stored the code you wrote as script.js.

Then I try:

samtools view -h input.bam "reference:150-300" | java -jar dist/bioalcidaejdk.jar -f script.js -F bam


But I get lots of errors. Is this at least the correct way to do this or I misread your post?

0
Entering edit mode

But I get lots of errors

How can I know ?

0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This response should have gone under @Pierre's answer.

0
Entering edit mode

Hi Pierre and, again, many many thanks for the help!

Ok, here comes my script.js:

    final String chrom = "ref_cons";
final int start=145;
final int end=165;

filter(R->R.getContig().equals(chrom) && !(R.getEnd()<start || R.getStart()>end)).
forEach(R->{

for(int pos=start; pos< end ;++pos)
{
char base='.';
int ref=R.getStart();
for(CigarElement ce:R.getCigar())
{
CigarOperator op=ce.getOperator();
switch(op)
{
case P: break;
case H: break;
case D: case N: ref+=ce.getLength();break;
case EQ:case X: case M:
{
for(int i=0;i< ce.getLength() ;i++)
{
if(ref==pos)
{
break;
}
if(ref>pos) break;
ref++;
}
break;
}
default:break;
}
if(base!='.')break;
}
print(base);
}
println();
});


ref_cons is the name of my construct (as I said, my "reference genome" is only 1 sequence), 145 start 165 end.

0
Entering edit mode

Errors:

         [DEBUG][BioAlcidaeJdk] Compiling :
1  import java.util.*;
2  import java.util.stream.*;
3  import java.util.function.*;
4  import htsjdk.samtools.*;
5  import htsjdk.samtools.util.*;
6  import htsjdk.variant.variantcontext.*;
7  import htsjdk.variant.vcf.*;
8  import com.github.lindenb.jvarkit.util.bio.fasta.FastaSequence;
9  import javax.annotation.Generated;
10  import htsjdk.variant.vcf.*;
11  @Generated(value="BioAlcidaeJdk",date="2017-10-26T16:35:29+0200")
12  public class BioAlcidaeJdkCustom1850364251 extends com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk.SAMHandler {
13    public BioAlcidaeJdkCustom1850364251() {
14    }
15    @Override
16    public void execute() throws Exception {
17     // user's code starts here
18     final String chrom = "ref_cons";
19  final int start=145;
20  final int end=165;
21
23      filter(R->R.getContig().equals(chrom) && !(R.getEnd()<start || R.getStart()>end)).
24      forEach(R->{
25
26      for(int pos=start; pos< end ;++pos)
27          {
28          char base='.';
29          int ref=R.getStart();
31          for(CigarElement ce:R.getCigar())
32              {
33              CigarOperator op=ce.getOperator();
34              switch(op)
35                  {
36                  case P: break;
37                  case H: break;
38                  case S : read+=ce.getLength(); break;
39                  case D: case N: ref+=ce.getLength();break;
41                  case EQ:case X: case M:
42                      {
43                      for(int i=0;i< ce.getLength() ;i++)
44                          {
45                          if(ref==pos)
46                              {
48                              break;
49                              }
50                          if(ref>pos) break;
52                          ref++;
53                          }
54                      break;
55                      }
56                  default:break;
57                  }
58              if(base!='.')break;
59              }
60          print(base);
61          }
62      println();
63  });
64
65      //user's code ends here
66     }
67  }
[SEVERE][InMemoryCompiler]ToolProvider.getSystemJavaCompiler() failed.
java.lang.RuntimeException: ToolProvider.getSystemJavaCompiler() failed.
at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:165)
at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk$AbstractHandlerFactory.getConstructor(BioAlcidaeJdk.java:473) at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk$SAMHandlerFactory.execute(BioAlcidaeJdk.java:564)
at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk.doWork(BioAlcidaeJdk.java:796)
at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1162)
at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1320)
at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk.main(BioAlcidaeJdk.java:821)
[SEVERE][BioAlcidaeJdk]java.lang.RuntimeException: Cannot compile custom class BioAlcidaeJdkCustom1850364251

0
Entering edit mode

what is your version of java/javac ?

java -version

0
Entering edit mode

openjdk version "1.8.0_131" OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11) OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Is this a problem?

0
Entering edit mode

Hi Pierre and everyone!

I just wanted to let you know that it works! I tested it on another machine because I noticed you advise against using OpenJDK and it seems to be fetching the regions properly!

Many thanks for your help, time and code, much appreciated :)

0
Entering edit mode

cool. please mark my answer as "accepted" (green mark on the left)

0
Entering edit mode

I also wanted to load more reads for samtools tview and then I came across this post.

changed "maxcnt = 8000" to "maxcnt = 80000" in samtools-1.x.x/htslib-1.x.x/sam.c and it works for me.

2
Entering edit mode
5.4 years ago

Ok, I'm not sure If I've understood but the followings script for bioalcidaejdk ( http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html ) should print the read sequences in a given region of the genome (indels are skipped!)

final String chrom = "rotavirus";
final int start=150;
final int end=200;

filter(R->R.getContig().equals(chrom) && !(R.getEnd()<start || R.getStart()>end)).
forEach(R->{

for(int pos=start; pos< end ;++pos)
{
char base='.';
int ref=R.getStart();
for(CigarElement ce:R.getCigar())
{
CigarOperator op=ce.getOperator();
switch(op)
{
case P: break;
case H: break;
case D: case N: ref+=ce.getLength();break;
case EQ:case X: case M:
{
for(int i=0;i< ce.getLength() ;i++)
{
if(ref==pos)
{
break;
}
if(ref>pos) break;
ref++;
}
break;
}
default:break;
}
if(base!='.')break;
}
print(base);
}
println();
});


usage:

samtools view -h input.bam "rotavirus:150-300" | java -jar dist/bioalcidaejdk.jar -f script.js -F bam

TAATTCCCAGAGTTCAAAGTAAAAATGATAATGTGATGGATGACTCTGTT
TATTTACCAGATTTAAAAGTAACTTTGATTATGTGATGGATGACTCTGGT
TATTAACCAGAGTTACAAGTAAATTTGATTATGTGCTGGATGACTCTGGT
TATTTCCCAGAGATAAACGTCAATTTGATTCTGTGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTCTGTGATGGATTCCTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTCTGTGATGGATGACACTGGT
TATTTACCAGAGATAAAAGTAAATTTGATTATGTGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTAGATTATTTGATGGATGACTCTGGT
TATTAACCAGAGTTAAAAGTCAATTTGATTCTGTGATGGATGACTCAGGT
TATTTACCAGAGTTAAAAGTAAATTTGATAATTTGATGGATGACTCAGGT
TATTTACCAGATTTAAAAGTAAATTTGAATATGTGAAGGCTGACTCTGGT
TATATACCAGAGATAAAAGTAAATTTGATAATGTGATGGATGACTCTTGT
TATTTAGCAGATTACAAAGTAAATTTGATTATGTTATGGAAGACTGTGTT
TATTTACCAGATTTAAAAGTAAATTTGAATATGAGATTGATGACTCTGGA
TATTTACCAGAGTTAAAAGTAAATTTGATTATGTTATGGATGACTCTGGT
TATTTACCATAGTTAAAAGTAAATTTGATTATGAGATGGATGACTCTGGT
TATTTACGCGAGTTAAAAGTAAATTTGATTATGAGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATAATGAGATGGATGACTCTTGT
TATTTACCATAGTAAAAAGTACATTTGATTATGTGATGGATGACTCTGGT
TATTAACCAGAGTTAAAAGTAAATTTGATTAAGAGATGGATGACTCTGGT
TATTTACCAGAGTTAAACTTAAATTTGATTATGTTATGGATGACTCTGGT
TATTTACCAGCGATACAAGTAACTTTTATTATGTGAAGGATGACTCTGGT
TATTAACCATAGTTAAAAGTCAATTTGATTATGTGAAGGCTGACTCTGGT
TATTTACCAGAGTTAAACGTAAATTTGATTATGTGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTATGAGATGGATTACTCTGGT
TATTAACCAGAGTTAAAAGTAAATTTGATTCTGTGATGGATGACTGTGGT
TATTTACCAGAGTTAAAAGAAAATTAGATTATGTGCTGGCTGCCTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTATGTGATGGATGACTCAGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTATGTGATGGCTGACTCTGGT
TATTTACCAGATTTACAAGTAAATTTGATAATGTGATGGATGACTCTGGA
TATTTCCCATAGTTAAAAGTAAATTTGATTAAGTGATGGCTGACTCTGGT
(...)

0
Entering edit mode

No I mean if I am following your steps correctly... Was I supposed to save the script as script.js and give it as argument?

My command is:

samtools view -h MY_BAM_FILE "ref_cons:145-160" | java -jar jvarkit/dist/bioalcidaejdk.jar -f script.js -F bam

0
Entering edit mode

yes, modify the chrom, start, end variables in script.js. Show me the errors.

0
Entering edit mode

Hi again,

I was wondering, if with some modification I can print not only the region but also the whole read (e.g. with a tab separating them). Is it maybe somehow possible with a modification in the JAVA script? Sorry to ask, but I do not program in JAVA so I cannot understand the script well.

1
Entering edit mode

replace the last

println();


with

println("\t"+R.getReadString());

0
Entering edit mode

Thank you Pierre, it works!

I think that I get only the sequence of the read though, is it perhaps possible to caption the whole read, like ID etc?

1
Entering edit mode
0
Entering edit mode

Thank you so much for the link and all the help!