Question: alignment containing sequences from position a to b
0
gravatar for curiousbiologist
2.9 years ago by
France
curiousbiologist40 wrote:

Hello everyone,

I'm looking for an alignment containing reads started or clipped at the first position and ended or clipped at the last position of my alignment. I don't want to have reads ended in the middle of the sequence.

For that I've done an alignment from fastq input with bwa on my entire gene, then selected a region between 2 positions (samtools view -b trialsorted.bam AA_AD:30-230 > trialsorted30-230.bam) and then realign these sequences with a reference containing only the first bases of my start position and the same for the last bases of my end position in order to keep only reads of interest. However I didn't get a selection of sequences starting from a position and ending to another position.

Does anyone have any idea how to solve my problem?

Thanks

sequencing sequence alignment • 1.2k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by curiousbiologist40
java -jar picard.jar FilterSamReads I=align-sort.bam O=out.bam JS=clippFilter.js FILTER=includeJavascript

with clippFilter.js containing:

function accept(r)
 {
 return r.getContig()=="AA_AD";
 }
accept(record);

Using this command, I got something, however this one

function accept(r)
 {
 return r.getContig()=="AA_AD" &&
  r.getStart()==30 &&
  r.getEnd()==130;
 }
accept(record);

does not allow me to crop sequences from 30 to 130, nothing pass through the filter. Using igv, I saw my input contain sequences that overlap this region.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by curiousbiologist40
0
gravatar for Pierre Lindenbaum
2.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:

using samjs https://github.com/lindenb/jvarkit/wiki/SamJS

samtools view -b in.bam AA_AD:30-230 | java -jar dist/samjs.jar -e 'record.getContig()=="AA_AD" && record.getStart()==30 && record.getEnd()==230'
ADD COMMENTlink written 2.9 years ago by Pierre Lindenbaum123k

thank you for your rapid answer! I can't however run it with 'FilterSamReads'

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by curiousbiologist40

I still don't get it, how to run this command with FilterSamReads? I tried without success:

samtools view -b in.bam AA_AD:30-230 | java -jar FilterSamReads.jar -e 'record.getContig()=="AA_AD" && record.getStart()==30 && record.getEnd()==230'

java -jar FilterSamReads.jar INPUT=in.bam -e 'record.getContig()=="AA_AD:30-230" && record.getStart()==30 && record.getEnd()==230' OUTPUT=out.bam
ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by curiousbiologist40

And what was the error displayed ? What did you see ?

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum123k

ERROR: Invalid argument '-e'

impossible to find something equivalent in the options

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by curiousbiologist40

"Invalid argument".

I will not give you the solution: the manual for this picard tool is here https://broadinstitute.github.io/picard/command-line-overview.html#FilterSamReads .

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum123k
1

by the way, nowadays, there is no such file "FilterSamReads.jar ", there is only "picard.jar". Your version of picard looks very old. What is the version ? current is 2.7.1

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum123k
1

Ok thank you! After having updated picard and carefully read of the manual, I ran this command: java -jar picard.jar FilterSamReads I="insorted.bam", O="outsorted.bam", READ_LIST_FILE="read_names.txt", FILTER="includeAligned"

with read_names.txt file containing: AA_AD:30-230 -e 'record.getContig()=="AA_AD:30-230" && record.getStart()==30 && record.getEnd()==230'

However, there is still something wrong, I don't understand what happen with insorted.bam (sorted and present in my folder): "ERROR 2016-11-22 08:54:11 FilterSamReads Failed to filter aa.bam, htsjdk.samtools.SAMException: Cannot read non-existent file: /home/admin/Bureau/insorted.bam, at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:338) at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:325) at picard.sam.FilterSamReads.doWork(FilterSamReads.java:210) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)"

ADD REPLYlink written 2.9 years ago by curiousbiologist40
2

The error is quite clear. The file doesn't exist in the path specified. There is no /home/admin/Bureau/insorted.bam. Where is insorted.bam?

And what's up with the comma's and "" in your command? I can't find those in the manual.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by WouterDeCoster41k

Thanks for opening my eyes!

ADD REPLYlink written 2.9 years ago by curiousbiologist40
java -jar picard.jar FilterSamReads I=insorted.bam O=outsorted.bam JS=clippFilter.js FILTER=includeJavascript

with -e 'record.getContig()=="AA_AD" && record.getStart()==30 && record.getEnd()==230' in clippFilter.js file. I however got a syntax error after the first " ' "

ADD REPLYlink written 2.9 years ago by curiousbiologist40

You're mixing everything. There is no such '-e' . This option '-e' is for another tool, not for picard. Your script in picard should be only:

record.getContig()=="AA_AD" && record.getStart()==30 && record.getEnd()==230
ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum123k

that's what I've finally done (and also with a script using 'function'), no problem with 'record.getContig' but when I add 'record.getStart' and/or 'record.getEnd' I get no record, as mentioned in my post above. I've tried with other files, other positions without success Here is the start of my sorted bam:

ZUXCF:05542:10628   0   AA_AD   1   34  33S62M1D5M  *   0   0   GCTTCGTAAGCTGTCAGCCTCTCACGGTCCGCTATGTTTTTCAACCCGTATCTGAGCGGCGGCGTGACCGGCGGTGCGGTCGCGGGTGGCCGGCGCAGCG    >8828<<<9=>>>===<<8<<=<<>>9==8<==>@=C>C=3=?9?A8>>=?<<<><<<78808:<===9=9::6:===:>>>666+68/:6<8<<<==<>    NM:i:1  MD:Z:62^T5  AS:i:62 XS:i:0
ZUXCF:03174:13233   16  AA_AD   1   60  68S98M  *   0   0   ATCTAGTGCTGCATGTGTATTTTCTTTGTGATTTTGCTTCGTAAGCTGTCAGCCTCTCACGGTCCGCTATGTTTTTCAACCCGTATCTGAGCGGCGGTGTGACCGGCGGTGCGGTCGCGGGTGGCCGGCGTCAGCGTTCGCAGCCCGGCTCCGCGCAGGGCTCGGG  0)++++..,,-..<<<:::4=<<;6;;8887)7777;7=<<<8>==;;::770788<<883=<7<;;8888)888886;3;;;@====<7777770::;;;94<7994=</0,><99;6<<<77282<<<;;;;880888;;;7==9=<=9<<<:;82::<0/.?A  NM:i:1  MD:Z:29C68  AS:i:93 XS:i:0
ADD REPLYlink written 2.9 years ago by curiousbiologist40

so you don't have any read starting at 30...

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum123k

it's only the beginning of the file... My aim is not to select reads starting at a defined position and ending at a defined position but get an interval of sequences between two positions even if I have piece of reads

ADD REPLYlink written 2.9 years ago by curiousbiologist40
1

You may find a way to trim the aligned sequence at position 30, then do the same with 230 and remove all the part-aligned sequences using FilterSamReads record.getStart()==30 && record.getEnd()==230

ADD REPLYlink written 2.9 years ago by vmicrobio240

how can I trim aligned sequences at position 30 and 230?

ADD REPLYlink written 2.9 years ago by curiousbiologist40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1486 users visited in the last hour