Question: samtools view region
0
gravatar for xd_d
5 months ago by
xd_d70
xd_d70 wrote:

Hello everybody,

it is possible to get only the reads if the start coordinates is in a region.

When i run this: samtools view cigarN.bam X:68206675-68208264

I get this:

 FCC0WRYACXX:1:1103:14931:24492 99  X   68204511    255 59M2530N30M =   68207133    2703    TGTTCTTATTATTATGGTATCATTTATTTGTAAAATTTATAATTGTAAAATACAAACTAGTAAGAACCCTATCAGTGTTGTTTTTGTTT   HHJJJJJJJJJJJIJJJFHIJJJJJIJJJJJIJJJJJJJJJJJIJJJJJJJJJJJJJIJJIIIJGGIIIJJJJJJEHFHHHFFFFDEEE   PG:Z:MarkDuplicates NH:i:1  HI:i:1  nM:i:1  AS:i:165

FCC0WRYACXX:1:1304:17536:3030   99  X   68206627    255 47M3479N42M =   68210195    3656    TTTCTCATACTGGCAATAGTATTTCACCCAGGATATTCCTAAAGCCCATTTCTCTTGTGTATAGAGATAGCCTTCAATAGTTGGCTGTC   HHJJJJIJJJJJJJJIIIJHIJJJIJJJJIJIIJIJJIJJIJJIIIJJJJJJJJJJJGHIGGHIIIJHHGEHFFFDEFFECFCEEEDDD   PG:Z:MarkDuplicates NH:i:1  HI:i:1  nM:i:0  AS:i:176

 FCC0WRYACXX:1:1106:15737:96571 83  X   68206628    255 46M3479N43M =   68201680    -8516   TTCTCATACTGGCAATAGTATTTCACCCAGGATATTCCTAAAGCCCATTTCTCTTGTGTATAGAGATAGCCTTCAATAGTTGGCTGTCC   CCCDDCCCDDDCCEDDEDCDEDB9GFFFFGGDGGIIGGEHFAAGFIGGCGDGFCHGIEGGIIHDBEEGIGIJGIIGIIJJGHIHGEFGH   PG:Z:MarkDuplicates NH:i:1  HI:i:1  nM:i:0  AS:i:179

But I want only these reads if the fourth column (start position of the reads) in the region. It is possible to get a solution with samtools ?

in my example the start position of the read FCC0WRYACXX:1:1103:14931:24492 is not in the region X:68206675-68208264.

data: RNA-seq paired-end

rna-seq samtools paired-end • 411 views
ADD COMMENTlink modified 5 months ago by dariober8.3k • written 5 months ago by xd_d70

No, it is not possible with samtools. You will need to write a script and work with the sam file for your purpose.

ADD REPLYlink written 5 months ago by novice730

Given the enormous amount of tools available it's very likely that something exists for this quite specific problem, without the need of writing a custom script. (Although I agree it could be a solution - it's for sure not the easiest for everyone).

ADD REPLYlink written 5 months ago by WouterDeCoster24k
2
gravatar for dariober
5 months ago by
dariober8.3k
Glasgow - UK
dariober8.3k wrote:

Would this work, without resorting on additional third party tools?

samtools view aln.bam X:100-1000 | awk -v FS='\t' '$4 >= 100 && $4 <= 1000'

If you want also the bam header:

 samtools view -h aln.bam X:100-1000 | awk -v FS='\t' '$1 ~ "^@" || ($4 >= 100 && $4 <= 1000)'

(Not tested)

ADD COMMENTlink written 5 months ago by dariober8.3k
2

I had this idea also!

ADD REPLYlink written 5 months ago by xd_d70

Good call. I'm using a variation of this command in order to pull out SNP genotype-specific reads in order to divide my BAM by region and genotype

ADD REPLYlink written 10 weeks ago by Kevin Blighe8.9k
1
gravatar for H.Hasani
5 months ago by
H.Hasani580
Freiburg, Germany
H.Hasani580 wrote:

How about running bedtools - intersect on your bam file. That should you give you what you want. I just noticed, there is already an example of exactly what you are after on that page.

ADD COMMENTlink modified 5 months ago • written 5 months ago by H.Hasani580

Sorry... Maybe I'm missing it. Could you point out at the example? I don't think you can tell bedtools "report feature if the start of a is in interval b". Of course you further parse the output but it would make things more complicated then necessary.

ADD REPLYlink written 5 months ago by dariober8.3k

-abam Default behavior when using BAM input (deprecated since 2.18.0)

it says deprecated (the -abam), but you still can use -a and -b as given in this example

ADD REPLYlink written 5 months ago by H.Hasani580
1

It says you can use bam as input, but you can't tell bedtools to output a read if the start is inside a target interval. If a read starts before the interval and ends somewhere inside or after the interval, bedtools will print it. The OP wants to discard such reads instead. Did I get it right?

ADD REPLYlink written 5 months ago by dariober8.3k

could you point out where did you read that? when it comes to intersection it does not matter which end is obtained by the given list of intervals (at least that's what they say specifically in case of paired-end reads).

ADD REPLYlink written 5 months ago by H.Hasani580

it does not matter which end is obtained by the given list of intervals

Exactly, so if the start of an interval (a read) is outside the interval, the OP wants to reject it. The OP wants to keep reads whose start is inside the interval. So if the read has start and end coordinates chr1:5-20 and the target interval is chr1:10-20, the read has to be rejected, but bedtools will instead return it.

By the way, I'm not considering paired-end reads here.

ADD REPLYlink written 5 months ago by dariober8.3k

Oh, I see your point now.

ADD REPLYlink written 5 months ago by H.Hasani580
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 966 users visited in the last hour