samtools piping with awk/ bash commands -> Wonky things happen!!!
0
0
Entering edit mode
7.8 years ago

My problem is so weird.

I'm using samtools with two windows like

$samtools view in.bam 1:1000-2000 1:3000-4000 And it will print out paired ends with the first mate mapping to the first window and second window When I do this$ samtools view in.bam 1:1000-2000 1:3000-4000 | sort

or this

$samtools view in.bam 1:1000-2000 1:3000-4000 | sort | uniq It only prints out the first window? When I do this$ samtools view in.bam 1:1000-2000 1:3000-4000 | awk '{ if($7 == "=") { print$0 }}'

It will print the first window as such

HS2000-123    145   1   1005   60   100M   ...

and the second ...

HS2000-456 145 1 3509 60 100M ...

The first is tab delimited and the second is space delimited

When I pipe on sort | uniq to the awk command I only return the first window.

Am I going crazy? It feels like it. I could fix this is perl, but want to know why.

What the phread is going on here?

bash samtools awk • 2.1k views
4
Entering edit mode

Do you see the same behavior if you run \samtools instead of samtools?

If not, maybe some pre-defined aliases are disrupting your workflow.

1
Entering edit mode

This is very good suggestion.

1
Entering edit mode

Thank you :)

0
Entering edit mode

Thank you!

This did work, however I'm still getting this weird delimiter issue where some reads have tabs and others have spaces

0
Entering edit mode

So, this processes both files and writes out all windows but you still see the delimiter issue, is it? (Just clearing stuff up)

2
Entering edit mode

I can't reproduce this. I get both intervals regardless of piping

2
Entering edit mode

Put it into a file and process the file. Then you can test what happens - is it the samtools or the other commands that misbehave? You should not fix it in perl, this should work on its own

It is always the simplest of mistakes that produce the most mind-bendingly mysterious errors

0
Entering edit mode

The point of me piping commands is so I can do everything without writing to files. I have over 1.5 million regions in 250 genomes to look for mate !

1
Entering edit mode

The point of writing to files here is to drill down and figure out what's wrong with the pipeline, so you can correct it and get the intended use out of the pipeline. That's how testing works.