samtools piping with awk/ bash commands -> Wonky things happen!!!
0
0
Entering edit mode
7.3 years ago

My problem is so weird.

I'm using samtools with two windows like

$ samtools view in.bam 1:1000-2000 1:3000-4000

And it will print out paired ends with the first mate mapping to the first window and second window

When I do this

$ samtools view in.bam 1:1000-2000 1:3000-4000 | sort 

or this

$ samtools view in.bam 1:1000-2000 1:3000-4000 | sort | uniq 

It only prints out the first window?

When I do this

$ samtools view in.bam 1:1000-2000 1:3000-4000 | awk '{ if($7 == "=") { print $0 }}'

It will print the first window as such

HS2000-123    145   1   1005   60   100M   ...

and the second ...

HS2000-456 145 1 3509 60 100M ...

The first is tab delimited and the second is space delimited

When I pipe on sort | uniq to the awk command I only return the first window.

 

Am I going crazy? It feels like it. I could fix this is perl, but want to know why.

 

What the phread is going on here?

samtools awk bash delimiter enigma • 1.9k views
ADD COMMENT
4
Entering edit mode

Do you see the same behavior if you run \samtools instead of samtools?

If not, maybe some pre-defined aliases are disrupting your workflow.

ADD REPLY
1
Entering edit mode

This is very good suggestion.

ADD REPLY
1
Entering edit mode

Thank you :)

ADD REPLY
0
Entering edit mode

Thank you!

This did work, however I'm still getting this weird delimiter issue where some reads have tabs and others have spaces

 

ADD REPLY
0
Entering edit mode

So, this processes both files and writes out all windows but you still see the delimiter issue, is it? (Just clearing stuff up)

ADD REPLY
2
Entering edit mode

I can't reproduce this. I get both intervals regardless of piping

ADD REPLY
2
Entering edit mode

put it into a file and process the file - then you can test what happens - is it the samtools or the other commands that misbehave - you should not fix it in perl ... this should work on its own

it is alway the simplest of mistakes that produce the most mindbendingly mysterious errors

ADD REPLY
0
Entering edit mode

The point of me piping commands is so I can do everything without writing to files. I have over 1.5 million regions in 250 genomes to look for mate !

ADD REPLY
1
Entering edit mode

The point of writing to files here is to drill down and figure out what's wrong with the pipeline, so you can correct it and get the intended use out of the pipeline. That's how testing works. 

ADD REPLY

Login before adding your answer.

Traffic: 1366 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6