GNU parallel on multiple files to re-header bam files
1
1
Entering edit mode
3.3 years ago

Hi All,

The purpose of my question is to generate a new header for a bam files. For one sample, i defined two command lines and its works:

#filter header
samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' > file1_filtered.txt

#re-header
cat file1_filtered.txt <(samtools view file1.bam) | samtools view -hb - > file1_filtered_re-header.bam

I would like to do the same thing for a lot of samples

#filter header
ls *.bam | parallel -j 8 'samtools view -H -@ 7 {} > {.}_filtered.txt'

After this step i didn't find the solution, if someone has an idea? Many thanks in advance.

GNU parallel • 1.5k views
ADD COMMENT
1
Entering edit mode

Keep in mind that working with files in parallel may or may not be faster than doing each independently. If the processing is IO-bound, then running in parallel may not be faster (though not likely slower).

ADD REPLY
0
0
Entering edit mode

Curious if you have done any recent testing with new high performance file systems such as isilon/NetApp (versions with spinning disks and pure SSD's)? At some point one of the connections (pci-e bus) may become the bottleneck.

ADD REPLY
0
Entering edit mode

I have not, but I am pretty sure the conclusion will be the same: It depends, and you should therefore test with different parallelization and measure.

ADD REPLY
0
Entering edit mode

Hi,

if it works with the command lines, why not putting these into a bash script and running this bash script with parallel?

Otherwise, you may use a tab-separated file where you have input and output variables stored like here.

Cheers, Michael

ADD REPLY
0
Entering edit mode

thanks for your help. I solve the issues for my first step "#filtered". I didn't find a solution to run the step "re-header" for a multiple files...

Many thanks for your help, Pierre

ADD REPLY
0
Entering edit mode

Hi all, Thanks a lot for your help and script, it's run perfectly. I better understand where was my mistake. I just discovered GNU parallel, I'm a bit confused with some syntaxes, i'm gonna to learn more about it.

Thanks again, Pierre

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted (green check mark). Upvote|Bookmark|Accept

ADD REPLY
6
Entering edit mode
3.3 years ago
ole.tange ★ 4.0k
doit() {
  file="$1"
  out="$2"
  (samtools view -H $file | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ';
   samtools view $file) | samtools view -hb - > $out
}
export -f doit
parallel doit {} {.}_filtered_re-header.bam ::: *.bam

Or if you want to use the header from the first file for all the rest:

samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' >header
doit() {
  file="$1"
  out="$2"
  (cat header;
   samtools view $file) | samtools view -hb - > $out
}
export -f doit
parallel doit {} {.}_filtered_re-header.bam ::: *.bam
ADD COMMENT

Login before adding your answer.

Traffic: 1484 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6