Separate Bam File
2
1
Entering edit mode
7.6 years ago
filipzembol ▴ 130

Dear all, I have one problem I need to count number of rows and GC content in bam file for individual segments, but I do not know, how to separate it. the size of segment is 60kb. Than for reads, which has start coordinates to 60000 in chr 1, I will receive the count of reads (rows) and GC content. Please could you help me. I have the source code for count NR in awk and GC content, but it is only for all reads in bam file. And I have some source code to separate bed file to each 60kb segments if it help you.

Count of GC

awk '{ n=length($10); print $10, gsub(/[GCCgcs]/,"",$10)/n;}'   your.sam

Count of reads (rows)

awk 'END {print NR}'

Separate bed file:

$ awk \
     '{ \
        regionChromosome = $1; \
         regionStart = $2; \
         regionStop = $3; \
         val = 1; \
          for ( baseStart = regionStart; baseStart < regionStop; baseStart += 60000 ) { \
            baseStop = baseStart + 60000; \
            print regionChromosome"\t"baseStart"\t"baseStop"\t"val++; \
          } \
      }' hg19.extents.bed \
    | bedmap --echo --echo-map-id --delim "\t" 1.bed - > answer.1.bed

Thank you so much.

bam coordinates awk perl • 3.2k views
ADD COMMENT
3
Entering edit mode
7.6 years ago

You might try using samtools:

samtools view my.bam chr1:60000-120000 | your_awk_command

This will pipe all reads that overlap with a region on chromosome 1 between 60kb and 120kb.

ADD COMMENT
0
Entering edit mode

Yes I understand you but iI need to do it for all bam file. It is mean, I separate bam file to each bam files, separate by chromosome (I receive 25 files) and every file I have to separate to each part. Than from 0-60kb; 60kb-120kb,120kb-180kb to the end and for this partitions count GC and fragments.

ADD REPLY
1
Entering edit mode

You can specify the chromosome you want:

samtools view my.bam chr1:60000-120000 | your_awk_command
samtools view my.bam chr1:120000-180000 | your_awk_command
samtools view my.bam chr1:180000-240000 | your_awk_command
...
samtools view my.bam chr2:60000-120000 | your_awk_command
samtools view my.bam chr2:120000-180000 | your_awk_command
samtools view my.bam chr2:180000-240000 | your_awk_command

etc.

Of course, I'd recommend creating a small script to do it, but you don't need to pull apart the bam file to operate on different chromosomes.

ADD REPLY
0
Entering edit mode

Ok, i understand of this, but I separate becasue for cycle will be for me easier than if I have all chrom together, i am totally beginner. I do not have any other idea to do this. Because if I will write all segments by hand, it is unpossible.

ADD REPLY
1
Entering edit mode

This is a great opportunity to learn some programming and scripting. I would suggest identifying a local "mentor" who can work with you to do so; doing so will be much more efficient than using biostars. If this is not something you feel comfortable with, then I think you might consider identifying a collaborator who can help you with your data, as data analysis is absolutely FULL of small tasks like this that require scripting.

ADD REPLY
0
Entering edit mode
7.6 years ago
filipzembol ▴ 130

Ok I create a code like this, which solve my problem, but still time I have one problem and I cannot finish it. I would like to create many outputs (files), which has name same like variable $h. Than I want one think, if the for cycle for "h" finished, the code create a text fie with same name like $h.txt . I tried this, but it doesnt work:

 #!/bin/bash 
 for h in {1..22..1} 
 do 
 for i in {0..249480000..60000}
 do 
 u=$i let "u +=60000"

 echo $i"-"$u | samtools view /home/filip/Desktop/Posledni\ NIFTY/019/odfiltrovany019.bam  chr$h:$i-$u | awk '{ n=length($10);print gsub(/[GCCgcs]/,"",$10)/n;}'| awk '{s+=$1}END{print NR,s/NR}' > ${h%}.txt

 done  
 done

Thank you for your help

ADD COMMENT
0
Entering edit mode

Best to ask a new question. You'll want to include your code as well as desired output and any errors you receive.

ADD REPLY

Login before adding your answer.

Traffic: 2639 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6