Question: How to split SAM file to different chromosomes
5 months ago by javanokendo (University of Cape Town):

I have a SAM file that I want to split into one file per chromosome (chr1 to chr20), like this:

/home/SpliceGraph/Human/chr1.sam
/home/SpliceGraph/Human/chr2.sam
/home/SpliceGraph/Human/chr3.sam
/home/SpliceGraph/Human/chr4.sam
/home/SpliceGraph/Human/chr5.sam

Which command can I use to do this?

Tags: rna-seq assembly

See the existing thread: How To Split A Bam File By Chromosome

written 5 months ago by ATpoint

The following command does not give me a list of chromosomes:

samtools idxstats out.bam | cut -f1 | grep -v '*' > chr.names

It gives something like this:

NC_000001.11
NT_187361.1
NT_187362.1
NT_187363.1
NT_187364.1
NT_187365.1
NT_187366.1
NT_187367.1
NT_187368.1
NT_187369.1
NC_000002.12
NT_187370.1
NT_187371.1
NC_000003.12
NT_167215.1
NC_000004.12
NT_113793.3
NC_000005.10
NT_113948.1
NC_000006.12
NC_000007.14
NC_000008.11
NC_000009.12
NT_187372.1
NT_187373.1
NT_187374.1
NT_187375.1
NC_000010.11
NC_000011.10
NT_187376.1
NC_000012.
modified 5 months ago by genomax • written 5 months ago by javanokendo

Those are the chromosome (reference) names that are present in your BAM file.

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

modified 5 months ago • written 5 months ago by genomax

Please use ADD COMMENT to reply to answers. The command itself is correct: it returns the sequence names from the FASTA file that you mapped against. If you want different names, align against a FASTA file that uses the names you would like.
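
If re-aligning is not practical, one possible workaround is to translate the accessions when naming the output files. The sketch below assumes the names are GRCh38 RefSeq accessions, where NC_000001 to NC_000022 are chromosomes 1 to 22, NC_000023 is X, NC_000024 is Y, and NC_012920 is the mitochondrial genome. The function name refseq_to_chr is hypothetical, and NT_/NW_ scaffolds are deliberately left unmapped.

```shell
# Hypothetical helper: map a GRCh38 RefSeq accession to a UCSC-style name.
# Prints nothing for accessions it does not recognise (e.g. NT_ scaffolds).
refseq_to_chr() {
  case "$1" in
    NC_000023.*) echo chrX ;;
    NC_000024.*) echo chrY ;;
    NC_012920.*) echo chrM ;;
    NC_0000[0-2][0-9].*)
      n=${1#NC_0000}   # "NC_000001.11" -> "01.11"
      n=${n%%.*}       # drop the version suffix -> "01"
      n=${n#0}         # drop a single leading zero -> "1"
      echo "chr$n" ;;
  esac
}

# Possible use with the chr.names file from the answer (left commented out):
# while IFS= read -r acc; do
#   chr=$(refseq_to_chr "$acc")
#   [ -n "$chr" ] && samtools view -b -o "${chr}.bam" out.bam "$acc"
# done < chr.names
```

This only changes the output file names. The reads inside each file still carry the original accession as their reference name.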

modified 5 months ago • written 5 months ago by ATpoint
5 months ago by ATpoint (Germany):
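## Convert SAM to a coordinate-sorted BAM (indexing requires sorted input) and index it:
samtools sort -o out.bam in.sam
samtools index out.bam

## Extract chromosome names (the '*' line of idxstats counts unplaced reads, so drop it):
samtools idxstats out.bam | cut -f1 | grep -v '^\*$' > chr.names

## Split the BAM file with a while loop (-b writes BAM output):
while IFS= read -r p
  do
  samtools view -b -o "out_${p}.bam" out.bam "${p}"
  done < chr.names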

If you really want SAM instead of BAM files, use samtools view -ho out_${p}.sam out.bam ${p} in the loop. If you have the resources, you can replace the loop with something like GNU parallel to process several chromosomes at once.
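
A minimal sketch of the parallel variant, assuming GNU parallel is installed and that chr.names and out.bam exist as in the steps above. The helper name make_split_cmds and the job count of 4 are illustrative choices. The helper only prints the commands, so they can be checked before anything is run.

```shell
# Hypothetical helper: print one samtools command per reference name in the given file.
# Assumes the names contain no spaces or shell metacharacters.
make_split_cmds() {
  while IFS= read -r p; do
    [ -n "$p" ] || continue   # skip empty lines
    printf 'samtools view -b -o out_%s.bam out.bam %s\n' "$p" "$p"
  done < "$1"
}

# Inspect the generated commands first:
#   make_split_cmds chr.names
# Then run up to 4 at a time; with no command argument, GNU parallel
# executes each input line as a command:
#   make_split_cmds chr.names | parallel -j 4
```

Each job reads the same indexed BAM, so the speed-up depends on disk throughput as well as on the number of cores.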
