How to split SAM file to different chromosomes
4.0 years ago
javanokendo ▴ 60

I have a SAM file that I now want to split into one file per chromosome, chr1 through chr20, as follows:

/home/SpliceGraph/Human/chr1.sam
/home/SpliceGraph/Human/chr2.sam
/home/SpliceGraph/Human/chr3.sam
/home/SpliceGraph/Human/chr4.sam
/home/SpliceGraph/Human/chr5.sam

Which command can I use to do this?

Assembly RNA-Seq

The following command, samtools idxstats out.bam | cut -f1 | grep -v '*' > chr.names, is not giving me the list of chromosomes. It gives something like this:

NC_000001.11
NT_187361.1
NT_187362.1
NT_187363.1
NT_187364.1
NT_187365.1
NT_187366.1
NT_187367.1
NT_187368.1
NT_187369.1
NC_000002.12
NT_187370.1
NT_187371.1
NC_000003.12
NT_167215.1
NC_000004.12
NT_113793.3
NC_000005.10
NT_113948.1
NC_000006.12
NC_000007.14
NC_000008.11
NC_000009.12
NT_187372.1
NT_187373.1
NT_187374.1
NT_187375.1
NC_000010.11
NC_000011.10
NT_187376.1
NC_000012.

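Those are the chromosome (reference) names that are present in your BAM file. They are RefSeq accessions: the NC_ entries are the assembled chromosomes (e.g. NC_000001.11 is chromosome 1 in GRCh38), and the NT_ entries are additional contigs/scaffolds.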
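If only the chromosome-level sequences are of interest, the name list can be filtered before splitting. The following is a minimal sketch, assuming the RefSeq naming shown above; primary_chroms is a hypothetical helper name. It reads samtools idxstats output (tab-separated: name, length, mapped reads, unmapped reads) and keeps NC_ names with at least one mapped read, which also avoids producing empty files for the many NT_ contigs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (assumes RefSeq-style reference names such as NC_000001.11).
# Reads `samtools idxstats` output on stdin. Columns are TAB-separated:
#   1 = reference name, 2 = sequence length, 3 = mapped reads, 4 = unmapped reads
# Prints only NC_ (chromosome-level) names that have mapped reads, which drops
# the NT_ contigs and the '*' line for unplaced unmapped reads.
primary_chroms() {
  awk -F '\t' '$1 ~ /^NC_/ && $3 > 0 { print $1 }'
}

# Usage (requires samtools and an indexed BAM):
#   samtools idxstats out.bam | primary_chroms > chr.names
```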

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.


Please use ADD COMMENT to reply to answers. The command itself is correct: it lists the sequence names from the FASTA file you aligned against. If you want different names (e.g. chr1, chr2, ...), align against a FASTA file that uses the names you would like.
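Alternatively, the RefSeq names can be translated only when naming the output files. This is a sketch that assumes human RefSeq accessions where NC_0000NN corresponds to chromosome NN, with 23 for X and 24 for Y; nc_to_chr is a hypothetical helper, and anything it does not recognize (NT_ contigs, the mitochondrial NC_012920) is passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a human RefSeq chromosome accession to a chrN name.
# Assumption: NC_0000NN.v is chromosome NN, with 23 = X and 24 = Y.
# Unrecognized names (NT_ contigs, NC_012920 mitochondrion, ...) are echoed unchanged.
nc_to_chr() {
  local acc=$1 num
  case $acc in
    NC_0000[0-9][0-9].*)
      num=${acc#NC_0000}   # "NC_000001.11" -> "01.11"
      num=${num%%.*}       # "01.11"        -> "01"
      num=${num#0}         # "01"           -> "1" (removes one leading zero)
      case $num in
        23) echo chrX ;;
        24) echo chrY ;;
        *)  echo "chr${num}" ;;
      esac
      ;;
    *)
      echo "$acc"
      ;;
  esac
}

# Usage, e.g. inside a read loop over chr.names:
#   samtools view -h -o "/home/SpliceGraph/Human/$(nc_to_chr "$p").sam" out.bam "$p"
```

Renaming only the output files keeps the BAM untouched, so region queries still use the original NC_ names.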

4.0 years ago
ATpoint 82k
## Convert SAM to a coordinate-sorted BAM and index it (indexing requires sorted input):
samtools sort -o out.bam in.sam
samtools index out.bam

## Extract chromosome names (dropping the '*' line for unplaced unmapped reads):
samtools idxstats out.bam | cut -f1 | grep -v '*' > chr.names

## Split the BAM file with a while loop:
while read -r p; do
  samtools view -b -o "out_${p}.bam" out.bam "${p}"
done < chr.names

If you really want SAM instead of BAM files, use samtools view -h -o out_${p}.sam out.bam ${p} (-h keeps the header). If you have the resources, you can use something like GNU parallel instead of a loop to split several chromosomes at once.
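To write the files directly where the question wants them, the loop can be wrapped in a small function that takes an output directory. This is a sketch: split_sam_by_chrom is a hypothetical name, and it assumes out.bam is sorted and indexed as above. The files are named after the reference names in the BAM, so with the RefSeq names from the comments you would get NC_000001.11.sam rather than chr1.sam.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a samtools command.
# Writes one SAM file (with header) per reference name listed in NAMES into OUTDIR.
# Assumes BAM is coordinate-sorted and indexed, with one name per line in NAMES.
split_sam_by_chrom() {
  local bam=$1 names=$2 outdir=$3 p
  mkdir -p "$outdir"
  while IFS= read -r p; do
    [ -n "$p" ] || continue   # skip blank lines in the name list
    samtools view -h -o "$outdir/$p.sam" "$bam" "$p"
  done < "$names"
}

# Usage:
#   split_sam_by_chrom out.bam chr.names /home/SpliceGraph/Human
```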
