Help: Read Sam File and find chromosome
2
0
Entering edit mode
7.2 years ago

Dear helping community,

I received a large SAM file together with the task to send a file back containing all the reads of Chromosome 1.

I have never seen or worked with any SAM files – What do I do, how do I read them and, most importantly, how do I sort them by chromosomes?

I have some experience with Python and would like to use it for this task... Any suggsestions?

Thanks so much for your help T.M.

SAM BAM Python genome sequence • 4.4k views
ADD COMMENT
0
Entering edit mode

Perhaps this post would help.

ADD REPLY
1
Entering edit mode
7.2 years ago
Gabriel R. ★ 2.9k
  1. Download samtools: http://www.htslib.org/ and install it
  2. Transform your SAM file to BAM ( samtools view -Sb aln.sam > aln.bam )
  3. Sort the BAM filesamtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam
  4. Index it: samtools index aln.sorted.bam
  5. Extract the reads that align to chromosome 1: samtools view aln.sorted.bam chr1 (if chr1 is the name, could be just "1" or something else)

In general, learn to work with BAM files which is the binary/compressed version of SAM files, you will save a lot of time.

ADD COMMENT
1
Entering edit mode
7.2 years ago
DVA ▴ 630

Gabriel's method is pretty standard. You can refer to more details about sam format here. However, if you have to use Python, try something like the following:

samFile="/yourDir/test.sam"
chr1Output="/yourDir/test_chr1.sam"
with open(samFile,"r") as i, \
open(chr1Output,"w") as o:
    for line in i:
        if line[0]=="@":                          #avoid processing the first few lines start with @
            continue
        else:
            item=line.split("\t")
            chr=item[2]
            #your REF column may only use numbers, without "chr", so you may need to edit this line below
            if chr=="chr1":       
                o.write(line)

Note that this python code should be able to output all chr1 reads for you, but it does not sort the positions on chr1. If you would like to do that in python, one way I can think of is to use python dictionary. Use positions as the key for each line, and then sort via key.

ADD COMMENT

Login before adding your answer.

Traffic: 1471 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6