Create Count Data Out Of Sam File
12.2 years ago
lsvijfhuizen

Dear All,

At the moment I am working on a mouse SAGE data project. The first analysis step was to align the reads against a reference genome and count the number of reads per gene, which gives a count data table for each mouse. I then compared the counts per gene between our wild-type mice (5) and our mutant mice (6) to see whether genes are differentially expressed between these two mouse models.

Now I want to run the same analysis with a transcriptome as the reference instead. I have generated a SAM file, and I am wondering whether there is an easy way to count how many reads align to each transcript in the SAM file and report this as a count data file.

I hope this is clear. Thank you!


sam transcript reference

OFF TOPIC: Just out of curiosity, if your transcriptome contains all available isoforms (transcripts) of a gene, how do you handle reads that fall in a region shared by two or more isoforms? Wouldn't those reads map to multiple positions? How do you resolve this?
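To make the multi-mapping issue concrete: depending on the aligner, a read that falls in a region shared by several isoforms may be written as one primary record plus extra secondary records (FLAG bit 0x100), so a naive per-transcript count tallies it once per isoform. The sketch below, using a hypothetical toy SAM file, keeps only primary mapped records so each read is counted once. It does not resolve which isoform the read really came from; that remains the aligner's choice. With samtools the same filter would be samtools view -F 0x904.

```shell
#!/bin/sh
set -eu

# Hypothetical toy SAM: read r1 lies in a region shared by two isoforms, so
# it has a primary hit on tx1a and a secondary hit (FLAG 256) on tx1b.
# Read r2 aligns uniquely to tx1a. Fields are tab-separated.
{
  printf '@SQ\tSN:tx1a\tLN:100\n'
  printf '@SQ\tSN:tx1b\tLN:120\n'
  printf 'r1\t0\ttx1a\t10\t1\t4M\t*\t0\t0\tACGT\t*\n'
  printf 'r1\t256\ttx1b\t30\t1\t4M\t*\t0\t0\t*\t*\n'
  printf 'r2\t0\ttx1a\t50\t60\t4M\t*\t0\t0\tACGT\t*\n'
} > multi.sam

# Skip header lines, then keep only records where FLAG bits 0x4 (unmapped),
# 0x100 (secondary) and 0x800 (supplementary) are all unset. Bits are tested
# with integer arithmetic because POSIX awk has no bitwise operators.
awk -F'\t' '
  !/^@/ {
    f = $2
    if (int(f / 4) % 2 || int(f / 256) % 2 || int(f / 2048) % 2) next
    print $3
  }' multi.sam | sort | uniq -c > primary_counts.txt

cat primary_counts.txt
```

Here r1 is counted once, under tx1a, and tx1b gets no count at all, which illustrates that the ambiguity is hidden rather than solved.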

12.2 years ago

You can do something like:

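grep -v '^@' transcripts.sam | cut -f 3 | sort | uniq -c > transcripts_counts.txt

(Field 3 of a SAM record is RNAME, the name of the reference sequence the read aligned to; with a transcriptome reference, that is the transcript name.)

grep -v '^@' drops the header lines, which start with @ and would otherwise end up in the counts. cut keeps only column 3, sort brings identical transcript names together (uniq only collapses adjacent duplicates), and uniq -c prints every distinct name along with the number of times it occurs. Unmapped reads have * in column 3, so they appear as a separate * entry.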
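To see what the cut | sort | uniq -c approach produces, here is a run on a tiny hypothetical SAM file with two header lines, three aligned reads and one unmapped read. The file name toy.sam and the transcript names are made up for illustration. The sketch removes header lines before cutting and also drops the * entry of unmapped reads.

```shell
#!/bin/sh
set -eu

# Hypothetical toy SAM: two header lines, three aligned reads and one
# unmapped read (RNAME "*"). Fields are tab-separated.
{
  printf '@HD\tVN:1.6\tSO:unsorted\n'
  printf '@SQ\tSN:tx1\tLN:100\n'
  printf 'r1\t0\ttx1\t1\t60\t4M\t*\t0\t0\tACGT\t*\n'
  printf 'r2\t0\ttx1\t5\t60\t4M\t*\t0\t0\tACGT\t*\n'
  printf 'r3\t16\ttx2\t9\t60\t4M\t*\t0\t0\tACGT\t*\n'
  printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n'
} > toy.sam

# Drop header lines (start with "@"), keep column 3 (RNAME), drop the "*"
# of unmapped reads, then sort so uniq -c can count adjacent duplicates.
grep -v '^@' toy.sam | cut -f 3 | grep -vx '\*' | sort | uniq -c > toy_counts.txt

cat toy_counts.txt
```

Without the grep -v '^@' step, cut -f 3 would also emit SO:unsorted and LN:100 from the header lines as if they were transcripts.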

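Better still, stay out of SAM format altogether and work directly from the BAM file. samtools view leaves out the header unless you add -h, and -F 4 skips unmapped reads:

samtools view -F 4 transcripts.bam | cut -f 3 | sort | uniq -c > transcripts_counts.txt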
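Running the pipeline once per mouse gives one count file per sample. For the wild-type vs mutant comparison these usually need to be merged into a single table with one row per transcript and one column per mouse. A minimal sketch with awk follows, assuming hypothetical per-sample files named <sample>_counts.txt in the uniq -c layout (count, then name); the sample names wt1 and mut1 are made up.

```shell
#!/bin/sh
set -eu

# Hypothetical per-mouse count files in "uniq -c" layout: count, then name.
printf '      2 tx1\n      1 tx2\n' > wt1_counts.txt
printf '      3 tx1\n      4 tx3\n' > mut1_counts.txt

# Space-separated list of files; left unquoted below on purpose so it splits.
files="wt1_counts.txt mut1_counts.txt"

{
  # Header row: "transcript" followed by one column per sample name.
  printf 'transcript'
  for f in $files; do printf '\t%s' "${f%_counts.txt}"; done
  printf '\n'

  # Body: one row per transcript, 0 where a sample has no reads for it.
  # ARGV maps each file name to its column index, so an empty count file
  # still gets its own all-zero column.
  awk '
    BEGIN { for (i = 1; i < ARGC; i++) col[ARGV[i]] = i; n = ARGC - 1 }
    { count[$2, col[FILENAME]] = $1; seen[$2] = 1 }
    END {
      for (t in seen) {
        line = t
        for (i = 1; i <= n; i++) line = line "\t" (count[t, i] + 0)
        print line
      }
    }' $files | sort
} > count_matrix.tsv

cat count_matrix.tsv
```

The body is piped through sort because awk's for (t in seen) visits keys in no particular order; the header is printed separately so it stays on the first line.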
