Counting Isoforms from Sam File
8 months ago
serodyc ▴ 20

I am attempting to count the number of reads for each isoform of the Rt-GEF gene in Drosophila across multiple SAM files. My SAM files currently list reads by their coordinates on the chromosomes, such as

VH00562:14:AAANWG5HV:1:1101:26828:1568 1:N:0:ACTAAGAT+GCGGTTGT  99  chrX    5075295 42  50=     =   5075328 83  CTTTTAAAAAAAAATCAATACCTTACTTAAACTAACTATGCAAAAAATCG  CCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCC  NM:i:0  AM:i:42
.

However, I am trying to get my data into a format that assigns each read to the isoform it comes from, ideally using a FlyBase ID, so that I can count them. Is there any way to convert the files to a more readable format? Thanks!

Samtools • 392 views

Isoforms of a gene typically share most of their exonic content. Hence, just because a read covers an exon of an isoform, it does not mean that this isoform is "expressed". Most reads per gene are actually ambiguous in terms of which isoform they come from. Imagine an exon is perfectly shared between two isoforms and a read maps to that exon: you cannot tell which isoform it comes from. A better way is to use tools like salmon or kallisto, which quantify at the isoform/transcript level and use an EM algorithm under the hood to distribute ambiguous reads among the isoforms they are compatible with, giving an estimated abundance per transcript.
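To make the EM idea a bit more concrete, here is a toy sketch in Python. It is not what salmon or kallisto actually run (they also account for effective transcript length, fragment bias, and more); the isoform names `A` and `B`, the read counts, and the function `em_isoform_abundance` are all made up for illustration.

```python
# Toy EM for read-to-isoform assignment. Everything here is hypothetical:
# two isoforms "A" and "B" that share an exon, and invented read counts.
# Transcript length and bias corrections are deliberately ignored.

def em_isoform_abundance(compat, isoforms, n_iter=100):
    """compat: one tuple per read, naming the isoforms it is compatible with."""
    # Start from a uniform guess of the relative abundances.
    theta = {t: 1.0 / len(isoforms) for t in isoforms}
    counts = {t: 0.0 for t in isoforms}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in isoforms}
        # E-step: split each read across its compatible isoforms,
        # in proportion to the current abundance estimates.
        for read in compat:
            total = sum(theta[t] for t in read)
            for t in read:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        n = sum(counts.values())
        theta = {t: counts[t] / n for t in isoforms}
    return theta, counts


# 60 reads fall in the shared exon (ambiguous), 30 are unique to A, 10 to B.
reads = [("A", "B")] * 60 + [("A",)] * 30 + [("B",)] * 10
theta, counts = em_isoform_abundance(reads, ["A", "B"])
print(theta)
print(counts)
```

In this toy case the 60 ambiguous reads end up split roughly 3:1, following the ratio of the uniquely mapping reads, so the estimates settle near 75 reads for A and 25 for B. Simply counting every read that overlaps each isoform would instead give 90 and 70, which is why plain counting from the SAM file is misleading here.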
