Question

SAM file size after STAR alignment

1

8.0 years ago

xqyjxau ▴ 50

My RNA-Seq data is in FASTQ format (uncompressed from fastq.gz). I used STAR 2.5.3a to map the reads against an already indexed reference genome, and the run seems fine. However, the size of the generated SAM files looks strange: my input FASTQ files are about 1.3-1.5 GB, but the SAM files range from 3.8 GB to 4.5 GB. Is that normal? If not, what might be wrong?

RNA-Seq alignment • 6.2k views

updated 8.0 years ago by Istvan Albert 102k • written 8.0 years ago by xqyjxau ▴ 50

0

Have you checked the STAR logs to see whether any errors were reported and what the alignment percentages look like? If there are no errors, the resulting SAM file should be fine.
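As a rough sketch of that check, the snippet below pulls the mapping percentages out of STAR's summary log (`Log.final.out`), assuming the usual `label | value` layout of that file. The helper names `parse_star_summary` and `percent` are made up for illustration and are not part of STAR.

```python
def parse_star_summary(text):
    """Return {label: value} for every 'label | value' line in the text."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        label, _, value = line.partition("|")
        stats[label.strip()] = value.strip()
    return stats


def percent(stats, label):
    """Convert a value such as '63.50%' into the float 63.5."""
    return float(stats[label].rstrip("%"))


# Usage (assumes a Log.final.out file in the current directory):
#   with open("Log.final.out") as handle:
#       stats = parse_star_summary(handle.read())
#   print(percent(stats, "Uniquely mapped reads %"))
#   print(percent(stats, "% of reads mapped to multiple loci"))
```

If the labels in your STAR version differ, print `stats.keys()` first and adjust the lookups.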

8.0 years ago by GenoMax 152k

0

Uniquely mapped reads are 63-64%, and multi-mapped reads are 27-32%. Is this OK?

8.0 years ago by xqyjxau ▴ 50

1

There are so many variables here that it's impossible to say. If you didn't get an error, then presumably you're fine. The file size increase is perfectly normal. But by the sounds of it, you really should look into pairing up with someone who knows what is going on to teach you the ropes :)

8.0 years ago by John 13k

score 5 · Accepted Answer · 2017-06-30

5

8.0 years ago

Istvan Albert 102k

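A SAM file will typically be larger than the FASTQ file it was produced from because, in general, each SAM record contains all the information of the FASTQ record (read name, sequence, and base qualities) plus alignment fields such as the flag, reference name, position, mapping quality, and CIGAR string, along with optional tags.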
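To make the per-record overhead concrete, here is a small sketch that builds one FASTQ record and a matching single-end SAM line and compares their sizes in bytes. The read name, reference name, position, and tag values are invented for illustration, and `fastq_record` and `sam_line` are hypothetical helpers, not part of any library.

```python
def fastq_record(name, seq, qual):
    """Four-line FASTQ record: header, sequence, separator, qualities."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_line(name, seq, qual, flag=0, rname="chr1", pos=1000, mapq=255,
             tags=("NH:i:1", "HI:i:1", "AS:i:98", "nM:i:0")):
    """One SAM alignment line: 11 mandatory fields plus optional tags."""
    cigar = f"{len(seq)}M"  # assume the whole read aligns without gaps
    fields = [name, str(flag), rname, str(pos), str(mapq), cigar,
              "*", "0", "0",  # mate reference, mate position, template length
              seq, qual, *tags]
    return "\t".join(fields) + "\n"


seq = "ACGT" * 25   # made-up 100 bp read
qual = "I" * 100    # made-up quality string
fq = fastq_record("read1", seq, qual)
sam = sam_line("read1", seq, qual)
print(len(fq), len(sam))  # the SAM line carries the same data plus extra fields
```

The sequence and quality strings appear unchanged in both records, so every additional field and tag is pure overhead on top of the FASTQ content.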

In addition, each FASTQ record may produce more than one alignment, hence you can see how it could easily grow to be much larger than the original data.
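One way to see this effect in your own output is to count how many alignment lines each read name has in the SAM file. The sketch below assumes single-end data (for paired-end reads both mates share a name, so counts would be doubled), and `alignments_per_read` is a hypothetical helper name.

```python
from collections import Counter


def alignments_per_read(sam_lines):
    """Count mapped alignment lines per read name in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped, so it has no alignment
        counts[fields[0]] += 1
    return counts


# Usage (assumes a SAM file named Aligned.out.sam):
#   with open("Aligned.out.sam") as handle:
#       counts = alignments_per_read(handle)
#   print(sum(counts.values()) / len(counts))  # mean alignments per mapped read
```

A mean noticeably above 1 indicates that multi-mapped reads are contributing extra lines to the file.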
