Question: SAM file size after STAR alignment
xqyjxau20 wrote (21 months ago):

My RNA-Seq data is in FASTQ format (decompressed from fastq.gz). I used STAR 2.5.3a to map the reads against an already indexed reference genome, and the run seemed to go well. But the size of the generated SAM files looks strange to me: my input FASTQ files are about 1.3-1.5 GB each, while the SAM files range from 3.8 GB to 4.5 GB. Is that normal? If not, what could be wrong?

rna-seq alignment • 1.3k views
modified 21 months ago by Istvan Albert ♦♦ 80k • written 21 months ago by xqyjxau20

Have you checked the STAR logs to see whether any errors were generated and what the alignment percentages looked like? If there were no errors, the resulting SAM file should be fine.
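
If it helps, here is a minimal Python sketch for pulling the mapping percentages out of STAR's `Log.final.out` summary. It assumes the usual `metric name | value` layout of that file; the function names (`parse_star_final_log`, `percent`) and the example numbers are made up for illustration and are not from the poster's run.

```python
def parse_star_final_log(text):
    """Return {metric name: value string} from the text of a STAR Log.final.out."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def percent(metrics, key):
    """Convert a value such as '63.50%' to the float 63.5."""
    return float(metrics[key].rstrip("%"))


# Made-up excerpt in the layout of Log.final.out; the numbers are illustrative only.
example = """\
                          Number of input reads |\t10000000
                        Uniquely mapped reads % |\t63.50%
             % of reads mapped to multiple loci |\t29.00%
"""

m = parse_star_final_log(example)
unique = percent(m, "Uniquely mapped reads %")
multi = percent(m, "% of reads mapped to multiple loci")
print(f"unique={unique}% multi={multi}% total mapped={unique + multi}%")
```

For a real run you would read the file instead, e.g. `parse_star_final_log(open("Log.final.out").read())`, adjusting the path to wherever STAR wrote its output.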

written 21 months ago by genomax65k

Uniquely mapped reads are 63-64%, and multi-mapped reads are 27-32%. Is this OK?

written 21 months ago by xqyjxau20

There are so many variables here that it's impossible to say. If you didn't get an error then presumably you're fine. The file size increase is perfectly normal. But by the sounds of things you really should look into pairing up with someone who knows what is going on to teach you the ropes :)

written 21 months ago by John12k
Istvan Albert ♦♦ 80k (University Park, USA) wrote (21 months ago):

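A SAM file will typically be larger than a FASTQ file because, in general, it contains all the information of the FASTQ (read name, sequence, and base qualities) plus a lot of other information: the flag, reference name, position, mapping quality, CIGAR string, mate fields, and optional tags.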

In addition, each FASTQ record may produce more than one alignment, hence you can see how it could easily grow to be much larger than the original data.
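
To make the arithmetic concrete, here is a rough Python sketch comparing the bytes of one FASTQ record with the SAM lines it could produce. Everything in it is a toy assumption: the read, the coordinates, the helper names (`fastq_record_bytes`, `sam_line`), and the assumption that every alignment line repeats the full sequence and qualities along with `NH`/`HI`/`AS`/`nM` tags. Real files also carry a header, and actual ratios will differ.

```python
def fastq_record_bytes(name, seq, qual):
    """Bytes taken by one 4-line FASTQ record, newlines included."""
    return len(f"@{name}\n{seq}\n+\n{qual}\n")


def sam_line(name, seq, qual, chrom, pos, mapq, flag, nh, hi):
    """Build one single-end SAM line: 11 mandatory fields plus four optional tags."""
    fields = [
        name, str(flag), chrom, str(pos), str(mapq), f"{len(seq)}M",
        "*", "0", "0", seq, qual,               # no mate; sequence and qualities repeated
        f"NH:i:{nh}", f"HI:i:{hi}",             # number of hits, index of this hit
        f"AS:i:{len(seq) - 2}", "nM:i:0",       # toy alignment score and mismatch count
    ]
    return "\t".join(fields) + "\n"


# Toy 100 bp read; all values below are invented for illustration.
name, seq = "read1", "ACGT" * 25
qual = "I" * len(seq)

fastq_size = fastq_record_bytes(name, seq, qual)

# A uniquely mapped read yields one SAM line.
unique_size = len(sam_line(name, seq, qual, "chr1", 12345, 255, 0, 1, 1))

# A read mapping to two loci yields two lines; flag 256 marks the secondary one.
multi_size = sum(len(line) for line in [
    sam_line(name, seq, qual, "chr1", 12345, 3, 0, 2, 1),
    sam_line(name, seq, qual, "chr7", 67890, 3, 256, 2, 2),
])

print(f"FASTQ record:         {fastq_size} bytes")
print(f"SAM, unique read:     {unique_size} bytes")
print(f"SAM, two-locus read:  {multi_size} bytes")
```

Even in this toy case the single alignment is already bigger than the FASTQ record, and the multi-mapped read more than doubles it, so a SAM file a few times the size of the input is not surprising when a sizeable share of reads map to multiple loci.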

written 21 months ago by Istvan Albert ♦♦ 80k