Question: SAM file size after STAR alignment
xqyjxau wrote, 2.6 years ago:

My RNA-Seq data is in FASTQ format (decompressed from fastq.gz). I used STAR 2.5.3a to map the reads against an already indexed reference genome, and the run seemed fine. But the size of the generated SAM files looks strange: my input FASTQ files are about 1.3-1.5 GB each, while the SAM files range from 3.8 GB to 4.5 GB. Is that normal? If not, what could be wrong?

rna-seq alignment • 1.9k views
modified 2.5 years ago by Istvan Albert • written 2.6 years ago by xqyjxau

Have you checked the STAR logs to see whether any errors were reported and what the alignment percentages looked like? If there were no errors, the resulting SAM file should be fine.
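
For anyone who wants to pull those percentages out of the log programmatically, here is a minimal sketch. It assumes the summary file is the `Log.final.out` that STAR writes next to its output, and that each statistic sits on a `name | value` line; the function name `parse_star_final_log` and the two keys looked up at the end are my own choices for illustration, so check them against your own log before relying on them.

```python
import sys
from pathlib import Path


def parse_star_final_log(text):
    """Collect 'name | value' lines from a STAR summary log into a dict.

    Assumption: each statistic is written as '<name> |<tab><value>'.
    Lines without a '|' (section headers, blank lines) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python parse_star_log.py Log.final.out
    stats = parse_star_final_log(Path(sys.argv[1]).read_text())
    # Keys as I expect them to appear in the log; adjust if yours differ.
    for key in ("Uniquely mapped reads %",
                "% of reads mapped to multiple loci"):
        print(key, "=", stats.get(key, "not found"))
```

Everything is kept as strings on purpose, so nothing is silently misparsed if a field is not a number.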

written 2.6 years ago by genomax

Uniquely mapped reads are 63%-64%, and multi-mapped reads are 27% to 32%. Is this OK?

written 2.6 years ago by xqyjxau

There are so many variables here that it's impossible to say. If you didn't get an error then presumably you're fine. The file size increase is perfectly normal. But by the sound of things, you really should look into pairing up with someone who knows what is going on to teach you the ropes :)

written 2.6 years ago by John
Istvan Albert (University Park, USA) wrote, 2.5 years ago:

A SAM file will typically be larger than a FASTQ file because, in general, it contains all the information of the FASTQ plus a lot of other information.

In addition, each FASTQ record may produce more than one alignment, hence you can see how it could easily grow to be much larger than the original data.
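
To put rough numbers on this, here is a small sketch that counts the bytes of one FASTQ record and of one matching SAM line. The read name, chromosome, position, MAPQ and tag values are made up for illustration; the four optional tags (NH, HI, AS, nM) are the ones I would expect STAR to emit by default, but treat that as an assumption too.

```python
def fastq_record_bytes(name, seq, qual):
    """Bytes used by one four-line FASTQ record."""
    return len(f"@{name}\n{seq}\n+\n{qual}\n".encode())


def sam_record_bytes(name, seq, qual, rname="chr1", pos=1000, mapq=255,
                     tags=("NH:i:1", "HI:i:1", "AS:i:49", "nM:i:0")):
    """Bytes used by one SAM alignment line for the same read.

    The 11 mandatory SAM columns are followed by optional tags.
    All alignment values here are hypothetical placeholders.
    """
    fields = [name, "0", rname, str(pos), str(mapq), f"{len(seq)}M",
              "*", "0", "0", seq, qual, *tags]
    return len(("\t".join(fields) + "\n").encode())


# A hypothetical 50-base read with uniform quality scores.
seq = "ACGTA" * 10
qual = "I" * len(seq)

fq = fastq_record_bytes("read1", seq, qual)
one = sam_record_bytes("read1", seq, qual)

print("FASTQ record:", fq, "bytes")
print("one alignment:", one, "bytes")
# A read reported at three loci gets three SAM lines, each with its own
# copy of the sequence and qualities.
print("three alignments:", 3 * one, "bytes")
```

The sequence and quality strings are repeated on every alignment line, so a read reported at several loci costs several times its FASTQ footprint. Real read names are much longer than `read1`, which shifts the exact ratio, but the direction is the same.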

written 2.5 years ago by Istvan Albert