Using Specific Regions of Genome as Reference for Alignment Using STAR
18 months ago
hkarakurt ▴ 180

Hello everyone, I have a lot of transcriptome data (RNA-Seq and full-transcript single-cell RNA-Seq) and I need to align it to the genome using STAR. However, I have some storage problems.

I will focus on a group of genes in downstream analyses. Is it possible to align the reads only to specific parts of the genome (the gene group) with STAR, or to create a custom reference from the genome FASTA that includes only those regions and use it as the reference?

Thank you in advance

RNA-Seq Alignment STAR • 1.4k views
18 months ago
GenoMax 142k

That is possible, but the question is whether it is appropriate to do so.

If your data come from the entire genome/transcriptome, then using a reduced-representation reference always carries the risk that STAR will align reads to a location they did not originate from: when a read's true locus is missing from the reference, the aligner can only choose among the sequences that remain.


So restricting the reference at the alignment step would in practice cause mis-aligned reads. I will align all reads to the whole genome and then extract the regions of interest using a BED file. I believe this will not create false positives or mis-aligned reads (at least not as many as in the previous scenario).
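The BED file for such an extraction could be derived from the annotation. The following is only a minimal sketch, assuming a GTF with `gene` feature rows and a `gene_name` attribute (as in Ensembl or GENCODE files); the function names `genes_to_bed` and `write_bed` are made up for illustration.

```python
import csv
import re


def genes_to_bed(gtf_lines, gene_names):
    """Return BED rows (chrom, start, end, name) for the requested genes.

    Assumes 'gene' feature rows carrying a gene_name attribute.
    Rows are sorted by chromosome name (as text) and then by start.
    """
    wanted = set(gene_names)
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None or match.group(1) not in wanted:
            continue
        # GTF coordinates are 1-based inclusive; BED is 0-based half-open.
        rows.append((fields[0], int(fields[3]) - 1, int(fields[4]), match.group(1)))
    return sorted(rows)


def write_bed(rows, path):
    """Write the rows as a tab-separated BED file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

Usage would be along the lines of `write_bed(genes_to_bed(open("annotation.gtf"), my_genes), "regions.bed")`, where both file names and `my_genes` are placeholders.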

Thank you for your answers.

18 months ago
Buffo ★ 2.4k

If the problem is storage capacity, you can filter the BAM file down to the regions of interest; see this post: Extract Reads From A Bam File That Fall Within A Given Region.
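One way this filtering might look, as a sketch rather than the method from the linked post: `samtools view` with `-L` keeps only alignments overlapping the intervals in a BED file. The helper names and file paths are hypothetical, and `samtools` is assumed to be on the `PATH`.

```python
import subprocess


def samtools_filter_command(bam_in, bed, bam_out):
    """Build a samtools command that keeps alignments overlapping the BED regions."""
    # -b: write BAM, -L: restrict to BED intervals, -o: output file
    return ["samtools", "view", "-b", "-L", bed, "-o", bam_out, bam_in]


def filter_bam(bam_in, bed, bam_out):
    """Run the filter; raises CalledProcessError if samtools fails."""
    subprocess.run(samtools_filter_command(bam_in, bed, bam_out), check=True)


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    filter_bam("sample.bam", "regions.bed", "sample.regions.bam")
```

Only the filtered BAM then needs to be kept; whether the full BAM can be deleted depends on the downstream analysis, as discussed in the replies.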


I would not do that. With only a few genes you are not going to do a meaningful analysis; normalization and differential expression (DE), for example, need a fair number of genes to be robust. Even more so at the single-cell level, for QC purposes. If storage is limited, process the data file by file: produce the BAM, then the count matrix for a single sample, delete the BAM, and move on to the next one. At the end, concatenate the matrices into a single one. Alternatively, use salmon for everything, which produces counts directly.
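The file-by-file approach could be sketched roughly as follows. This is an illustration under assumptions, not a prescribed pipeline: it swaps in STAR's own `--quantMode GeneCounts` for the counting step (the thread does not name a counting tool), which requires an index built with annotations, and it assumes uncompressed FASTQ input. All function names and paths are hypothetical.

```python
import os
import subprocess


def star_command(fastqs, genome_dir, prefix, threads=4):
    """Build a STAR call that writes an unsorted BAM plus per-gene counts."""
    return ["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
            "--readFilesIn", *fastqs, "--outFileNamePrefix", prefix,
            "--outSAMtype", "BAM", "Unsorted", "--quantMode", "GeneCounts"]


def read_star_gene_counts(path, column=1):
    """Read ReadsPerGene.out.tab into {gene_id: count}.

    column 1 holds unstranded counts, columns 2 and 3 the stranded ones;
    the leading N_* rows are summary lines and are skipped.
    """
    counts = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 4 or fields[0].startswith("N_"):
                continue
            counts[fields[0]] = int(fields[column])
    return counts


def process_sample(fastqs, genome_dir, prefix):
    """Align one sample, keep its counts, and delete the BAM to free space."""
    subprocess.run(star_command(fastqs, genome_dir, prefix), check=True)
    counts = read_star_gene_counts(prefix + "ReadsPerGene.out.tab")
    os.remove(prefix + "Aligned.out.bam")
    return counts


def merge_counts(per_sample):
    """Combine {sample: {gene: count}} into a header and gene-by-sample rows."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    header = ["gene"] + samples
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows
```

Calling `process_sample` in a loop over samples and passing the collected dictionaries to `merge_counts` keeps at most one BAM on disk at a time.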


I wouldn't do that either:

"Is it possible to align the reads to specific parts of genomes (gene group) while using STAR"

I only proposed an alternative for the problem, assuming that further analysis doesn't need information about other regions:

"But I have some storage problems."

This might be a better solution (you should post it as an answer to the question, not as a reply to my answer):

"If storage is limited then do it file by file, get bam, then the count matrix for a single sample. Delete bam, next one."
