What do I need to do in order to use multiple different reference genomes when aligning RNA reads using STAR?
4.2 years ago
chaco001 ▴ 40

Hello,

My lab recently finished an experiment in which a community of three bacterial species was grown together and its RNA was then Illumina sequenced. This means that my .fastq files contain reads from all three species, so the plan is to separate the species during alignment. I have quality reference genomes for each of the three species.

I have used STAR previously to align to a single bacterial genome, which worked great. However, while STAR can take multiple .fasta files for the reference, it can only accept a single .gtf annotation. I'm wondering what to do. I see two main possibilities:

  1. Do three separate alignments of each data set, each against a different reference genome. The downside here is that many of the genes in these species are well (but not perfectly) conserved, so I think this is likely to produce many false positives, where reads are assigned to the wrong species. I guess this could be dealt with downstream, by finding all reads that mapped to more than one reference and 'giving' them to the species with the highest alignment confidence, but this seems messy.

  2. I think the better option is to combine the gtfs from my three reference species into a single gtf. I have never done this before. cellranger (https://github.com/10XGenomics/cellranger) seems to do this, but I can't find reviews of the package. Is it as simple as doing a cat command, then scrolling through the result to delete the headers of the second and third gtf?
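
For option 1, the downstream clean-up step could be sketched roughly as follows. This is only an illustration: the function name `assign_reads` and the input layout (one dict of read ID to alignment score per species, e.g. scores pulled from each alignment beforehand) are assumptions, not anything STAR produces directly. Reads whose best score is tied between species are left unassigned rather than guessed.

```python
from collections import defaultdict

def assign_reads(scores_by_species):
    """Give each read to the species with the single best alignment score.

    scores_by_species: {species: {read_id: score}} (hypothetical layout).
    Returns {read_id: species}, with None for reads tied between species.
    """
    per_read = defaultdict(list)
    for species, scores in scores_by_species.items():
        for read_id, score in scores.items():
            per_read[read_id].append((score, species))

    assignment = {}
    for read_id, hits in per_read.items():
        hits.sort(reverse=True)          # best score first
        top_score = hits[0][0]
        winners = [sp for sc, sp in hits if sc == top_score]
        # A tie means the read cannot be attributed confidently.
        assignment[read_id] = winners[0] if len(winners) == 1 else None
    return assignment
```

Whether tied reads should be dropped, split, or kept is a judgment call that depends on how conserved the genes of interest are.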

Has anyone else combined gtfs successfully for this purpose: to align community-derived RNA-Seq reads against multiple prokaryote reference genomes? I see there is a similar question here from five years ago, which has no answers: Combining Gtf Files

Thank you!

community RNA-Seq bacteria alignment index • 2.7k views

Isn't that (technically speaking) the same as a meta-transcriptomic analysis (in this case with 3 species)? I would check how people from this field typically align their RNA-seq data.


That's a good point. I will look into that as well.


Hi Chaco001

How did you perform the Analysis? Can you please explain it?

I have RNA-seq data for host and parasite together, and I want to eliminate reads that map to both genomes to reduce over-quantification.

Technically it's called dual RNA-seq analysis. Here is a link to the reference paper I am following.

https://stm.sciencemag.org/content/scitransmed/suppl/2018/06/25/10.447.eaar3619.DC1/aar3619_SM.pdf
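
The filtering described above (discarding reads that align to both genomes) might be sketched like this, assuming the read IDs that aligned to each genome have already been collected. The function name `split_unique_reads` and its inputs are hypothetical, and this is not the method of the linked paper.

```python
def split_unique_reads(host_ids, parasite_ids):
    """Separate read IDs that aligned to only one genome from shared ones.

    Returns (host_only, parasite_only, shared) as sets of read IDs.
    """
    host_ids, parasite_ids = set(host_ids), set(parasite_ids)
    shared = host_ids & parasite_ids     # reads aligned to both genomes
    return host_ids - shared, parasite_ids - shared, shared
```

The `shared` set is returned as well so that the number of discarded reads can be checked before quantification.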

4.2 years ago

Though I'm unsure as to the answer, I can tell you cellranger is not it, as it's meant for single-cell RNA-seq.


Ha, thanks, well, ruling things out is always useful!

4.2 years ago

Is it as simple as doing a cat command, then scrolling through the result to delete the headers of the second and third gtf?

Yes (unless you have to make the chromosome name or gene names unique). Do this, forget about cellranger.
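
If scrolling through the result by hand is unappealing, the same concatenation can be sketched in a few lines of Python. This assumes the headers are the comment lines starting with `#` at the top of each gtf; the function name `concat_gtfs` and the file names in the usage note are made up for illustration.

```python
def concat_gtfs(gtf_paths, out_path):
    """Concatenate GTF files, keeping '#' header lines from the first file only."""
    with open(out_path, "w") as out:
        for i, path in enumerate(gtf_paths):
            with open(path) as fh:
                for line in fh:
                    if line.startswith("#") and i > 0:
                        continue         # drop headers of later files
                    if not line.strip():
                        continue         # drop blank lines
                    # Guard against a file that lacks a final newline.
                    out.write(line if line.endswith("\n") else line + "\n")
```

For example, `concat_gtfs(["speciesA.gtf", "speciesB.gtf", "speciesC.gtf"], "combined.gtf")` would write one merged annotation, which could then be given to STAR's `--sjdbGTFfile` when building the index from the three fasta files listed in `--genomeFastaFiles`.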


Thanks--I do need to make the gene names unique. Presumably I could just prepend a species identifier to them, although this rules out simply using cat. I could open the files in python and go line-by-line in a loop, but that seems foolish too. Do you have any suggestions on how to rename the genes?


I think prepending is going to be simpler. A loop would work. R might be able to do it quickly too, since R easily imports tables and you can manipulate a whole column at once.
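
A line-by-line loop in Python is not much code either. The sketch below assumes standard 9-column, tab-separated gtf lines with `gene_id "..."` and `transcript_id "..."` attributes in the last column; the function name `prefix_gtf_line` and the tag `spA` are hypothetical. Empty values such as `transcript_id ""` are left alone.

```python
import re

# Matches non-empty gene_id / transcript_id values in the attribute column.
_ID_ATTR = re.compile(r'\b(gene_id|transcript_id) "([^"]+)"')

def prefix_gtf_line(line, tag, rename_seqname=False):
    """Prepend a species tag to the IDs (and optionally the seqname) of one GTF line."""
    if line.startswith("#") or not line.strip():
        return line                      # leave headers and blank lines alone
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line                      # not a standard GTF record
    if rename_seqname:
        fields[0] = f"{tag}_{fields[0]}"
    fields[8] = _ID_ATTR.sub(
        lambda m: f'{m.group(1)} "{tag}_{m.group(2)}"', fields[8]
    )
    return "\t".join(fields) + "\n"
```

Each species' gtf would be passed through this with its own tag before concatenating. If `rename_seqname=True` is used, the sequence names in the matching fasta file need the same prefix, because the names in the gtf must agree with those in the genome.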
