Hi everyone,
I have fastq files from an RNA sequencing experiment. My samples are human cells infected with an intracellular pathogen, so I would like to align the total reads to both genomes (human and pathogen). I am working on Linux and I have performed some standard alignments before, using STAR and the Ensembl reference genome.
I read that it is better to perform the alignment in one step rather than in two separate steps. However, I can't figure out how to build the "hybrid" STAR reference genome; ideally, I would like a "hybrid" genome where the pathogen sequence looks like an additional chromosome at the end of the human genome.
For a standard alignment, I would use STAR in `--runMode genomeGenerate` to build the reference; I can provide a "hybrid" fasta to STAR, obtained by concatenating the human and pathogen fasta files (simply using the `cat` command). Is it okay?
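For reference, the concatenation described above could look roughly like the sketch below. The file names (`human.fna`, `pathogen.fna`, `hybrid.fna`) and the toy sequences are placeholders, not the actual NCBI downloads. The sketch also checks that no sequence name appears twice, since every sequence name in the combined fasta has to be unique.

```shell
set -eu

# Hypothetical file names; replace them with the real downloads.
human_fa="human.fna"
pathogen_fa="pathogen.fna"
hybrid_fa="hybrid.fna"

# Toy inputs so the sketch runs on its own; they are only created
# when the real files are not already present.
[ -f "$human_fa" ] || printf '>chr1 toy human sequence\nACGTACGT\n' > "$human_fa"
[ -f "$pathogen_fa" ] || printf '>pathogen1 toy pathogen sequence\nGGCCTTAA\n' > "$pathogen_fa"

# Human first, pathogen appended as the last sequence(s).
cat "$human_fa" "$pathogen_fa" > "$hybrid_fa"

# The sequence name is the header text up to the first space.
# List any name that occurs more than once.
dup_names=$(grep '^>' "$hybrid_fa" | cut -d' ' -f1 | sort | uniq -d)
if [ -n "$dup_names" ]; then
    echo "Duplicate sequence names: $dup_names" >&2
    exit 1
fi

n_seqs=$(grep -c '^>' "$hybrid_fa")
echo "hybrid fasta contains $n_seqs sequences"
```

Note that `cat` joins the files byte for byte, so the first file should end with a newline; otherwise the first pathogen header ends up glued to the last line of human sequence.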
What about .gtf files? How should I handle them to build the reference (and to count the aligned reads after)?
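One possible way to handle the annotations, sketched under the same assumptions (placeholder file names, toy content), is to concatenate the two gtf files as well and verify that every sequence name in column 1 of the combined gtf also exists in the combined fasta. The STAR call at the end only runs when the illustrative `RUN_STAR` variable is set to 1; the thread count and the `--sjdbOverhang` value are example settings, not recommendations.

```shell
set -eu

# Hypothetical file names; replace them with the real downloads.
human_gtf="human.gtf"
pathogen_gtf="pathogen.gtf"
hybrid_gtf="hybrid.gtf"
hybrid_fa="hybrid.fna"

# Toy inputs so the sketch runs on its own; only created when missing.
[ -f "$human_gtf" ] || printf '#!genome-build toy\nchr1\ttoy\texon\t1\t8\t.\t+\t.\tgene_id "h1"; transcript_id "h1.1";\n' > "$human_gtf"
[ -f "$pathogen_gtf" ] || printf '#!genome-build toy\npathogen1\ttoy\texon\t1\t8\t.\t+\t.\tgene_id "p1"; transcript_id "p1.1";\n' > "$pathogen_gtf"
[ -f "$hybrid_fa" ] || printf '>chr1 toy\nACGTACGT\n>pathogen1 toy\nGGCCTTAA\n' > "$hybrid_fa"

# Keep the human file as is, then append the pathogen records
# without their header comment lines.
cat "$human_gtf" > "$hybrid_gtf"
grep -v '^#' "$pathogen_gtf" >> "$hybrid_gtf"

# Sequence names in the gtf (column 1) must match the fasta headers.
grep '^>' "$hybrid_fa" | sed 's/^>//' | cut -d' ' -f1 | sort -u > fa_names.txt
grep -v '^#' "$hybrid_gtf" | cut -f1 | sort -u > gtf_names.txt
comm -13 fa_names.txt gtf_names.txt > missing_names.txt
if [ -s missing_names.txt ]; then
    echo "Names in the gtf that are absent from the fasta:" >&2
    cat missing_names.txt >&2
    exit 1
fi

# Build the index only on request: this needs STAR installed and,
# for a human-sized genome, a large amount of memory.
if [ "${RUN_STAR:-0}" = "1" ]; then
    mkdir -p star_hybrid_index
    STAR --runMode genomeGenerate \
         --runThreadN 8 \
         --genomeDir star_hybrid_index \
         --genomeFastaFiles "$hybrid_fa" \
         --sjdbGTFfile "$hybrid_gtf" \
         --sjdbOverhang 100
fi
```

The same combined gtf can then be given to the counting tool after alignment, so that human and pathogen features are counted in one pass. If the pathogen annotation has no `exon` rows (some annotations only list `gene` and `CDS` features), STAR will not use those records by default, which is worth checking before building the index.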
Note: I downloaded both fasta files from NCBI (as the pathogen sequence is only available from this resource), and both gtf files as well.
I am completely new to this kind of task and to the command line, so sorry if my question is badly formulated. Thanks to anyone who can help me through this!
Thank you both for your answers. It worked well!
A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they work. This will help future users who come across this post find the right answer.