Aligning reads to a combined human and pathogen genome
Asked by di4mond, 5 months ago

Hi everyone,

I have FASTQ files from an RNA sequencing experiment. My samples are human cells infected with an intracellular pathogen, so I would like to align all reads to both genomes (human and pathogen). I am working on Linux and have run some standard alignments before, using STAR and an Ensembl reference genome.

I read that it is better to perform the alignment in one step rather than in two separate steps. However, I can't figure out how to build the "hybrid" STAR reference genome. Ideally, I would like a "hybrid" genome in which the pathogen sequence appears as an additional chromosome at the end of the human genome.
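For reference, STAR's --genomeFastaFiles option accepts more than one FASTA file, and every sequence in those files becomes its own chromosome in the index, so the pathogen genome ends up as extra chromosome(s) alongside the human ones. The sketch below only assembles a genomeGenerate argument list in Python; the directory and file names, the thread count and the read length are hypothetical placeholders, and the annotation is passed as one already-merged GTF.

```python
import shlex


def star_genome_generate_cmd(
    genome_dir: str,
    fasta_files: list[str],
    gtf_file: str,
    read_length: int,
    threads: int = 8,
) -> list[str]:
    """Build (but do not run) a STAR genomeGenerate command for a combined reference."""
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # Several FASTA files can be listed here; each record becomes a chromosome.
        "--genomeFastaFiles", *fasta_files,
        # One merged annotation covering both species (hypothetical file name).
        "--sjdbGTFfile", gtf_file,
        # The STAR manual suggests read length minus 1 for --sjdbOverhang.
        "--sjdbOverhang", str(read_length - 1),
    ]


if __name__ == "__main__":
    cmd = star_genome_generate_cmd(
        genome_dir="star_index_combined",         # hypothetical output directory
        fasta_files=["human.fa", "pathogen.fa"],  # hypothetical file names
        gtf_file="combined.gtf",                  # hypothetical merged GTF
        read_length=101,
    )
    # Print a shell-ready command; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Passing the two FASTA files separately avoids the concatenation step entirely, but the GTF side still has to be merged into a single file.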

For a standard alignment, I would use STAR with --runMode genomeGenerate to build the reference. I could give STAR a "hybrid" FASTA obtained by concatenating the human and pathogen FASTA files (simply with cat). Is that okay? What about the GTF files? How should I handle them when building the reference (and later, when counting the aligned reads)?
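Concatenating FASTA files is generally fine, with two caveats that plain cat does not check: sequence names must be unique across both files, and if the first file does not end with a newline, its last sequence line gets glued to the next file's ">" header. A minimal Python sketch that guards against both, streaming line by line so a full human genome is never held in memory (the function names are illustrative, not from any tool):

```python
from pathlib import Path


def fasta_ids(path: Path) -> list[str]:
    """Sequence IDs: the first word after '>', which STAR uses as the chromosome name."""
    ids = []
    with path.open() as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.append(parts[0])
    return ids


def concat_fasta(inputs: list[Path], output: Path) -> None:
    """Concatenate FASTA files after checking that no sequence name is reused."""
    seen: set[str] = set()
    for path in inputs:
        for seq_id in fasta_ids(path):
            if seq_id in seen:
                raise ValueError(f"duplicate sequence name {seq_id!r} in {path}")
            seen.add(seq_id)

    with output.open("w") as out:
        for path in inputs:
            last = "\n"
            with path.open() as fh:
                for line in fh:
                    out.write(line)
                    last = line
            # Unlike plain cat, terminate each file so the next header starts a new line.
            if not last.endswith("\n"):
                out.write("\n")
```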

Note: I downloaded both FASTA files from NCBI (the pathogen sequence is only available there), and both GTF files as well.
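Whatever the source, the contig names in column 1 of each GTF have to match the sequence names STAR takes from the FASTA headers (the first word after ">"); otherwise the annotation does not line up with the sequence. Files downloaded together from NCBI usually agree, but it is cheap to verify. A small sketch, with illustrative function names:

```python
from pathlib import Path


def fasta_ids(path: Path) -> set[str]:
    """Sequence names as STAR sees them: the first word after '>'."""
    ids = set()
    with path.open() as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.add(parts[0])
    return ids


def gtf_seqnames(path: Path) -> set[str]:
    """Contig names from column 1 of a GTF, skipping '#' comment and blank lines."""
    names = set()
    with path.open() as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_seqnames(fasta: Path, gtf: Path) -> set[str]:
    """GTF contig names with no matching FASTA sequence; ideally this is empty."""
    return gtf_seqnames(gtf) - fasta_ids(fasta)
```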

I am completely new to this kind of task and to the command line, so sorry if my question is badly formulated. Thanks to anyone who can help!

Tags: linux, STAR, alignment
Answer by GenoMax, 5 months ago

Simply concatenating the files with cat may cause problems if the second file has header lines. It may be best to follow the directions in this thread: How to merge two gff3 files?
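One way to avoid the header problem is to keep the "#" header lines of the first file only and drop them from every later file. A minimal sketch of that idea in Python; it assumes plain GTF/GFF feature lines and no embedded "##FASTA" section (dropping that directive would leave raw sequence in the output), and the function name is illustrative:

```python
from pathlib import Path


def merge_annotations(inputs: list[Path], output: Path) -> None:
    """Concatenate GTF/GFF files, keeping '#' header lines only from the first file."""
    with output.open("w") as out:
        for index, path in enumerate(inputs):
            with path.open() as fh:
                for line in fh:
                    if index > 0 and line.startswith("#"):
                        continue  # skip headers/comments from the second file onward
                    if not line.endswith("\n"):
                        line += "\n"  # keep records on separate lines
                    out.write(line)
```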

A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they work. This will help future users who come across this post find the right answer.

Answer by Ram, 5 months ago

You should also be able to cat the GTF files as long as the contig (chromosome) names are different between species.
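Before concatenating, it is easy to confirm that the two annotations really use different contig names. If they overlap, one option is to add a prefix to the pathogen contigs; the matching FASTA headers would then need the same prefix. A sketch, where the prefix and function names are hypothetical:

```python
from pathlib import Path


def gtf_seqnames(path: Path) -> set[str]:
    """Contig names from column 1 of a GTF, skipping '#' comment and blank lines."""
    names = set()
    with path.open() as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def shared_contigs(gtf_a: Path, gtf_b: Path) -> set[str]:
    """Contig names used in both files; plain cat is only safe if this is empty."""
    return gtf_seqnames(gtf_a) & gtf_seqnames(gtf_b)


def prefix_contigs(gtf_in: Path, gtf_out: Path, prefix: str) -> None:
    """Copy gtf_in, adding `prefix` to the contig name (column 1) of every feature line.

    The FASTA headers for these contigs must be renamed the same way, or the
    annotation will no longer match the sequence names.
    """
    with gtf_in.open() as src, gtf_out.open("w") as dst:
        for line in src:
            if line.startswith("#") or not line.strip():
                dst.write(line)  # leave comments and blank lines untouched
            else:
                dst.write(prefix + line)  # column 1 is the first field
```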