Reads alignment on human and pathogen combined genome
4 weeks ago
di4mond • 0

Hi everyone,

I have fastq files from an RNA sequencing experiment. My samples are human cells infected with an intracellular pathogen, so I would like to align all the reads to both genomes (human and pathogen). I am working on Linux and have performed some standard alignments before, using STAR and an Ensembl reference genome.

I read that it is better to perform the alignment in one step rather than in two separate steps. However, I can't figure out how to build the "hybrid" STAR reference genome; ideally, I would like a "hybrid" genome in which the pathogen sequence looks like an additional chromosome at the end of the human genome.

For a standard alignment, I would use STAR with --runMode genomeGenerate to build the reference. Can I give STAR a "hybrid" fasta obtained by concatenating the human and pathogen fasta files (simply using cat)? What about the .gtf files? How should I handle them to build the reference (and to count the aligned reads afterwards)?

Note: I downloaded both fasta files from NCBI (as the pathogen sequence is only available from this resource), and both gtf files as well.

I am completely new to this kind of task and to the command line, so sorry if my question is badly formulated. Thanks to anyone who can help me with this!

linux STAR alignment • 193 views
4 weeks ago
GenoMax 102k

Simply catting the files may cause issues if the second file has headers. It may be best to use these directions: How to merge two gff3 files?
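As a rough illustration of the header issue, here is a minimal sketch that keeps the comment lines (those starting with `#`) from the first GTF only and appends just the records of the second. The file names and toy records are hypothetical stand-ins for the real NCBI downloads, and this assumes the only "headers" in the second file are `#` comment lines.

```shell
set -eu

# Toy inputs standing in for the real annotation files (hypothetical names and records).
workdir=$(mktemp -d)
printf '#gtf-version 2.2\nNC_000001.11\tBestRefSeq\tgene\t1\t100\t.\t+\t.\tgene_id "H1";\n' > "$workdir/human.gtf"
printf '#gtf-version 2.2\nNC_999999.1\tRefSeq\tgene\t1\t50\t.\t+\t.\tgene_id "P1";\n' > "$workdir/pathogen.gtf"

# Copy the first file unchanged, header lines included.
cat "$workdir/human.gtf" > "$workdir/hybrid.gtf"

# Append only the non-comment lines of the second file.
# "|| true" keeps the script going if the second file contains nothing but comments.
grep -v '^#' "$workdir/pathogen.gtf" >> "$workdir/hybrid.gtf" || true

cat "$workdir/hybrid.gtf"
```

The same idea should apply to the real files by replacing the two `printf` lines with the paths to the downloaded annotations.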

Thank you both for your answers. It worked well!

A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they work. This will help future users who come across this post find the right answer.

4 weeks ago
Ram 33k

You should also be able to cat the GTF files as long as the contig (chromosome) names are different between species.
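
One way to check that condition before merging is sketched below: it lists the contig names (column 1) of each GTF and reports any name that appears in both. The file names and toy records are hypothetical placeholders for the real annotations.

```shell
set -eu

# Toy annotation files standing in for the real NCBI downloads (hypothetical names and records).
workdir=$(mktemp -d)
printf 'NC_000001.11\tBestRefSeq\tgene\t1\t100\t.\t+\t.\tgene_id "H1";\n' > "$workdir/human.gtf"
printf 'NC_999999.1\tRefSeq\tgene\t1\t50\t.\t+\t.\tgene_id "P1";\n' > "$workdir/pathogen.gtf"

# List the unique contig names: column 1 of every non-comment line.
contigs() { grep -v '^#' "$1" | cut -f1 | LC_ALL=C sort -u; }

contigs "$workdir/human.gtf" > "$workdir/human.contigs"
contigs "$workdir/pathogen.gtf" > "$workdir/pathogen.contigs"

# comm -12 keeps only the names present in both sorted lists.
shared=$(LC_ALL=C comm -12 "$workdir/human.contigs" "$workdir/pathogen.contigs")

if [ -n "$shared" ]; then
    echo "Contig names shared between the two species:" >&2
    echo "$shared" >&2
    exit 1
fi
echo "No shared contig names; the annotations can be combined."
```

If no names are shared, the combined annotation can be passed to STAR's `--runMode genomeGenerate` through `--sjdbGTFfile`. The contig names in the GTF also need to match the sequence names in the fasta files given to `--genomeFastaFiles`, which accepts several fasta files separated by spaces, so concatenating the fasta files beforehand is optional.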
