Creating reference genome for mapping and then selecting
2
0
4 weeks ago
Dataminer ★ 2.7k

Dear community,

I will be analyzing PDX (patient-derived xenograft) RNA-seq data, and from what I could gather I need a couple of things before I start:

1. A combined reference genome of human and mouse (hg38 and mm10).

How can I generate this from hg38.fa and mm10.fa files?

2. Alignment against the combined reference genome using STAR.

What special options do I need so that only the reads that map exclusively to hg38 are selected and a gene count can be generated?

Could any of you help me?

mm10 hg38 STAR RNA-seq • 503 views

2
4 weeks ago

The process is quite straightforward: simply concatenate the reference files, then index the resulting file.

You may need to rename the chromosomes. If the naming is the same for both organisms (e.g. chr1), rename the chromosomes of the human genome to something like chr1_hg.
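The renaming and concatenation described above can be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the helper name `tag_fasta`, the `_mm` suffix for mouse, and the output name `hg38_mm10.fa` are assumptions made for illustration. Any annotation (GTF) used for indexing or counting would need its chromosome names changed in the same way.

```shell
# Append a species suffix to the sequence name in every FASTA header.
# Usage: tag_fasta SUFFIX FILE   (tag_fasta is a hypothetical helper)
tag_fasta() {
    suffix="$1"
    fasta="$2"
    # Header lines start with ">"; the name is the first whitespace-delimited token.
    awk -v s="$suffix" '/^>/ { sub(/^>[^ \t]+/, "&" s) } { print }' "$fasta"
}

# Build the combined reference only if both inputs are present.
# File names are assumptions taken from the question.
if [ -f hg38.fa ] && [ -f mm10.fa ]; then
    { tag_fasta _hg hg38.fa; tag_fasta _mm mm10.fa; } > hg38_mm10.fa
fi
```

The combined file can then be indexed like any single genome, for example with STAR's `--runMode genomeGenerate` and `--genomeFastaFiles hg38_mm10.fa`.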

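Once you have performed the alignments, you can select the uniquely mapped human alignments by filtering on the chromosome name (and on flags or mapping quality). Note that -q 0 applies no mapping-quality filter; STAR by default reports uniquely mapped reads with MAPQ 255, so raise the threshold if you want unique mappers only. Filtering by region also requires a coordinate-sorted, indexed BAM file.

samtools view -b -q 0 data.bam chr1_hg chr2_hg .... > filtered.bam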


The resulting BAM file can then be used in any downstream analysis.
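As an alternative to listing every human chromosome by hand, the same selection can be sketched as a filter on the SAM text stream. This assumes the human chromosomes carry an `_hg` suffix as suggested above; `keep_human` is a hypothetical helper name, and the default threshold of 255 assumes STAR's default MAPQ for uniquely mapped reads. It only checks each read's own reference name, not its mate's.

```shell
# Keep SAM header lines plus alignments on *_hg references with MAPQ >= MIN_MAPQ.
# Usage: ... | keep_human [MIN_MAPQ]   (keep_human is a hypothetical helper)
keep_human() {
    awk -v minq="${1:-255}" -F '\t' '
        /^@/ { print; next }                         # header lines pass through
        $3 ~ /_hg$/ && $5 + 0 >= minq + 0 { print }  # column 3 = RNAME, column 5 = MAPQ
    '
}

# Example wiring (assumes samtools is installed and data.bam exists):
# samtools view -h data.bam | keep_human 255 | samtools view -b -o filtered.bam -
```

The mouse @SQ header lines are left in place here, which is harmless for most counting tools but could be trimmed if a human-only header is needed.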

0

Thank you for the tips. One more question: can something similar be done with Salmon? That is, merge the genome files, index them, and then run the Salmon mapper. But then again, how would I get human-specific counts? Many thanks in advance.

1

You would get the counts as you would in any other case. How do you know something maps to chromosome 1 of the human genome? Because it is named that way.

There is nothing special about mixing additional chromosomes into your genome. The aligner simply uses the data you give it; it "does not care" that you have two chromosome 1s, one from human and one from mouse.

The only thing that matters is that you can tell them apart by name. If you call them both chr1, you won't be able to tell which chr1 belongs to which organism.
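For the Salmon case the same "tell them apart by name" idea applies to transcript names rather than chromosomes. Here is a minimal sketch, assuming the index was built from Ensembl or GENCODE transcripts, where human IDs start with ENST and mouse IDs with ENSMUST. The helper names `human_quant` and `human_reads` are hypothetical; the column layout is that of Salmon's quant.sf (Name, Length, EffectiveLength, TPM, NumReads).

```shell
# Print the header line and the human rows of a Salmon quant.sf table.
# "^ENST" does not match mouse IDs, which begin with "ENSMUST".
human_quant() {
    awk -F '\t' 'NR == 1 || $1 ~ /^ENST/' "$1"
}

# Sum the NumReads column (5th) over the human transcripts only.
human_reads() {
    human_quant "$1" | awk -F '\t' 'NR > 1 { n += $5 } END { printf "%.2f\n", n }'
}

# Example (assumes a Salmon output directory named quant/):
# human_quant quant/quant.sf > human_quant.sf
```

The TPM column in the extracted rows is still scaled over both species, so it would need renormalizing before being compared across samples; NumReads can be used as is.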

2
4 weeks ago
GenoMax 101k

You should use tools that can bin the reads by aligning to multiple genomes at the same time.

bbsplit.sh from the BBMap suite (see the thread "BBSplit syntax for generating builds for the reference genome and how to call different builds") and XenofilteR are a couple of examples.

bbsplit.sh allows you to handle reads that multi-map (within and across genomes) intelligently via the ambiguous2= option.

ambiguous2=<best>   Set behavior only for reads that map ambiguously to multiple different references.
                    Normal 'ambiguous=' controls behavior on all ambiguous reads;
                    Ambiguous2 excludes reads that map ambiguously within a single reference.
                      best   (use the first best site)
                      toss   (consider unmapped)
                      all    (write a copy to the output for each reference to which it maps)
                      split  (write a copy to the AMBIGUOUS_ output for each reference to which it maps)
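To show how the option might be used, the sketch below only prints a candidate bbsplit.sh command line rather than running it, so it can be inspected first. The file names (`reads.fq`, `hg38.fa`, `mm10.fa`, `split_%.fq`) and the helper name `bbsplit_cmd` are assumptions, and `toss` is used as the default simply because it is the conservative choice for reads shared between human and mouse. Check the flags against the help text of your installed BBMap version.

```shell
# Print a bbsplit.sh command line for a human + mouse split.
# Usage: bbsplit_cmd READS_FASTQ [AMBIGUOUS2_MODE]   (hypothetical helper)
bbsplit_cmd() {
    reads="$1"
    mode="${2:-toss}"
    # In basename=, "%" is replaced by each reference name: one output file per genome.
    printf 'bbsplit.sh in=%s ref=hg38.fa,mm10.fa basename=split_%%.fq ambiguous2=%s\n' "$reads" "$mode"
}

bbsplit_cmd reads.fq toss
```

With `ambiguous2=toss`, reads that map equally well to both genomes are treated as unmapped, so the human output file should contain only reads assigned unambiguously to hg38.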