Question: RNASeq with mixed tissues
0
gravatar for ddzhangzz
3.3 years ago by
ddzhangzz90
United States
ddzhangzz90 wrote:

I got some RNASeq fastq data from a customer, and he told me the samples were mainly from human cell lines but may have some contamination with mouse cells. My question is whether I should align those sequences against both human genome reference and mouse genome reference or just humna's. Any suggestions?

rna-seq • 1.4k views
ADD COMMENTlink modified 3.3 years ago by Manuel Landesfeind1.2k • written 3.3 years ago by ddzhangzz90
3
gravatar for informatics bot
3.3 years ago by
United States
informatics bot570 wrote:
  1. First align all the samples to the human genome.
  2. Then align the un-mapped reads to the mouse genome.

If you get a large portion of (un-mapped) reads mapping to mouse, then it's very likely the sample was contaminated.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by informatics bot570

Thanks @Lando Ringel. One problem could be that (maybe very likely) a sequence was actually from mouse but it can be mapped to both human and mouse.

ADD REPLYlink written 3.3 years ago by ddzhangzz90

That is true, but many of the mouse reads will remain un-mapped, you can use BLAST (or SNAP) to look at the unmapped reads more closely (i.e. determine which organism they belong to).

Do you plan on trying to using the contaminated samples? I personally would advise against that.

ADD REPLYlink written 3.3 years ago by informatics bot570

In this setting wouldn't t make more sense to align against a conjoined human/mouse reference, or to separately align to both human and mouse and select the species origin of the reads based on the quality of alignment in sp1 vs sp2

ADD REPLYlink written 3.3 years ago by russhh4.6k
3
gravatar for Devon Ryan
3.3 years ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

First subset the files (seqtk) and then use fastq_screen to get an idea what the contamination rate is. I've found it useful to only pay close attention to the "single alignment in a single organism" (or whatever that's called) category, since the others are more an indicator of sequence complexity. I happen to do this with all sequencing runs produced at our institute, since it immediately allows us to flag problematic samples (anything over 0.5% off-species unique alignment is a problem).

Ideally you won't have much contamination and if you do you can just exclude the sample. If you can't exclude the sample, then you'll need to simultaneously align to both genomes (get one from Ensembl and the other from UCSC, so the chromosome names differ, and then concatenate them). Align against the concatenated genome and then extract only the human reads with some meaningful MAPQ threshold. One can get more elegant with this, but that should suffice 99.9% of the time.

ADD COMMENTlink written 3.3 years ago by Devon Ryan91k
1
gravatar for genomax
3.3 years ago by
genomax69k
United States
genomax69k wrote:

BBSplit from BBMap has been designed to address this kind of a situation for binning reads (to best extent they can be assigned by alignment). It is a one step process.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by genomax69k

yes, but after I use BBsplit, I obtain the fastq file and then I can to remap this with STAR but the count? FeatureCount doesn't work well with bbsplit. So my question is: After BBsplit What can I use to map and to calculate the count? Thanks

ADD REPLYlink written 9 weeks ago by giovannaventola3es920

FeatureCount doesn't work well with bbsplit.

There should be no direct relation with the splitting. If you reads are not aligning to exons then you can have issues with counting (assuming you are using a reference/annotation where chromosome ID's match).

ADD REPLYlink written 9 weeks ago by genomax69k

When I use featureCount I get reads % that map to exons that are too low, while STAR percentages are greater than 80%, Assuming the same annotation file. WHY?

ADD REPLYlink written 9 weeks ago by giovannaventola3es920

Can you describe the processing you are doing step wise for both BBTools and STAR?

As I said before there is no technical reason why this should not work with reads that have been bbsplit.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by genomax69k
bbsplit.sh in1=reads1.fq in2=reads2.fq ref=human.fa,mouse.fa ambiguous2=toss basename=out_%.fq refstats=Statistics_%.txt

then I used directly STAR:

STAR --runMode alignReads --runThreadN 12 --genomeDir Genomes/STARversion2.70f/STARversion2.70f_INDEX_HG38_GenCode_v29/ --sjdbGTFfile Genomes/GenCode_HG38_release_29/GTF/gencode.v29.chr_patch_hapl_scaff.annotation.gtf --readFilesIn 2_Esperimento/6_BBSPlit/1_CTRL/out_HG38.fq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix 1_CTRL/

then, I used FeatureCount:

featureCounts -F GTF -T 12 -s 2 -g gene_name -a Genomes/GenCode_HG38_release_29/GTF/gencode.v29.chr_patch_hapl_scaff.annotation.gtf -o GeneName_Count.txt 1_CTRL/Aligned.sortedByCoord.out.bam

In this case I obtained only the 11.3% of successfully assigned alignments. WHY?

Can you help me?

ADD REPLYlink modified 9 weeks ago by genomax69k • written 9 weeks ago by giovannaventola3es920

Is this library "reverse stranded" for sure? What happens if you try -s 0 or -s 1 with featureCounts? Does the assignment % go up significantly?

Just to be redundant, let me make sure you have tried three different things:

  1. Use bbsplit.sh to split the reads into two pools.
  2. Use bbmap.sh to map the reads back to human genome (what is the alignment % here?)
  3. Use featureCounts to count.

Other workflow

  1. Use bbsplit.sh to split the reads into two pools.
  2. Use STAR to map to genome and count at the same time?

Alternative that you have tried:

  1. Use bbmap.sh to map the reads to human genome.
  2. Use featureCounts to count.
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by genomax69k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 619 users visited in the last hour