Question: aligning with bacterial genome
3.8 years ago, HK wrote:

Hi all,

I have a few RNA-seq samples (healthy and diseased). I have already aligned them to the human hg19 reference using tophat2. Now I am trying to align the reads to a bacterial genome to find out whether the samples also contain bacterial sequences. Which tool should I use for this? I did try tophat2, but is there a better option?

rna-seq bacterial-genome • 1.6k views
modified 24 months ago by predeus • written 3.8 years ago by HK

Do you already know the particular bacteria that you want to align against, or are you still trying to determine that? For what it's worth, bacteria tend to not have splicing, so you can often get away with directly using bowtie2/bwa/etc.

written 3.8 years ago by Devon Ryan

Yes, I am using Streptococcus pneumoniae ATCC 700669 (FM211187). I downloaded the FASTA file, built the index with bowtie-build, and then mapped using tophat2. The result I got for the diseased sample is:

Left reads:
          Input     :    164672
           Mapped   :       540 ( 0.3% of input)
            of these:       512 (94.8%) have multiple alignments (0 have >20)
Right reads:
          Input     :    164672
           Mapped   :      1119 ( 0.7% of input)
            of these:      1057 (94.5%) have multiple alignments (0 have >20)
Unpaired reads:
          Input     :     48331
           Mapped   :       148 ( 0.3% of input)
            of these:       141 (95.3%) have multiple alignments (0 have >20)
 0.5% overall read mapping rate.

Aligned pairs:         6
     of these:         6 (100.0%) have multiple alignments
                       2 (33.3%) are discordant alignments
 0.0% concordant pair alignment rate.

And for the healthy sample (I was just experimenting with it to see what comes out):

Left reads:
          Input     :   1254299
           Mapped   :       843 ( 0.1% of input)
            of these:       156 (18.5%) have multiple alignments (0 have >20)
Right reads:
          Input     :   1254299
           Mapped   :      2527 ( 0.2% of input)
            of these:      1700 (67.3%) have multiple alignments (0 have >20)
 0.1% overall read mapping rate.

Aligned pairs:       719
     of these:        45 ( 6.3%) have multiple alignments
 0.1% concordant pair alignment rate.

Just by looking at these results, would you say that bacterial sequences are present in the samples?

modified 3.8 years ago by Devon Ryan • written 3.8 years ago by HK

I just saw this paper mentioned on twitter (it literally just came out). It and some of the references therein may be of interest to you. That particular paper is for one of the iobio tools, which are always really slick.

written 3.8 years ago by Devon Ryan

Our internal threshold for calling a sample contaminated is 0.5% unique alignments, so I guess the diseased sample is borderline. I don't know where the samples were sourced from, so you might not expect a high amount of the bugs in the samples, even if the patient had them.

written 3.8 years ago by Devon Ryan
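
For reference, the threshold above is stated in terms of unique alignments, while the 0.5% and 0.1% figures in the tophat2 summaries are overall mapping rates. Below is a minimal sketch that recomputes a unique rate from the posted numbers, assuming "unique" means mapped reads minus reads with multiple alignments (that reading, and the helper name `unique_rate`, are assumptions and not something stated in the thread).

```python
def unique_rate(groups):
    """Return (unique_reads, input_reads, percent_unique).

    groups: list of (input_reads, mapped, multi_mapped) tuples, one per
    read category in the tophat2 summary (left, right, unpaired).
    Assumption: unique = mapped - multi_mapped.
    """
    total_input = sum(g[0] for g in groups)
    unique = sum(g[1] - g[2] for g in groups)
    return unique, total_input, 100.0 * unique / total_input


# Numbers copied from the two summaries posted above.
diseased = [(164672, 540, 512), (164672, 1119, 1057), (48331, 148, 141)]
healthy = [(1254299, 843, 156), (1254299, 2527, 1700)]

for name, groups in (("diseased", diseased), ("healthy", healthy)):
    unique, total, pct = unique_rate(groups)
    print(f"{name}: {unique} unique of {total} reads = {pct:.3f}%")
# diseased: 97 unique of 377675 reads = 0.026%
# healthy: 1514 unique of 2508598 reads = 0.060%
```

Under that reading, most of the diseased sample's hits are multi-mapped, so the unique rate is well under the overall 0.5% mapping rate.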

You could use SNAP or Bowtie2 to align the reads against bacterial genomes from NCBI. There are pipelines built for this, but it would be tedious if your main goal is not to identify the pathogens in the data.

modified 3.8 years ago • written 3.8 years ago by geek_y
24 months ago, predeus wrote:

Bacterial RNA would hardly be detectable in a human RNA-seq library: if you do poly-A selection, bacterial reads won't be there since bacterial transcripts have almost no poly-A tails, and total RNA would still not capture bacterial RNA since it decays too fast (you need a special protocol for bacterial RNA-seq). Also, there should not be any DNA in RNA-seq data if it's done properly.

To answer your question though, you can use any good short read mapper (bwa, bowtie2) to align to bacterial genome, since there is no splicing. If you have no idea what bacteria you would expect to find, use Centrifuge/Kraken with nt database.

written 24 months ago by predeus
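
As a rough illustration of the non-spliced route suggested above, here is a small sketch that assembles the two Bowtie2 command lines (index build, then paired-end alignment). All file names and the helper name `bowtie2_commands` are hypothetical placeholders; only the standard `bowtie2-build` arguments and the `bowtie2` options `-p`, `-x`, `-1`, `-2` and `-S` are used. The commands are printed rather than executed, so the snippet runs even where Bowtie2 is not installed.

```python
import shlex


def bowtie2_commands(fasta, index_prefix, r1, r2, sam_out, threads=4):
    """Build (but do not run) the index and alignment commands."""
    # bowtie2-build <reference.fasta> <index_prefix>
    build = ["bowtie2-build", fasta, index_prefix]
    # Paired-end alignment against the index, writing SAM output.
    align = [
        "bowtie2",
        "-p", str(threads),
        "-x", index_prefix,
        "-1", r1,
        "-2", r2,
        "-S", sam_out,
    ]
    return build, align


# Hypothetical file names, for illustration only.
build_cmd, align_cmd = bowtie2_commands(
    "FM211187.fasta", "spn_index",
    "sample_R1.fastq", "sample_R2.fastq", "sample_vs_spn.sam",
)
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)` once Bowtie2 is on the PATH.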

Poly-A selection is quite common, but OP didn't mention his protocol. Also, poly-A selection biases your samples away from other human RNA species that don't have the tail. You don't need any special protocol for bacterial RNA-seq if you're working in an RNase-free environment. Therefore, bacterial sequences are (anecdotally) quite common in human samples that haven't been processed with care, or that are taken from tissues with microbiota. OP is asking for a recommendation for an aligner for bacterial samples that doesn't concern itself with splicing (you can imagine the algorithmic mistakes a splice-aware aligner can make in densely packed bacterial genomes). Aligning against bacterial genomes is a good quality control routine, especially considering the prevalence of laboratory contaminants such as Mycoplasma.

In my experience, even with several rounds of DNase treatment, if you are sequencing deeply, residual noise across the genome can be observed, which can be explained as either spurious transcription or DNA contamination that evades multiple DNase rounds. This has been described in the following papers.

I'm guessing you were downvoted for being too quick to dismiss fairly common sample contaminants, including bacterial sequences and even DNA.

written 24 months ago by mrals89
3.8 years ago, Antonio R. Franco (Spain, Universidad de Córdoba) wrote:

Look up information on BBSplit.
