Question: Finding gene matches using sequence reads?
7 months ago, anabaena0 wrote:

Hey all, I am looking through large data sets to find a gene cluster or particular genes using sequence reads. The goal is to find a metagenomic data set that contains the genes of interest; essentially, it is just a probe to see whether what I am looking for is there. Currently I am using Bowtie2, but I was wondering whether there are other ways to go about this. And if Bowtie2 is the best option, are there particular parameters I should consider or set when doing the alignment?

Thanks!

metagenomic bowtie2 reads
modified 7 months ago by Fatima610 • written 7 months ago by anabaena0

It might be a good idea to assemble the reads and use a tool like FragGeneScan (it's really fast) to predict the gene sequences. You can then use BLAST to search the predicted genes against your genes of interest.

The following pipeline might give you some ideas:

https://experiments.springernature.com/articles/10.1007/978-1-4939-7015-5_3
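
For the BLAST step, here is a minimal sketch of how the hits could be filtered afterwards. It assumes BLAST was run with tabular output (-outfmt 6, default 12 columns), with the predicted genes as queries and the genes of interest as the database. The function names and the identity/e-value cutoffs are illustrative assumptions, not part of the pipeline linked above.

```python
from typing import Dict, Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    query: str      # predicted gene (qseqid)
    subject: str    # gene of interest (sseqid)
    pident: float   # percent identity
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_blast_tabular(lines: Iterable[str],
                        min_identity: float = 80.0,
                        max_evalue: float = 1e-5) -> List[BlastHit]:
    """Parse BLAST -outfmt 6 lines and keep hits passing both cutoffs.

    The cutoffs are placeholder values; tune them for your own data.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default 12-column record
        hit = BlastHit(fields[0], fields[1], float(fields[2]),
                       int(fields[3]), float(fields[10]), float(fields[11]))
        if hit.pident >= min_identity and hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


def best_hit_per_subject(hits: Iterable[BlastHit]) -> Dict[str, BlastHit]:
    """Keep the highest-bitscore hit for each gene of interest."""
    best: Dict[str, BlastHit] = {}
    for hit in hits:
        if hit.subject not in best or hit.bitscore > best[hit.subject].bitscore:
            best[hit.subject] = hit
    return best
```

Any gene of interest that shows up as a key in best_hit_per_subject(...) has at least one predicted gene matching it under the chosen cutoffs.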

modified 7 months ago • written 7 months ago by Fatima610

Awesome! My main concern is that I have many samples to look through. The initial thought was to probe a metagenomic sample/read set for the gene of interest and, if it is present, then go in, assemble the reads, and give the sample a deeper look. With that being said, would this pipeline be an efficient way of doing this?

written 7 months ago by anabaena0

I have a question regarding FragGeneScan: do you need to train it? The documentation seems a little sparse on this.

written 7 months ago by anabaena0

No, you don't need to train it. You can use one of the pre-trained models, chosen according to the kind of data you have. You may get better results if you assemble your reads before using FragGeneScan.

Since you already have your genes of interest, maybe you don't need FragGeneScan to predict genes in the reads; you can directly use a read mapper (Bowtie2/HISAT2). But if you're interested in predicting the genes in a metagenome, a single genome, or reads (prokaryotic samples), you can use FragGeneScan.

For reference genomes you can use

-complete=1 -train=complete

For Illumina sequencing reads with about 0.5% error rate you can use

-complete=0 -train=illumina_5

Usage:

./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name]
-complete=[1 or 0] -train=[train_file_name] (-thread=[number of thread; default 1])

Parameters

   [seq_file_name]:    sequence file name including the full path
   [output_file_name]: output file name including the full path
   [1 or 0]:           1 if the sequence file has complete genomic sequences
                       0 if the sequence file has short sequence reads
   [train_file_name]:  file name that contains model parameters; this file should be in the "train" directory
                       Note that the following files containing model parameters already exist in the "train" directory:
                       [complete]    for complete genomic sequences or short sequence reads without sequencing error
                       [sanger_5]    for Sanger sequencing reads with about 0.5% error rate
                       [sanger_10]   for Sanger sequencing reads with about 1% error rate
                       [454_10]      for 454 pyrosequencing reads with about 1% error rate
                       [454_30]      for 454 pyrosequencing reads with about 3% error rate
                       [illumina_5]  for Illumina sequencing reads with about 0.5% error rate
                       [illumina_10] for Illumina sequencing reads with about 1% error rate
   [num_thread]:       number of threads used in FragGeneScan. Default 1.
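
If you have many samples, it may help to build the command line programmatically. This is only a sketch based on the usage text above: the helper name fraggenescan_args and the file names in the usage note are hypothetical, and the script path assumes you run from the FragGeneScan directory.

```python
from typing import List

# Model names taken from the parameter list above.
TRAIN_MODELS = {
    "complete", "sanger_5", "sanger_10",
    "454_10", "454_30", "illumina_5", "illumina_10",
}


def fraggenescan_args(seq_file: str, out_prefix: str, complete: bool,
                      train: str, threads: int = 1,
                      script: str = "./run_FragGeneScan.pl") -> List[str]:
    """Build an argument list for run_FragGeneScan.pl (hypothetical helper).

    complete=True  -> -complete=1 (complete genomic sequences)
    complete=False -> -complete=0 (short sequence reads)
    """
    if train not in TRAIN_MODELS:
        raise ValueError(f"unknown training model: {train}")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        script,
        f"-genome={seq_file}",
        f"-out={out_prefix}",
        f"-complete={1 if complete else 0}",
        f"-train={train}",
        f"-thread={threads}",
    ]
```

For example, fraggenescan_args("reads.fa", "reads_fgs", complete=False, train="illumina_5", threads=4) gives a list you can pass to subprocess.run(..., check=True). The file names here are placeholders.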

Please let me know if you have any other questions.

modified 7 months ago • written 7 months ago by Fatima610

Perfect, I think I'll just stick with Bowtie2 then, since I already know what I am looking for. One question: I am using the entire cluster sequence as the reference, and this cluster may only have a few core proteins that are conserved. Is it recommended to use the --local parameter with Bowtie2 given this? I've only really used Bowtie2 with a full reference genome.

written 7 months ago by anabaena0

Yes, I think the --local option sounds good since the reads are short.

 --local            local alignment; ends might be soft clipped (off)
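
As a rough sketch of how the screening could look: the first function builds a bowtie2 --local command for unpaired reads, and the second estimates what fraction of the cluster is covered by at least one aligned read, reading the SAM output. The function names, the index prefix, and the idea of using covered fraction as the presence signal are my assumptions, not something bowtie2 reports itself.

```python
import re
from typing import Iterable, List

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def bowtie2_local_args(index_prefix: str, reads_fastq: str,
                       sam_out: str, threads: int = 1) -> List[str]:
    """Argument list for a local-mode bowtie2 run on unpaired reads.

    index_prefix is the prefix passed to bowtie2-build for the cluster FASTA.
    --no-unal keeps unaligned reads out of the SAM file.
    """
    return ["bowtie2", "--local", "--no-unal", "-p", str(threads),
            "-x", index_prefix, "-U", reads_fastq, "-S", sam_out]


def covered_fraction(sam_lines: Iterable[str], ref_name: str,
                     ref_length: int) -> float:
    """Fraction of reference positions covered by at least one aligned base."""
    if ref_length < 1:
        raise ValueError("ref_length must be positive")
    covered = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        if flag & 4 or rname != ref_name or cigar == "*":
            continue  # unmapped, or mapped to a different reference
        pos = int(fields[3])  # 1-based leftmost aligned position
        for count, op in CIGAR_RE.findall(cigar):
            n = int(count)
            if op in "M=X":          # aligned bases consume the reference
                covered.update(range(pos, pos + n))
                pos += n
            elif op in "DN":         # deletions/skips advance without coverage
                pos += n
            # I, S, H, P do not consume reference positions
    in_range = sum(1 for p in covered if 1 <= p <= ref_length)
    return in_range / ref_length
```

With a cluster where only a few core genes are conserved, a low overall fraction can still be a real hit, so it may be more informative to compute this per gene (one reference sequence per core gene) than over the whole cluster.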
modified 7 months ago • written 7 months ago by Fatima610

You don't need to use the whole pipeline (Fun4me). You can just use FragGeneScan to predict the gene sequences and then use BLAST.

Please see the last line of this page (examples of FragGeneScan results using metagenomes as input):

https://omics.informatics.indiana.edu/FragGeneScan/result.php

I haven't seen your data, so I'm not sure what the best approach is.

modified 7 months ago • written 7 months ago by Fatima610