Question: Extract sequences from the raw reads
11 months ago by
Jusnib0 wrote:

I need to extract the promoter and gene sequences of a few (5) genes from more than 100 soybean lines. However, I only have the raw reads of the genome. Mapping the reads of all the lines to the soybean genome will take a very long time. Is there a quicker way to extract those sequences?

next-gen wgs • 428 views
modified 10 months ago by Biostar ♦♦ 20 • written 11 months ago by Jusnib0

To extract the gene/promoter sequences from your raw reads, you have to map them to a reference. The reference does not have to be the whole genome.

You can build your own custom reference database from the gene/promoter sequences of interest and then map your raw reads to this small database with a sequence alignment tool, which saves time. Since you have raw sequencing reads, I would suggest a short-read aligner such as BWA, Bowtie, or Bowtie2.

At the end of the alignment, you will have the reads from your raw data that are similar to the sequences in the custom gene/promoter database.
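The steps above could look roughly like the following. This is a minimal sketch, assuming BWA and samtools are installed and the reads are paired-end; the file names (`targets.fa`, `line001_R1.fastq.gz`, and so on) and the helper `build_commands` are hypothetical placeholders. The script only prints the shell commands so they can be inspected before being run for each line.

```python
import shlex


def build_commands(reference, reads_1, reads_2, sample, threads=4):
    """Return shell command strings for mapping one sample to a small reference.

    All file names are hypothetical placeholders supplied by the caller.
    """
    ref = shlex.quote(str(reference))
    r1 = shlex.quote(str(reads_1))
    r2 = shlex.quote(str(reads_2))
    bam = shlex.quote(f"{sample}.mapped.bam")
    return [
        # Index the small gene/promoter reference (only needed once).
        f"bwa index {ref}",
        # Map paired reads, drop unmapped reads (-F 4), and sort the result.
        f"bwa mem -t {threads} {ref} {r1} {r2} "
        f"| samtools view -b -F 4 - "
        f"| samtools sort -o {bam} -",
        # Index the sorted BAM for viewing or downstream consensus calling.
        f"samtools index {bam}",
    ]


if __name__ == "__main__":
    for cmd in build_commands(
        "targets.fa", "line001_R1.fastq.gz", "line001_R2.fastq.gz", "line001"
    ):
        print(cmd)
```

Keep in mind that with such a small reference, reads from similar regions elsewhere in the genome may also map to the target genes, so the alignments are worth checking before building any consensus sequence.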

modified 11 months ago • written 11 months ago by Nitin Narwade420

Perhaps BLAST?

written 11 months ago by goodez460

If I understand correctly, you have sequences for 5 genes and you want to extract all of the WGS reads that map to these genes. Am I correct? If so, BLAST is probably your best option. If the data are already in SRA then it would be even easier as you can use the web BLAST and use your gene sequence as query against the WGS SRA project as the subject database. If the data are not in SRA then you can run BLAST locally.
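For the local BLAST route, one possibility is to convert the reads to FASTA, build a nucleotide database from them (`makeblastdb -in reads.fasta -dbtype nucl -out reads_db`), and search it with the gene sequences (`blastn -query genes.fa -db reads_db -outfmt 6 -out hits.tsv`). The sketch below then groups the matching read IDs by gene from the tabular output. The file names, the helper `reads_per_gene`, and the identity/length thresholds are assumptions for illustration, and the example rows are made up.

```python
import csv
import io
from collections import defaultdict


def reads_per_gene(tabular, min_identity=95.0, min_length=50):
    """Group subject (read) IDs by query (gene) ID from BLAST -outfmt 6 text.

    Columns used: 0 = qseqid, 1 = sseqid, 2 = pident, 3 = alignment length.
    """
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        # Skip comment lines and anything that is not a full 12-column row.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        gene, read = row[0], row[1]
        identity, length = float(row[2]), int(row[3])
        if identity >= min_identity and length >= min_length:
            hits[gene].add(read)
    return {gene: sorted(reads) for gene, reads in hits.items()}


# Made-up rows in -outfmt 6 layout, for illustration only.
example = (
    "geneA\tread1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "geneA\tread2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t120\n"
    "geneB\tread3\t98.0\t30\t1\t0\t1\t30\t1\t30\t1e-05\t50\n"
    "geneB\tread4\t100.0\t150\t0\t0\t1\t150\t1\t150\t1e-70\t270\n"
)

if __name__ == "__main__":
    print(reads_per_gene(example))
```

With whole-genome reads as the database, blastn's default cap on the number of reported subject sequences (`-max_target_seqs`) will probably need to be raised, otherwise some matching reads may be left out.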

written 11 months ago by vkkodali1.2k

Is there any particular reason why you don't want to assemble the reads first?

written 11 months ago by mike-zx140

These are not my data; I got the raw reads from our collaborator. I need the sequences of a few genes and, if possible, I would like to avoid spending time assembling the reads. If there is no other way, I will assemble them.

written 11 months ago by Jusnib0

You could also pseudo-align to the FASTA mRNA sequences for the genes of interest using Kallisto or Salmon, produce a pseudobam from this pseudo-alignment, and then extract the reads that have aligned from the BAM. Be aware of the biases in these steps, though.

Otherwise, assemble the genome and generally follow the steps described by Nitin Narwade.
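The last step of the pseudo-alignment route, pulling the aligned reads out of the BAM, can be sketched as below. This assumes the BAM has first been converted to SAM text (for example with `samtools view -h`); the helper `mapped_read_names` and the example records are hypothetical. It keeps reads whose SAM flag does not have the "unmapped" bit (0x4) set. Note that an mRNA reference would not cover promoter regions, so those would need to be added to the reference separately.

```python
def mapped_read_names(sam_text):
    """Return sorted names of reads whose SAM flag lacks the unmapped bit (0x4)."""
    names = set()
    for line in sam_text.splitlines():
        # Skip blank lines and header lines, which start with '@'.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # A SAM alignment record has at least 11 mandatory fields.
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if not flag & 0x4:
            names.add(fields[0])
    return sorted(names)


# Made-up SAM records, for illustration only.
example_sam = (
    "@SQ\tSN:geneA\tLN:1000\n"
    "read1\t0\tgeneA\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTAAAA\tIIIIIIII\n"
    "read3\t16\tgeneA\t200\t60\t8M\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n"
)

if __name__ == "__main__":
    print(mapped_read_names(example_sam))
```

The resulting names can then be used to pull the matching records out of the original FASTQ files.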

written 10 months ago by Kevin Blighe49k