Question

Alignment of Reconstructed Gene Sequences Based on PoolSeq Data and Conserved Regions

2.0 years ago

Human • 0

Hello folks,

I'm trying to create a marker to eventually distinguish the sex of a species. For this I need to align the two genes that determine the sex. Because the gene sequences on NCBI seem to differ too much from the ones present in our PoolSeq samples, I'm trying to reconstruct both genes based on two conserved regions from one paper. Currently I'm using this Perl line to "puzzle" the sequence together step by step.

perl -lane ' if ($_  =~  " Conserved Sequence from literature ") {@array=split(" right part of the sequence "); print $array[1]} ;'

This goes through a PoolSeq FASTA file, searches for a match, and is supposed to print everything to the right of the match. Then it's repeated with the new part of the sequence, and so on. But after some steps I don't get any matches anymore.

Maybe someone knows a better way to reconstruct the genes?

Perl gene alignment PoolSeq Reconstruct • 1.1k views

updated 24 months ago by Matthias Zepper 4.6k • written 2.0 years ago by Human • 0

Human Why did you delete this post?

2.0 years ago by Ram 43k

score 0 · Answer 1 · 2022-04-25

2.0 years ago

Matthias Zepper 4.6k

How long are the pieces in your PoolSeq FASTA file? Are these reads, longer contigs, or fully assembled chromosomes? Do you have access to the reads from the PoolSeq in FastQ format?

I recommend having a look at the BBTools suite, which comprises plenty of tools that may be helpful, depending on the available reference and sequencing data. If there is already a reference genome or at least assembled contigs available, you could map your reads to those contigs and then generate scaffolds from them: lilypad.sh in=mapped.sam ref=contigs.fa out=scaffolds.fa. The scaffolds.fa will then contain the contigs modified according to your own data, and you could use e.g. Blast to find your genomic regions/genes of interest.

Alternatively, if you do not wish to rely on a reference, you can use Tadpole to assemble contigs from the sequencing reads. It should be a good compromise between the Perl matching and more sophisticated genome assembly methods. Subsequently, you could blast the NCBI reference/the conserved regions against the collection of those assembled Tadpole contigs and pull out those that match. BBTools has the estherfilter.sh script that can simplify the latter step.
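To illustrate the "pull out those that match" step, here is a minimal Python sketch. It is not part of BBTools and is no substitute for Blast: it only finds exact matches of a conserved sequence, whereas Blast tolerates mismatches. It does handle two things that a line-by-line string match can miss: sequences wrapped over several FASTA lines, and matches on the reverse strand. The function names and the demo sequences are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contigs_with_seed(records, seed):
    """Keep records that contain the seed exactly, on either strand."""
    seed = seed.upper()
    seed_rc = reverse_complement(seed)
    hits = []
    for header, seq in records:
        upper = seq.upper()
        if seed in upper or seed_rc in upper:
            hits.append((header, seq))
    return hits


if __name__ == "__main__":
    # Hypothetical demo data: contig1 is wrapped over two lines,
    # contig3 carries the seed only as its reverse complement.
    demo = [">contig1", "AAACCCGGG", "TTT",
            ">contig2", "GGGGGGGG",
            ">contig3", "TTACCCGGTT"]
    for name, _ in contigs_with_seed(read_fasta(demo), "CCGGGT"):
        print(name)
```

For a real file, `read_fasta(open("contigs.fa"))` would replace the demo list; the file name is an assumption.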

Good luck!

2.0 years ago by Matthias Zepper 4.6k

I have complete PoolSeq reads with assembled chromosomes of about 50 individuals in these files. I also have access to them as FastQ, and there is also a reference available for my organism. What would you recommend in that case?

2.0 years ago by Human • 0

So you already have a Fasta file with the scaffolds from 50 individuals of that genomic region? In that case, a multiple-sequence alignment to the reference genome should show you how different your individuals truly are.
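Once such an alignment exists, "how different" can be put into numbers. The following Python sketch assumes the aligned sequences have already been loaded into a dictionary of equal-length strings with `-` as the gap character; the names `reference`, `ind1` and `ind2` and the sequences are invented for the example.

```python
def percent_identity(ref_aln, seq_aln):
    """Percent identity over columns where neither sequence has a gap."""
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ref_aln.upper(), seq_aln.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns instead of counting them as mismatches
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_to_reference(alignment, ref_name):
    """Map every non-reference sequence name to its identity with the reference."""
    ref = alignment[ref_name]
    return {name: percent_identity(ref, seq)
            for name, seq in alignment.items() if name != ref_name}


if __name__ == "__main__":
    # Hypothetical aligned sequences (all the same length).
    alignment = {
        "reference": "ACGT-ACGT",
        "ind1":      "ACGTTACGT",
        "ind2":      "ACGA-ACGA",
    }
    for name, identity in identity_to_reference(alignment, "reference").items():
        print(name, identity)
```

Ignoring gap columns is only one possible convention; counting gaps as differences would give lower values for sequences with insertions or deletions.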

2.0 years ago by Matthias Zepper 4.6k

Could you give me some quick advice on how to use that package/program? I cannot get it running.

2.0 years ago by Human • 0

What is your exact problem?

The Readme has a pretty comprehensive example section, but the program can't be used to align very large sequences. Therefore, you need to prepare a FASTA file with the genomic region of interest (ideally two FASTA files, one for each of your two regions) that also contains the sequences of all your individuals. You can use e.g. samtools faidx or bedtools getfasta to extract those.
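What such an extraction does can be sketched in a few lines of Python. This is only an illustration of the coordinate logic, not a replacement for either tool: samtools region strings such as `chr1:3-6` use 1-based, inclusive coordinates, while BED intervals as used by bedtools are 0-based and half-open. The sequence and the name `chr1` below are made up.

```python
def extract_region(sequence, start, end):
    """Return sequence[start..end] using 1-based, inclusive coordinates."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    # 1-based inclusive [start, end] equals 0-based half-open [start-1, end)
    return sequence[start - 1:end]


def format_fasta(name, sequence, width=60):
    """Format one record as FASTA text with wrapped sequence lines."""
    lines = [">" + name]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines)


if __name__ == "__main__":
    chrom = "ACGTACGTAC"  # hypothetical chromosome sequence
    region = extract_region(chrom, 3, 6)
    print(format_fasta("chr1:3-6", region))
```

Running the extraction once per individual and per region, and writing all records for one region into the same file, gives the kind of input a multiple-sequence aligner expects.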

2.0 years ago by Matthias Zepper 4.6k

Because I am an ultra beginner, my problem is super basic. Where do I actually run it? It's not a program that I can just download and install. The download file just contains lots of (for me) weird files and the Readme file. I tried to run it on the Linux-based cluster I'm working on and in the Windows cmd terminal...

2.0 years ago by Human • 0

I think you downloaded the source code? In general, it doesn't hurt to learn how to compile software yourself, but for ClustalO, there are precompiled binaries available for both Linux and Windows.

24 months ago by Matthias Zepper 4.6k