How to analyse paired-end FASTQ reads
4.5 years ago

Hello, I'm a beginner in this field and I have 4 paired-end read sets from a parasite genome. I would like to start analysing them to build a phylogenetic tree and to find virulence genes associated with the disease. Could anyone help and guide me, please? I have installed Ubuntu and started using it for this work. Thanks, Mohamed

alignment • 1.2k views

I think you should start by understanding exactly what you can do and what you want to do with your data. For that you need to learn the basics of bioinformatics data analysis; a good entry point is https://www.biostarhandbook.com/. You can then get help on the forum by asking more specific questions about the different steps of your analysis (it's difficult to help you without more details).


Thank you guillaume.

I already started to read this book.

Regards, Mohamed


OK, great. Typically, to build phylogenetic trees you'll work with the variants detected in each of your samples, and to get variants you'll have to run variant calling on aligned reads (BAM format).
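
As a rough illustration of the variant-calling step, here is a minimal sketch that only builds (and does not run) a joint-calling command line. It assumes bcftools is the caller, which the thread does not specify; the file names and the function name `build_calling_pipeline` are hypothetical.

```python
import shlex


def build_calling_pipeline(reference, bam_files, out_vcf):
    """Return a shell pipeline string for joint variant calling with bcftools.

    Assumption: bcftools is installed and the BAM files are sorted and
    mapped against `reference`. Nothing is executed here.
    """
    # mpileup computes genotype likelihoods from all BAM files at once
    mpileup = ["bcftools", "mpileup", "-Ou", "-f", reference, *bam_files]
    # call -m -v keeps variant sites only; -Oz writes a compressed VCF
    call = ["bcftools", "call", "-m", "-v", "-Oz", "-o", out_vcf]
    return shlex.join(mpileup) + " | " + shlex.join(call)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    print(build_calling_pipeline("reference.fa",
                                 ["isolate1.bam", "isolate2.bam"],
                                 "calls.vcf.gz"))
```

Calling all isolates together gives one multi-sample VCF, which is usually easier to turn into an alignment later than many single-sample files.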


This is exactly what I want to do. I have a set of 46 BAM files (reads mapped to a reference), so I need to go through them and find the variants.

So I need to know the exact tools to use for steps such as multiple sequence alignment and the other processes.

Thank you for your help and guidance.

Dear Guillaume.rbt, thank you for your help. I now have VCF files for some of my isolates. How can I build a phylogenetic tree from them?

Hi Mohamed,

To build a phylogenetic tree you'll have to create a FASTA file from your VCF file. The FASTA file should contain one sequence per isolate, made up only of the detected SNPs (all sequences must be the same length, with each SNP at the same position in every sequence). For this I use the software PGDSpider.
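
To make the idea of a SNP-only FASTA concrete, here is a minimal sketch in plain Python. It is not how PGDSpider works internally; it assumes a single multi-sample VCF, takes only the first allele of each genotype (a simplification that suits haploid calls), and writes `N` for missing calls. The function name `vcf_to_snp_fasta` is hypothetical.

```python
def vcf_to_snp_fasta(vcf_lines):
    """Build a FASTA string with one SNP-only sequence per sample.

    Assumptions: one multi-sample VCF, the first allele of each genotype
    represents the isolate, and missing calls become 'N'.
    """
    samples = []
    seqs = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")
        # Keep SNPs only: every allele must be a single A/C/G/T base
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            first = gt.replace("|", "/").split("/")[0]
            seqs[sample].append(alleles[int(first)] if first.isdigit() else "N")
    # Every sample gets one base per retained site, so lengths are equal
    return "".join(f">{s}\n{''.join(seqs[s])}\n" for s in samples)
```

Because each retained site adds exactly one character to every sample, the sequences come out the same length with each SNP in the same column, which is what tree-building tools expect.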

I advise you to filter for high-quality variants, maybe try with only the variants in coding regions, and keep only SNPs, not indels.
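
The filtering advice could be sketched as follows. The QUAL threshold of 30 is only an assumed starting value, not a recommendation from this thread, and restricting to coding regions is left out because it needs an annotation file. The function name `filter_high_quality_snps` is hypothetical.

```python
def filter_high_quality_snps(vcf_lines, min_qual=30.0):
    """Keep header lines plus SNP records that pass a simple quality filter.

    Assumptions: min_qual=30 is an arbitrary example threshold, and a
    FILTER value of 'PASS' or '.' means the record was not flagged.
    """
    kept = []
    for line in vcf_lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            kept.append(line)  # preserve the VCF header
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        qual, flt = fields[5], fields[6]
        # SNP: reference and every alternate allele are single bases
        is_snp = len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts)
        passes = flt in ("PASS", ".") and qual != "." and float(qual) >= min_qual
        if is_snp and passes:
            kept.append(line)
    return kept
```

It is worth checking how many sites survive at a few different thresholds before settling on one, since a very strict filter can leave too few SNPs to resolve the tree.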

Once you have a FASTA file you can use it to generate a phylogenetic tree. You can do this with online tools, for example https://www.phylogeny.fr

(If you have other questions you should make a new post on the forum: you'll get other answers, and it will help people who have the same issue.)
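
Before uploading the FASTA file anywhere, a quick sanity check is to compute pairwise distances between the isolates. The sketch below is not what phylogeny.fr does; it only computes the proportion of differing sites (p-distance), which is the kind of input distance-based tree methods start from. Sites where either sequence has `N` are skipped, and both function names are hypothetical.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def p_distances(seqs):
    """Return {(name_a, name_b): proportion of differing sites}."""
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("all sequences must have the same length")
    dist = {}
    for a, b in combinations(seqs, 2):
        # Compare only sites where both isolates have a called base
        pairs = [(x, y) for x, y in zip(seqs[a], seqs[b])
                 if x != "N" and y != "N"]
        diffs = sum(x != y for x, y in pairs)
        dist[(a, b)] = diffs / len(pairs) if pairs else float("nan")
    return dist
```

Isolates with a distance of zero, or with sequences of unequal length, are easier to spot here than after a tree has been drawn.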


OK.

Thank you very much for your help.

Sure, I will create a new post for any further questions.


Bioinformatics is really broad and there are many specialties. As someone already answered, the question is not specific enough. Maybe people can guide you for now by telling you which subjects you need to look at.

I am not sure, but for this question I think you need to know about:

  • Doing an assembly, reference-based or de novo
  • Genome annotation and ORF prediction
  • BLAST
  • Data formats: FASTA, FASTQ, SAM
  • Multiple sequence alignments
  • A tool to create a phylogenetic tree (not the theory and algorithms behind it)
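
To get a feel for the FASTQ format with paired-end data, here is a small sketch that parses uncompressed four-line FASTQ records and checks that the two mate files list the same reads in the same order. It assumes read names end in `/1` and `/2` or are followed by a space and extra text; all function names are hypothetical.

```python
def parse_fastq(lines):
    """Parse 4-line FASTQ records into (header, sequence, quality) tuples."""
    lines = [l.rstrip("\n") for l in lines if l.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ input must contain 4 lines per read")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        records.append((header[1:], seq, qual))
    return records


def read_name(header):
    """Strip the mate suffix (/1, /2) or trailing comment from a header."""
    name = header.split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pairs_in_sync(r1_records, r2_records):
    """True if both mate files hold the same read names in the same order."""
    return len(r1_records) == len(r2_records) and all(
        read_name(a[0]) == read_name(b[0])
        for a, b in zip(r1_records, r2_records)
    )
```

Most aligners expect the two files of a pair to be in sync like this, so the check is useful after any trimming or filtering step that might drop reads from only one file.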

Yes, but I need to know the tools for these steps:

  • Doing an assembly, reference-based or de novo
  • Genome annotation and ORF prediction
  • BLAST
  • Data formats: FASTA, FASTQ, SAM
  • Multiple sequence alignments
  • A tool to create a phylogenetic tree (not the theory and algorithms behind it)
