Question: How to perform sequence validation of NGS data?
8 months ago, abdul.suboor123 (Huazhong Agricultural University, China) wrote:

I have analyzed circDNA NGS data and am now validating the circDNA sequence reads against Sanger sequences. This is my fifth attempt in Notepad++: in the "Find" option I put the junction site "five bases from start and five bases from end" of my circDNA sequence read, but it doesn't match. Please suggest a better way to validate the sequence. Thanks.


I think the best way would be to use Python or Perl to manage your text file as you need to.

Could you please provide an example of what you want to do?

written 8 months ago by Bastien Hervé

My objective is to validate my circDNA data. If there is another way, please tell me. Can you tell me about a script for this kind of validation in Python or Perl?

written 8 months ago by abdul.suboor123

What do you mean by validating your circDNA?

Could you post an example of what you are currently trying:

I put the junction site "five bases from start and five bases from end" of my circDNA sequence read

written 8 months ago by Bastien Hervé

For example: I sequenced circDNA samples from maize and analyzed the data with two tools, CIRI and CIRCexplorer2, which reported 150 circDNAs (CIRI) and 188 circDNAs (CIRCexplorer2). Now I want to confirm that if the sequence of circDNA is really a maize line sequence or not. For this purpose I visualized the reads with bedtools, took some junction reads from the circDNA results, designed primers with SnapGene and Primer3Plus, ran PCR with the primers and control DNA, and performed Sanger sequencing. Now after this I search with notepad++ about the junction site presence in the sanger sequence, but I haven't found junction site sequence.

written 8 months ago by abdul.suboor123

I want to confirm that if the sequence of circDNA is really a maize line sequence or not

Did you try using BLAST?

This part needs a full explanation (note that Notepad++, WordPad, etc. should be avoided for reading large files such as FASTQ):

Now after this I search with notepad++ about the junction site presence in the sanger sequence, but I haven't found junction site sequence.

Which junction site? What size? What do you want to do with these junctions?

written 8 months ago by Bastien Hervé

I have tried BLAST, and it shows that more than 90% of the sequence belongs to the maize genome, but I want to validate its circularity: is it circular or not? I know that the sequence is right, but I actually need to confirm the circularity of the DNA sequences reported by the two tools above. I am confused because my professor and my labmate, who applied this validation method to circRNA, told me to apply it here. The junction site is the site where the two ends of the sequence meet and create a circle.

written 8 months ago by abdul.suboor123

As an example, let's say you know that:

Start of the expected circular DNA : AAAAA

End of the expected circular DNA : CCCCC

Read1 : GCTATATAAAAACCCCCGCTAGCGT

Read2 : AAAAAGCATGCTAGCTATTACCCCC

Read3 : GCATGCAAAAACGTATGCTACCCCC

Read4 : GTCAGTCGATCGATGCGTGTCCCCC

Read5 : AAAAAGTCAGTCGATCGATGCGTGT

Read1 is circular, the others aren't?

modified 8 months ago • written 8 months ago by Bastien Hervé

Yes, you are right. If AAAAACGCGCGCGCCCCC is the circDNA sequence, then selecting the circular option in SnapGene creates a junction site, meaning it connects the end and the start of the circDNA sequence; here the junction site is CCCCCAAAAA. I want to confirm these ten bases in the Sanger sequence, but how to do that is what I want to know.

written 8 months ago by abdul.suboor123
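
The junction described in the reply above (last five bases followed by the first five bases) can be derived with a few lines of Python instead of by hand. This is only a minimal sketch; the function name `junction_site` and the `flank` parameter are illustrative choices, not part of SnapGene or any other tool mentioned in this thread.

```python
def junction_site(seq: str, flank: int = 5) -> str:
    """Join the last `flank` bases to the first `flank` bases of a sequence.

    This mimics closing a linear sequence into a circle: the end of the
    sequence runs directly into its start.
    """
    seq = seq.strip().upper()
    if len(seq) < 2 * flank:
        raise ValueError("sequence is shorter than two flanks")
    return seq[-flank:] + seq[:flank]


# Toy sequence from the reply above
print(junction_site("AAAAACGCGCGCGCCCCC"))  # CCCCCAAAAA
```

Increasing `flank` (for example to 10 or 15) gives a longer junction string, which is less likely to match somewhere by chance.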

Could you share example data (Sanger sequence and output from SnapGene)?

written 8 months ago by Bastien Hervé

For example, the SnapGene sequence is:

CTTGAAGTTATTGATAACATACTCTTAAAAATGACTGAGGAAGAATCTGCTGTGGCCGCTGCTAGCACAGGCACTGAAAAGGGGAAAAAACAAGCTGAAGACATTTTGGAGGGTGAAGATTTCGAATTTCAAGATCTACTTGGGCAAGAGCTGACAGACGCTGAAAAAGCAGAGCTTAAAAGATGTGCCATAGCCTGCGGATATAAGCCAGGGGCTACACTATTTGGTGGGGTTAACGAAGGAAAGCTGAGGTGCCTTCGAAACCGCAGCGAAGCTAAAATTGTTAGAACTCTCTGCAAAAACATAGGCTTGCCAAAGCTGGAAGTGGACCTCTGTCGTTACCAATGGCACCATATCGCCGGAAGTTTGCTTTATGCTAACTTCAAGGTAACAAATATTTTTGCTATTTTATTATTATCTTTTAGGTCGTTTTCTAACGAAGGTCTTTTCGACAGAGCATACTGTTAAGTAAAGTTCTTAAAATGCAACAAGATCTCGAAGAAGAGAAGAACAAAGCCATAATCCAAAATTTGGCTGAAAAGGTTGAAAATTACGAAGCTGATCTGAAAAAGAAGGATTTCACCATCCAAAGCTTCTCGGGGCACAGCCACCGCTTTTCTGAAGGCTGGCTGCACGCATGGAAATATTGTGAACAGACCAAACTTCAGCTTGTCAGCATCAGATCTGATAAATATCCCAAGCCTAGCCCGAAGCATCGGGAATAGATTCATGACCCAAATCTGGGTAAGTGGCGGGCGAAAAATGGCGGGTGACGAAGCTCGAAGTCACCTTAAGCTGGTAAGAAAC

The junction site: GAAACCTTGA. In this junction site, the first 5 bases come from the end and the last 5 bases come from the start of the SnapGene sequence.

Sanger sequence:

GGGGACTTTACTATGCTCTGAGTCATTGATATTGAGCTTCGAAATGACAAGGCCTTGTGCCTATGCAGCTGGGCTCCCATGGCCCGTGCCCAAAGAGAATTCAAAAGGGCCCAACCCGAACTCCAAAACAGATCTAAGAGCCATGCTCTTGAAATAAGCATTTTCCACCTCTAGGGTAA.

Now I want to find the junction site in this Sanger sequence, i.e. the ten bases I mentioned above. This is what I want to do.

modified 8 months ago by Bastien Hervé • written 8 months ago by abdul.suboor123
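
A text-editor search only looks for the junction exactly as typed, on one strand. Since a Sanger read may come from the opposite strand, one option is to search for both the junction and its reverse complement. The sketch below does that with plain Python string methods; `find_junction` and `reverse_complement` are hypothetical helper names, and only exact matches are reported (sequencing errors or a trimmed read end would still cause a miss).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_junction(sanger: str, junction: str) -> list:
    """Return (strand, 0-based position) for every exact junction match."""
    # Remove whitespace/line breaks and a trailing period left from copy-paste
    sanger = "".join(sanger.split()).upper().rstrip(".")
    junction = junction.upper()
    hits = []
    for strand, query in (("+", junction), ("-", reverse_complement(junction))):
        start = sanger.find(query)
        while start != -1:
            hits.append((strand, start))
            start = sanger.find(query, start + 1)
    return hits


# Sanger read and junction as posted above
sanger_read = (
    "GGGGACTTTACTATGCTCTGAGTCATTGATATTGAGCTTCGAAATGACAAGGCCTTGTGCCTATGCAGCTGG"
    "GCTCCCATGGCCCGTGCCCAAAGAGAATTCAAAAGGGCCCAACCCGAACTCCAAAACAGATCTAAGAGCCAT"
    "GCTCTTGAAATAAGCATTTTCCACCTCTAGGGTAA"
)
# An empty list means no exact match on either strand
print(find_junction(sanger_read, "GAAACCTTGA"))
```

If nothing is found on either strand, aligning the Sanger read against the expected amplicon (for example with BLAST, as suggested earlier in the thread) may show whether the read covers the junction region at all.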

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

Do you have a multi-FASTA file of SnapGene sequences and a corresponding multi-FASTA file of Sanger sequences?

Do you have some knowledge of Python or Perl scripting? Even Unix tools should do the trick, I guess.

For each sequence, do you want a "junction found" / "junction not found" answer for the Sanger sequence?

modified 8 months ago • written 8 months ago by Bastien Hervé
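
If the data were arranged as the two multi-FASTA files asked about above, a found / not found report per sequence could be sketched as follows. This assumes (it is not confirmed in the thread) that each Sanger record carries the same ID as its SnapGene record; `parse_fasta`, `junction_found` and `report` are illustrative names, and the parser is a deliberately simple one rather than a full FASTA library.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(lines):
    """Parse FASTA lines into {record_id: uppercase sequence}."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID = first word of the header line
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def junction_found(circ_seq, sanger_seq, flank=5):
    """True if the end+start junction (either strand) occurs in the Sanger read."""
    junction = circ_seq[-flank:] + circ_seq[:flank]
    rc = junction.translate(COMPLEMENT)[::-1]
    return junction in sanger_seq or rc in sanger_seq


def report(circ_records, sanger_records, flank=5):
    """Return {record_id: status} for every circular candidate."""
    result = {}
    for name, circ_seq in circ_records.items():
        sanger_seq = sanger_records.get(name)
        if sanger_seq is None:
            result[name] = "no Sanger read"
        elif junction_found(circ_seq, sanger_seq, flank):
            result[name] = "junction found"
        else:
            result[name] = "junction not found"
    return result


# In-memory toy data; with real files, pass open file handles to parse_fasta
circ = parse_fasta([">circ1", "AAAAACGCGCGCGCCCCC"])
sanger = parse_fasta([">circ1", "TTCCCCCAAAAATT"])
print(report(circ, sanger))
```

With real files, `parse_fasta(open(path))` works the same way because the function only iterates over lines; the EditSeq files mentioned later in the thread would first need to be exported as FASTA or plain text.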

Notepad++ is probably not the right tool for the job. Can you elaborate on which formats your NGS and Sanger data are in?

written 8 months ago by WouterDeCoster

My NGS data is in FASTQ format, whereas the Sanger sequence data is in EditSeq file format. I selected some junction reads with bedtools, designed primers for each read, and did Sanger sequencing. I learned this procedure from my labmate, who performed it on circRNA data; my data is circDNA. My objective is to validate my circDNA data; if there is another way, please tell me. I have been doing the above process for the last month but have no result yet.

written 8 months ago by abdul.suboor123
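
Since the NGS data is in FASTQ, the reads themselves can also be scanned for a junction without opening the file in an editor. The sketch below streams a FASTQ file record by record and counts reads containing the junction on either strand. It assumes standard four-line FASTQ records; `count_junction_reads` is a hypothetical helper, and for gzipped files the handle could come from `gzip.open(path, "rt")`.

```python
from itertools import islice

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for seq in islice(handle, 1, None, 4):
        yield seq.strip().upper()


def count_junction_reads(handle, junction):
    """Return (reads containing the junction on either strand, total reads)."""
    junction = junction.upper()
    rc = junction.translate(COMPLEMENT)[::-1]
    total = hits = 0
    for seq in fastq_sequences(handle):
        total += 1
        if junction in seq or rc in seq:
            hits += 1
    return hits, total


# Toy records; with a real file use: with open(path) as fh: count_junction_reads(fh, ...)
toy = [
    "@r1\n", "TTGAAACCTTGATT\n", "+\n", "IIIIIIIIIIIIII\n",
    "@r2\n", "ACGTACGTACGT\n", "+\n", "IIIIIIIIIIII\n",
]
print(count_junction_reads(toy, "GAAACCTTGA"))  # (1, 2)
```

A ten-base junction is short enough to occur by chance in a large read set, so a longer junction string (more bases on each side) would make the count more meaningful.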
8 months ago, abdul.suboor123 (Huazhong Agricultural University, China) wrote:

@Bastien Hervé, I have a little knowledge of Python, but I am not good at it. Yes, I have the SnapGene sequences and the Sanger-sequenced files. Please tell me how I can do that. And yes, for each sequence I need to find the junction in the Sanger sequence.

Hi abdul.suboor123,

This reply is better suited as a reply on my answer's comment. Could you make the appropriate change please? That would involve the following steps:

  1. Copy the contents of your reply from this answer (you can edit this answer (Ctrl/Cmd + click the link to open it in a new tab) and do a Select All -> Copy there).
  2. Click on Add Reply on my post here: C: How to perform sequence validation of NGS data?
  3. Paste the copied text
  4. Click on the green Add Comment button
  5. Click on moderate back in your answer here: A: How to perform sequence validation of NGS data?
  6. Choose Delete Post
  7. Click on the blue Submit button.

Thank you!

modified 7 months ago • written 7 months ago by Bastien Hervé