Wgsim Output Interpretation
11.2 years ago
darxsys ▴ 240

I'm working on a project for which I need output from wgsim (at least, that's what I was told). I downloaded wgsim from its GitHub page and ran it on a bacterial genome from NCBI. The problem is that I don't understand its output or how to interpret it. For example, for a complete genome of length <1500 bases, I got a 4-million-line text file with content like this (the read1 and read2 files look almost the same):

@gi|379009891|ref|NC016894.1|:101-145674212561:0:01:0:00/1
GTAGAATGATCGCGACCGCCAAATTCATCACCAATTTTAGGAAGTGATAAATCAGTAATCACACGCGTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|379009891|ref|NC016894.1|:101-14564409582:0:00:0:01/1
ATAATCCACTTTTTATTTATGGTGTCGTCGGTTTAGGAAAAACGCATTTAATTCAAGCCATCGGACATTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|379009891|ref|NC016894.1|:101-1456705261:1:02:0:02/1
AGTTTTAACACCTGGAATTTAAAAATAAAACCGATAAATTACGTCAATAATACTTACTATTTTTTATCTG
+

This is from the read1 file. I'm asking because I couldn't find anything on Google, and the answer may be useful to other people as well.

11.2 years ago

The output is a pair of FASTQ files, one for each mate of the simulated read pairs (read1 and read2).
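In FASTQ, each record spans four lines: a header starting with `@`, the bases, a `+` separator, and one quality character per base. Here is a minimal Python sketch for reading such records, assuming uncompressed files with exactly four lines per record. The function name `read_fastq` and the example record are illustrative and not part of wgsim.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: %r" % header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in: %r" % header)
        yield header[1:].strip(), seq, qual


# Hypothetical record; with real data, pass open("read1.fq") instead.
example = io.StringIO("@read_1/1\nGTAGAATGAT\n+\n2222222222\n")
for name, seq, qual in read_fastq(example):
    # Phred+33 encoding: the character '2' corresponds to quality 17.
    phred = [ord(c) - 33 for c in qual]
    print(name, len(seq), phred[0])  # prints: read_1/1 10 17
```

The rows of `2` in the question are the quality line. Every base carries the same placeholder quality, which decodes to Phred 17 under the usual +33 offset.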


The short sequences (reads) in these FASTQ files were generated from the reference sequence and, depending on the settings, may contain mutations, structural variants, and errors relative to the original.
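wgsim also encodes where each pair came from, and how many changes it carries, in the read name. The sketch below assumes the layout `contig_start_end_e1:s1:i1_e2:s2:i2_counter/mate`, which is my reading of the wgsim source. The underscores appear to have been lost in the snippet quoted in the question, so the example name here is hypothetical. `parse_wgsim_name` is an illustrative helper, not part of wgsim.

```python
import re

# Assumed wgsim read-name layout (verify against your wgsim version):
#   contig_start_end_e1:s1:i1_e2:s2:i2_counter/mate
WGSIM_NAME = re.compile(
    r"^(?P<contig>.+)_(?P<start>\d+)_(?P<end>\d+)"
    r"_(?P<e1>\d+):(?P<s1>\d+):(?P<i1>\d+)"
    r"_(?P<e2>\d+):(?P<s2>\d+):(?P<i2>\d+)"
    r"_(?P<counter>[0-9a-f]+)/(?P<mate>[12])$"
)


def parse_wgsim_name(name):
    """Split a wgsim-style read name into its fields (layout is an assumption)."""
    match = WGSIM_NAME.match(name.lstrip("@"))
    if match is None:
        raise ValueError("not a wgsim-style read name: %r" % name)
    f = match.groupdict()

    def counts(err, sub, indel):
        # Assumed meaning: sequencing errors, substitutions, indels in that mate.
        return {"errors": int(f[err]), "substitutions": int(f[sub]), "indels": int(f[indel])}

    return {
        "contig": f["contig"],
        "start": int(f["start"]),  # assumed: 1-based start of the simulated fragment
        "end": int(f["end"]),      # assumed: 1-based end of the simulated fragment
        "read1": counts("e1", "s1", "i1"),
        "read2": counts("e2", "s2", "i2"),
        "counter": int(f["counter"], 16),  # pair counter, written in hexadecimal
        "mate": int(f["mate"]),
    }


# Hypothetical read name following the assumed layout.
info = parse_wgsim_name("@NC_016894.1_1256_1720_1:0:0_1:0:0_0/1")
print(info["contig"], info["start"], info["end"], info["read1"]["errors"])
```

Because the contig name may itself contain underscores, the pattern anchors on the numeric fields at the end of the name rather than splitting on `_`.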
