Question: Bowtie Alignment with scaffold assembled reference genome
Asked 5 weeks ago by mail2steff (Potsdam, Germany)

Dear all,

I have whole-genome sequencing reads from a papaya variety, and I downloaded the reference genome (Carica papaya, scaffold-level assembly) from NCBI. When I checked the quality of my reads, everything looked fine except the Sequence Length Distribution. When I ran bowtie without trimming, I got the following warning:

warning skipping mate #1 read ..... because it was < 2 characters long.

The alignment results were:

62674658 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
16140697 + 0 mapped (25.75% : N/A)
62674658 + 0 paired in sequencing
31337329 + 0 read1
31337329 + 0 read2
14389072 + 0 properly paired (22.96% : N/A)
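
The percentages in this summary are plain ratios of the mapped and properly paired counts to the total read count. As a quick sanity check, here is a minimal Python sketch that recomputes them from the numbers above (the helper name `percent` is only illustrative):

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    if total == 0:
        return 0.0
    return round(100.0 * part / total, 2)


# Counts copied from the alignment summary in the post.
total_reads = 62674658
mapped = 16140697
properly_paired = 14389072

mapped_pct = percent(mapped, total_reads)
paired_pct = percent(properly_paired, total_reads)
print(f"mapped: {mapped_pct}%  properly paired: {paired_pct}%")
```

Both values agree with the reported 25.75% and 22.96%, so the summary itself is internally consistent.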

When I ran bowtie after trimming, I didn't get any warnings, but the alignment percentage was the same. We expected a higher alignment percentage, since my sample is also a papaya variety.
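
The warning is consistent with very short or empty reads in the input (the FastQC summary reports lengths of 0-151). One way to drop such reads while keeping the two mate files in sync is to discard a pair whenever either mate is too short. The following is only a sketch under stated assumptions: uncompressed FASTQ with exactly four lines per record, mates in the same order in both files, and an arbitrary `min_len` threshold. The function names are hypothetical, not part of bowtie or any trimming tool.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\r\n") for line in islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated final record)
            return
        yield record


def filter_pairs(
    mates1: Iterable[str], mates2: Iterable[str], min_len: int = 20
) -> Iterator[Tuple[Record, Record]]:
    """Keep a pair only if both mates have at least min_len bases."""
    for r1, r2 in zip(read_fastq(mates1), read_fastq(mates2)):
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            yield r1, r2


# Toy input: the second pair has an empty mate 1 and is dropped as a whole.
m1 = ["@r1/1", "ACGTACGT", "+", "IIIIIIII", "@r2/1", "", "+", ""]
m2 = ["@r1/2", "TTGGCCAA", "+", "IIIIIIII", "@r2/2", "ACGTACGT", "+", "IIIIIIII"]
kept = list(filter_pairs(m1, m2, min_len=2))
print(len(kept), "pair(s) kept")
```

Dropping the whole pair, rather than only the short mate, keeps read1 and read2 files the same length, which paired-end aligners generally expect.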

Is there a problem with the reads? The Carica papaya reference genome is only assembled to scaffold level, not chromosome level. Will that matter for the alignment? I've pasted a few lines from my reference FASTA file.

>NW_019011177.1 Carica papaya cultivar SunUp chromosome LG1 unlocalized genomic scaffold, Papaya1.0, whole genome shotgun sequence
TAATAAGAACATAGAGTAATATAATTGTGTTGAAAATTTCAGTATGGAATGAAATGTTTGACAAGCTTTGATAGGGATGC..
gaaaaatattatcctATAAGTAATTATACAAATTGCCGCTTCATTTTAGTAATATATTTCTagttaatttacaaaattac
cATGGATTATGCtctcttaattttataatatatgctctcttttctcatattttgatattgtatttaatatatatgtataa
caagtccataattttattaaaaaaatcataatgaaTACTATAAGTAATTGGAGAGAAGTATGTGATAGTTGGAGAGAAGT
AGATTTGGACACGTTAGtagtagaaaaaataatttctaaacaaCAGTGCCATGTTAGCCGTTGAAATGGATGAGAAATGA
>NW_019011178.1 Carica papaya cultivar SunUp chromosome LG1 unlocalized genomic scaffold, Papaya1.0, whole genome shotgun sequence
CTAAATATCATGTTTGTCTATTTATATCTTTAACTTTGCAAATGTCTAAAGCACTCATGACAAATAGACTCTTAGAAGC
TGAAAGCGGCtttaaattaacattaatatCGAACTGCTTTGCACCTAAATCACACAACAAAACATCAAAATTTTGATGAT
TATTAAGCGGCAGATAGGCTTATCttgattgattatattttaGATACAAAAAAGCAGTTATTTGTGTCATAATTTTCATC

Is there a problem with my reference genome file, or should I try different alignment software? Could you please guide me on this?

Basic Statistics

File type: Conventional base calls
Encoding: Sanger / Illumina 1.9
Total Sequences: 31337329
Sequences flagged as poor quality: 0
Sequence length: 0-151
%GC: 36

Reference Genome Statistics

Assembly level: Scaffold
Assembly: GCA_000150535.1 Papaya1.0
Scaffolds: 17,766
Contigs: 47,485
N50: 10,650
L50: 7,081
BioProjects: PRJNA264084, PRJNA20267
Whole Genome Shotgun (WGS): INSDC: ABIM00000000.1
Total length (Mb): 370.419
Protein count: 26103
GC%: 39.0069
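
For readers unfamiliar with the N50 and L50 values in these statistics: sort the sequence lengths from longest to shortest and add them up until the running sum reaches half of the total assembly length. N50 is the length of the sequence at which that happens, and L50 is how many sequences were needed. A minimal sketch with made-up toy lengths (not the papaya assembly; `n50_l50` is an illustrative name):

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    return 0, 0  # empty input


# Toy lengths: total is 290, so the halfway point is 145.
toy_lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

A small N50 relative to the genome size indicates a fragmented assembly made of many short sequences.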
Tags: bowtie, papaya, alignment, ngs

It would help if you could provide the following information:

  • Read length distribution graph: you can host the image and paste the link here
  • Read statistics: number of reads, read chemistry, platform, total bases, etc.
  • Statistics of the downloaded genome: number of FASTA headers and total number of bases
  • Statistics of the downloaded genome - no. of fasta headers + total no. of bases
written 5 weeks ago by Vijay Lakhujani
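
The read statistics requested above (number of reads, total bases, and the length distribution) can be tallied directly from a FASTQ file. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; the function name and the toy records are illustrative only:

```python
from collections import Counter
from itertools import islice
from typing import Iterable


def length_distribution(lines: Iterable[str]) -> Counter:
    """Count how many reads have each sequence length."""
    it = iter(lines)
    counts: Counter = Counter()
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input
            break
        sequence = record[1].rstrip("\r\n")
        counts[len(sequence)] += 1
    return counts


# Toy records; the second read is empty, like a zero-length read.
demo = ["@r1", "ACGT", "+", "IIII",
        "@r2", "", "+", "",
        "@r3", "ACGTAC", "+", "IIIIII"]
dist = length_distribution(demo)
total_reads = sum(dist.values())
total_bases = sum(length * n for length, n in dist.items())

for length in sorted(dist):
    print(length, dist[length])
print("reads:", total_reads, "bases:", total_bases)
```

For a real file, the same function can be given an open file handle, for example one returned by `open(path)` or, for gzipped reads, `gzip.open(path, "rt")`.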

I've updated my post; please check it. My reference file has 17766 FASTA headers, and the total number of bases in the reference genome is 370418818.

modified 5 weeks ago • written 5 weeks ago by mail2steff
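
Header and base counts like these can be reproduced with a few lines of Python. This is a sketch, assuming a plain-text FASTA file; the function name is illustrative, and it counts every sequence character, including `N` and lowercase (soft-masked) bases:

```python
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of headers, total number of sequence characters)."""
    headers = 0
    bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers += 1
        else:
            bases += len(line)
    return headers, bases


# Toy FASTA content, not taken from the papaya reference.
demo_fasta = [">scaffold_1 toy", "ACGTAC", "gtacgt", ">scaffold_2 toy", "NNNNAC"]
n_headers, n_bases = fasta_stats(demo_fasta)
print(n_headers, "headers,", n_bases, "bases")
```

If the header count matches the scaffold count reported for the assembly (17,766 here) and the base total matches the reported assembly length, the download is likely complete.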

Hi Vijay, I tried bwa mem and got 55% alignment. How should I interpret this?

written 4 weeks ago by mail2steff