Question: How to map BAM file to reference genome "[fai_fetch_seq] The sequence not found"
0
gravatar for beausoleilmo
12 months ago by
beausoleilmo150
McGill University
beausoleilmo150 wrote:

I want to use samtools to create a .bcf file using this command:

samtools mpileup -C 50 -E -t SP -t DP -u -I -f /my/path/Geospiza_fortis.scaf.noBacterial.fa -b bam_list.txt > output_test.bcf

But I get this error:

[fai_fetch_seq] The sequence "JH739887" not found

I know that the problem is that the name "JH739887" doesn't exist in the reference genome. But How I'm I supposed to change this name so that it can be mapped on the reference genome?

head Geospiza_fortis.scaf.noBacterial.fa
>Scaffold410    275
TGCATTAATATGAGTGTGTGCTGCAAAAGTTCAGGTCATGGTCCGATCATACTTCACATTTTGGTAGCACTTTAAGCAGAGATCGGTTATCCCATTCTGTGGAAGACTCAACACTATCATAAGGTCCCACAGTTTTATTATCCCTCTGCCTCCCGGAATGCCCCCGGCAGTGAGGGGTACCATCTTCTCAGCAGTAAGGATATTCTTCAGGAGTTCCGTGTGAGCTTTCCCGGATTTAGTTCCATTTTTTAAATACTTCCCAATTCTTTGCTTTG
>Scaffold430    374
CTTTGTTAACTGAAAGAGCCTCTAAGTAGATGACCAGTGCTCAGTTAGTACAGTATGAATTTTGTTTAATGGAACAGGAAGATTTAGTATTGAGAAGCGGTTAAGGGTTTAACCCAGCCTCCTGTCTGAATGGACCTGAAGAGGGGGGCCGGGAAGAAACCCATGACTGCATTAAAGTGATAGATCTCCAGACATGGGCTAGGGAAGATTTACAAGACACTCCCTGGCCTGAGGGAGAAAATATGTTTATTGATGAGTCTTCAAGGGTGGCAGAAGGGAAGCGATTTACAGGATACACAATCATTAATGGAAGGAAATTAAAGGAAGGGGGGAGATTGTCACCCACCTGGTCAGTTCAGACAGCAGAGCTGTAT

head Geospiza_fortis.scaf.noBacterial.fa.fai 
# Scaffold410       275     17      275     276
# Scaffold430       374     310     374     375
# Scaffold1010      597     703     597     598
# scaffold260       820     1314    820     821
# scaffold980       554     2148    554     555
# scaffold440       1463    2716    1463    1464

samtools view Bamboum_sorted.bam | head 
24016062    16  JH739887    2197    37  94M *   0   0   ACCATTTAAAACACTCTTCAGGGAAGGCTTTAGCTTTTCATTCAGTTTTGTATTTTACTACATGAATATAAAGCAGTTTTGCTTCTGTGTTTTT  >@BB@>>>;;?>;@A>AAABBBBBBBBBBBBBBBBBBA?A?9<===<<BAABBABA=BBABBBBBBBBCBB>CCCACCC73?@2+<2+0+A<+=  NH:i:1  NM:i:3  RG:Z:JP3750_for_sorted
27522854    0   JH739887    2921    37  94M *   0   0   AATTTCCAACACTCTGCAAGAAGCTGCAAGAGTTCGCCTGCACCTGTGTCTTATTGCCACAAGAGCTCTCTGTTCTATACTCTGAAATTCTCAC  BCFFFFFHHHHHJJJJJJJJIJIJIJJJJJJJFGHIIJJJIJJJIIIIIHHIJJJJJJJJJJHIJJJJHHHHHHFFFFFFFEEEEEEEDDEDED  NH:i:1  NM:i:0  RG:Z:JP3923_for_sorted
31888152    0   JH739887    4964    37  94M *   0   0   AAAGCCCACCTGCATTATACAACATGAAGCCGTAGAATAGACAGTCTGATGGTTTACAATGTTTAGGAGATGTTACCATGTTAAGGATCATTAT  C@FFDFFHHHHHJIJJJJJJJJJJJIJJJJJJGIJJJJJ*?GHIEHIJIHIIHIIJJIJJJHII===CHEHGHHFFFFFFEEEDEEEDDDDEEE  NH:i:1  NM:i:5  RG:Z:JP3752_for_sorted
reference genome bam fasta • 587 views
ADD COMMENTlink modified 10 months ago by Biostar ♦♦ 20 • written 12 months ago by beausoleilmo150
1

Had you aligned against the fasta file that you're giving to samtools you wouldn't have these issues.

ADD REPLYlink written 12 months ago by Devon Ryan73k

@beausoleilmo: Heed the advice of someone who knows best :)

ADD REPLYlink written 12 months ago by genomax37k

Someone sent me this file containing the different possible names. You can use a script to change the name of the files in either of the files (FASTA __or__ BAM)

#Ass.       Genome C.   RefSeq.v        GenBank.v   NCBI name
GeoFor_1.0  scaffold40  NW_005054297.1  JH739887.1  GPS_002009865.1
GeoFor_1.0  scaffold112 NW_005054298.1  JH739888.1  GPS_002009866.1
GeoFor_1.0  scaffold41  NW_005054299.1  JH739889.1  GPS_002009867.1
GeoFor_1.0  scaffold130 NW_005054300.1  JH739890.1  GPS_002009868.1
GeoFor_1.0  scaffold54  NW_005054301.1  JH739891.1  GPS_002009869.1
GeoFor_1.0  scaffold16  NW_005054302.1  JH739892.1  GPS_002009870.1
GeoFor_1.0  scaffold23  NW_005054303.1  JH739893.1  GPS_002009871.1
GeoFor_1.0  scaffold11  NW_005054304.1  JH739894.1  GPS_002009872.1
GeoFor_1.0  scaffold53  NW_005054305.1  JH739895.1  GPS_002009873.1
GeoFor_1.0  scaffold98  NW_005054306.1  JH739896.1  GPS_002009874.1
GeoFor_1.0  scaffold138 NW_005054307.1  JH739897.1  GPS_002009875.1
GeoFor_1.0  scaffold55  NW_005054308.1  JH739898.1  GPS_002009876.1
GeoFor_1.0  scaffold172 NW_005054309.1  JH739899.1  GPS_002009877.1
GeoFor_1.0  scaffold166 NW_005054310.1  JH739900.1  GPS_002009878.1
GeoFor_1.0  scaffold206 NW_005054311.1  JH739901.1  GPS_002009879.1
GeoFor_1.0  scaffold66  NW_005054312.1  JH739902.1  GPS_002009880.1
GeoFor_1.0  scaffold171 NW_005054313.1  JH739903.1  GPS_002009881.1
GeoFor_1.0  scaffold157 NW_005054314.1  JH739904.1  GPS_002009882.1
...   ...   ...   ...   ...   ...   ...  ...  ... ...  ...  ...  ...  ...
ADD REPLYlink written 12 months ago by beausoleilmo150

ask your collaborator the REFerence fasta file he used with BWA. period.

ADD REPLYlink written 11 months ago by Pierre Lindenbaum100k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1468 users visited in the last hour