Chromosome "whole genome shotgun sequence" not found
3
0
Entering edit mode
13 months ago
Hayler Edu ▴ 40

Hello everyone, I hope that you're okay

Today I'm trying to do an analysis with population 2 but before I have to bind my .bam files using mpileup. The problem is that when I try to bind the .bam files using mpileul the output shows a problem with the chromosomes.

This Is the problem:

[E::faidx_fetch_seq] The sequence "NC_039898.1 Coffea arabica cultivar Caturra red isolate CCC135-36 chromosome 1c, Cara_1.0, whole genome shotgun sequence" not found

Someone can help me? I believe that is a problem with the genome reference but I'm not sure of that.

The script that I' m using is this:

samtools mpileup -B -f GCF_003713225.1_Cara_1.0_genomic.fna -Q20 -q20 MA_9BCOL_S8_L001_R1_001.paired_sorted.bam MA_9BCOL_S8_L001_R2_001.paired_sorted.bam>std.error.txt > pools_all3.mpileup
samtools • 951 views
ADD COMMENT
1
Entering edit mode
13 months ago

The chromosomes sequences defined in the bam file are not the same that in the ".fna" file.

what is the output of

samtools view -H MA_9BCOL_S8_L001_R1_001.paired_sorted.bam | grep GCF_003713225 -m 10
grep GCF_003713225 -m 10 GCF_003713225.1_Cara_1.0_genomic.fna
ADD COMMENT
0
Entering edit mode

Hello Pierre when I use grep in the output, I don't find anything in the output

ADD REPLY
1
Entering edit mode
13 months ago

should be

 d_sorted.bam 2> std.error.txt > pools_all3.mpileup

not:

d_sorted.bam>std.error.txt > pools_all3.mpileup
ADD COMMENT
0
Entering edit mode

Hello this one was the solution thank you

ADD REPLY
1
Entering edit mode
13 months ago
LChart 3.9k

It looks to me like the .bam file has full contig names (contig ID plus description) in the header rather than just contig IDs. Samtools automatically strips the description from descriptions, e.g.

>ctig1 description of contig
TGTGCGTAGTAAGAGCATCACCAGAGTCATTTCAGCGTGCCACTTCCGGTGGATACACA
GTGCACTCAAGGGCTGGCAACATCCCGAGCTGGATAATTGATTTTGTAAGCGTTGACGT
>ctig2 description2 of contig
AGATTGTGAAAGCATCTGAACCAACCCGGTCGATCCCCTCTGGTGGTGTATTCCGCAAT
GACAGACGAATACTGCCCAGGGGAGGCTTCACTCGATCCCGGCTATGGGCGGCGTTTAC

has the index

ctig1   491     29      59      60
ctig2   491     559     59      60

If you're certain that the reads in your .bam were aligned to the same sequences in the references you want to use; you could use

samtools view -H | sed 's/SN:\([^ ]\+\).*\tLN/SN:\1\tLN/g' > newheader.txt

to generate a new bam header with the descriptions removed; and then samtools reheader to replace the old header with the new one.

ADD COMMENT
0
Entering edit mode

Tomorrow I will review your recommendations and then I will tell you how it went

ADD REPLY

Login before adding your answer.

Traffic: 2133 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6