How to present "N" in gap region of consensus in sequence assembly?
2
2
Entering edit mode
9.0 years ago
kuosheng0629 ▴ 20

Hi:

My project is assembly NGS reads to some long contings for downstream analysis.

I use samtools and bcftools to generate consensus sequence with reads bam file, reference genome and VCF file:

samtools mpileup -vf reference.fasta reads.bam | bcftools call -m -O z - > reads.vcf.gz
bcftools index reads.vcf.gz
bcftools consensus -f reference.fasta reads.vcf.gz -o consensus.fast

However, I found out the gap regions were filled with reference sequence and I want to replace these gap sequences with "N",

For example:

Reference:    AGTCTGG   TTAA   GGCTGC
Reads bam:    ...T..A          ...A..
Consensus:    AGTTTGA   TTAA   GGCAGC

I want to replace "TTAA" as "NNNN" and let consensus sequence output as "AGTTTGA NNNN GGCAGC".

How can I do this?

Thanks

David Wang,
Nation Taiwan Ocean University

alignment next-gen Assembly • 4.7k views
ADD COMMENT
0
Entering edit mode

awk, sed, tr and other tools can help, initiate some code and see how far can you reach.
Eg. replacing the whitespace characters with N, with spaces in Reads.bam as a reference.

ADD REPLY
3
Entering edit mode
8.9 years ago
Brice Sarver ★ 3.8k

1. Call variants.

2. Identify positions with nocalls (say, by emitting all sites) and convert to a BED:

grep "\./\." [YOUR_VCF] | awk '{OFS="\t"; if ($0 !~ /\#/); print $1, $2-1, $2}' > [YOUR_FILTERED_BED]

3. Use BEDtools' maskfasta.

ADD COMMENT
0
Entering edit mode

Hello, I have the same problem, so thank you for your answer. My VCF file does not contain any gap sequences. I'm using GATK 4. 0.1.2 and I can't find how to emit all the positions. Sorry if this is a basic question, I'm new. Thanks

ADD REPLY
1
Entering edit mode
7.8 years ago
  1. get your consensus using mpileup + bcftools
  2. Use bedtools genomecov -bga for identifying the sites with no mapped reads ie. your N sites
  3. recover them via | grep -w 0$
  4. use bedtools maskfasta to change your target fasta
ADD COMMENT

Login before adding your answer.

Traffic: 3918 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6