Non-ATGC, lowercase, and 'N' characters in a FASTQ file
5.5 years ago

Hi All,

I am currently performing a genome assembly. I generated the consensus FASTQ file using the commands below, but the file contains a lot of non-ATGC characters (highlighted in bold). What are these characters, and how should I handle them?

Commands used to generate Fastq file:
>>bwa index ref.fa
>>bwa aln -t 9 ref.fa D2_R2.fastq -f D2_R2.sai && bwa aln -t 9 cocsa_ref.fa D2_R1.fastq -f D2_R1.sai
>>bwa sampe ref.fa D2_R1.sai D2_R2.sai D2_R1.fq D2_R2.fq > D2-aln-pe2.sam
>>samtools faidx ref.fa
>>samtools view -bt ref.fa.fai D2-aln-pe2.sam > D2-aln-pe2.bam
>>samtools sort D2-aln-pe2.bam D2-aln-pe2.bam.srt
>>samtools index D2-aln-pe2.bam.srt.bam
>>samtools mpileup -uf ref.fa D2-aln-pe2.bam.srt.bam | bcftools view -cg - | vcfutils.pl vcf2fq > CONSENSUS.fq

CONSENSUS.fq file looks like:
@scaffold_1
nnngtttggtggtagtattggtatttcaaacacgctaggtgtttgttggttttgagtagg
tgtagctggagtagactctatctccatttctctatcagtttgggcctctggccctaggct
ctcctgtctgttttcttgagtatttactacaatagtatcactgtctggcggcattttatt
actaagctcttttcttagtaagcaactagatggtctgtgtgtttttgttttcgtgagtga
gacgtgttcagattagctactttaccagcttctagctctatagcgcgtgggctgcacgag
ttggcactagttgtaatcgatttcttgggatggatttgtatataattcgctaaaattaca
cctattctgaaaaactcgnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnTAATGTTACAAGTAAYAAGAAGGATYCTYTCCTTRACAAATRACGAGATGGC

P.S. Please also advise on how to handle the lowercase characters and the 'N's. Should we mask or remove them to get a better set of scaffolds?

Thanks in advance.

5.5 years ago

Lowercase bases are already soft-masked (often because of low confidence), and many tools will ignore them. I don't see any reason to remove them.
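To see how much of a sequence falls into each class before deciding what to do with it, something like the following could be used. This is only a rough sketch in Python; the function name `classify_bases` and the category labels are made up for illustration and do not come from any of the tools above.

```python
from collections import Counter


def classify_bases(seq: str) -> Counter:
    """Count N, uppercase ACGT, lowercase acgt, and all other characters.

    Hypothetical helper: 'other' will mostly be IUPAC ambiguity codes
    when the input is a consensus sequence.
    """
    counts = Counter()
    for ch in seq:
        if ch in "nN":
            counts["N"] += 1
        elif ch in "ACGT":
            counts["upper_ACGT"] += 1
        elif ch in "acgt":
            counts["lower_acgt"] += 1
        else:
            counts["other"] += 1
    return counts


# Short fragment in the style of the consensus shown in the question
print(classify_bases("nnngtttggTAATGTTACAAGTAAYAAG"))
```

Pass it sequence lines only, not the quality lines of the FASTQ record.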

The non-ACGTN characters are IUPAC ambiguity codes, typically indicating polymorphisms (for example, Y means C or T, and R means A or G). I normally convert them to N before further processing.
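A minimal sketch of that conversion in Python, assuming you apply it to the sequence lines only (never to the quality lines, which can legitimately contain the same letters). The names `IUPAC_AMBIGUOUS` and `iupac_to_n` are hypothetical, and case is preserved so that soft-masking survives.

```python
# IUPAC nucleotide ambiguity codes other than N
IUPAC_AMBIGUOUS = "RYSWKMBDHV"

# Map uppercase codes to 'N' and lowercase codes to 'n'
_TO_N = str.maketrans(
    IUPAC_AMBIGUOUS + IUPAC_AMBIGUOUS.lower(),
    "N" * len(IUPAC_AMBIGUOUS) + "n" * len(IUPAC_AMBIGUOUS),
)


def iupac_to_n(seq: str) -> str:
    """Replace IUPAC ambiguity codes with N, leaving A/C/G/T/N untouched."""
    return seq.translate(_TO_N)


# Fragment taken from the consensus in the question
print(iupac_to_n("TAAYAAGAAGGATYCTYTCCTTRACAAATRACG"))
# prints TAANAAGAAGGATNCTNTCCTTNACAAATNACG
```

Because the replacement is one character for one character, sequence length is unchanged and the quality string stays aligned.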


Thanks a lot Brian.
