Hi everyone! I'm trying to map sequencing data (Illumina, paired-end) with Novoalign, but I'm having trouble understanding the results. First, I built the index using:
'/home//Desktop/novocraft/novoindex' 'celegans.nix' '/home//Desktop/novocraft/c_elegans.WS220.genomic.fa'
# novoindex (3.7) - Universal k-mer index constructor.
# (C) 2008 - 2011 NovoCraft Technologies Sdn Bhd
# novoindex celegans.nix /home//Desktop/novocraft/c_elegans.WS220.genomic.fa
# Creating 32 indexing threads.
# Building with 12-mer and step of 1 bp.
# novoindex construction dT = 2.8s
# Index memory size 0.483Gbyte.
# Done.
I don't know if this is normal, but when I try to look at this nix file, this is what I get:
head '/home//celegans.nix'
�`?�j)>�celegans.nixe�!
+�4d8�<,D\J�K:MeQaU[R^@oz}/�+�*�l�/�6�{�����8����ѣS�(�0�8�
���.�&�#����'�;���m�O�a��������q������6�������M�h���o�"�${%-&�'y)*�*?,a.�/�0k5S9A:;]<:=�=d>�>e@�@gA�AC�C8DiE�H�I�JL�L�L�M
N6O�O-P�P�QBR�R(UVX�Y�Z]*_�_�`Qa�c�dCe�f�hej�k�o�v�w�f��c���E���D�l�����g���ϒ���ϓ4�w�>��������U�ܛt�����t�����:����L��������P��������@�t�Y�,�ھ����N���u�s�s�X����r���������c�����u�����x�������Y�����������������
�
Anyway, I decided to continue the analysis and tried to map paired-end fragments:
'/home//Desktop/novocraft/novoalign' -d '/home//Desktop/celegans.nix' -f '/home/Desktop/analysis/29t1/29t1_L2_R1_001_bRiyPfyeAal4.fastq' '/home//Desktop/analysis/29t1/29t1_L2_R2_001_AplPFFbRfW86.fastq' -o SAM > alignement_29.sam
head '/home/alignement_29.sam'
@HD VN:1.0 SO:unsorted
@PG ID:novoalign PN:novoalign VN:V3.07.00 CL:novoalign -d /home/Desktop/analysis/celegans.nix -f /home/Desktop/analysis/29t1/29t1_L2_R1_001_bRiyPfyeAal4.fastq /home/Desktop/analysis/29t1/29t1_L2_R2_001_AplPFFbRfW86.fastq -o SAM
@SQ SN:CHROMOSOME_I LN:15072423 AS:celegans
@SQ SN:CHROMOSOME_II LN:15279345 AS:celegans
@SQ SN:CHROMOSOME_III LN:13783700 AS:celegans
@SQ SN:CHROMOSOME_IV LN:17493793 AS:celegans
@SQ SN:CHROMOSOME_V LN:20924149 AS:celegans
@SQ SN:CHROMOSOME_X LN:17718866 AS:celegans
@SQ SN:CHROMOSOME_MtDNA LN:13794 AS:celegans
HWI-ST865:528:HCYNTBCXX:2:1101:1307:2105 99 CHROMOSOME_III 4080132 70 25M = 4080462 355 CCGCAATTTGTCTTCAACTCTTCGA <0BD0<CCFHIIHHFEGHIHIII1G PG:Z:novoalign AS:i:0 UQ:i:0 NM:i:0 MD:Z:25 PQ:i:1 SM:i:70 AM:i:70
HWI-ST865:528:HCYNTBCXX:2:2216:16878:99707 83 CHROMOSOME_II 12910220 70 22M3S = 12910087 -155 GCAACTCAAAAGAAGTTCAGACACN 1<<<1<1<<D<<GD<<?GD1<<0<# PG:Z:novoalign AS:i:68 UQ:i:68 NM:i:2 MD:Z:4A1A15 PQ:i:99 SM:i:28 AM:i:28
HWI-ST865:528:HCYNTBCXX:2:2216:16878:99707 163 CHROMOSOME_II 12910087 70 25M = 12910220 155 ATTCCAGACGCGAATGGATTGCTAT 00<<<D11<<</CEHF<C<<<<1<1 PG:Z:novoalign AS:i:30 UQ:i:30 NM:i:1 MD:Z:8A16 PQ:i:99 SM:i:61 AM:i:28
Does anyone know if these results are normal? I want to convert this SAM file to a BED file, but it doesn't work, and I don't know whether that is because the SAM file is wrong.
Thanks a lot for your help! Maude
Index files are binary, so what you are seeing with the head command is expected (you can't view binary files that way). Your alignment SAM file looks fine as well.
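If you want to check for yourself whether a file is binary before running head on it, here is a minimal sketch. The function name looks_binary is made up for illustration, and it relies on a common heuristic (a NUL byte somewhere in the first 8000 bytes), which I am assuming holds for a .nix index; I have not verified that for the novoindex format. It also assumes your head supports -c, which is widespread but not strictly POSIX.

```shell
# Hypothetical helper: succeeds (exit 0) if the start of the file
# contains a NUL byte, a common sign that the file is binary.
looks_binary() {
    # Count the bytes in the first 8000 bytes, with and without NULs.
    total=$(head -c 8000 "$1" | wc -c)
    without_nul=$(head -c 8000 "$1" | tr -d '\000' | wc -c)
    # A difference means at least one NUL byte was present.
    [ "$((total))" -ne "$((without_nul))" ]
}

# Example use (paths are placeholders):
#   looks_binary celegans.nix && echo "binary, use od instead of head"
#   head -c 64 celegans.nix | od -c    # safe way to peek at the raw bytes
```

The od -c view prints each byte as a character or an escape, so it will not garble your terminal the way head does on binary data.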
Did you mean to say you want to convert it to BAM? What command did you try/use?
Thank you for your answer. No, I'm trying to convert it to BED. For this I'm using sam2bed from BEDOPS. But now I realize that I've skipped some steps and that I can't go directly to this format... I will work a little more on that. Thank you again!
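For anyone landing here later, this is a rough sketch of the coordinate arithmetic behind a SAM-to-BED conversion. It is not how sam2bed is implemented, and the function name sam_to_bed is made up for this example. It assumes tab-separated SAM input, skips header lines and unmapped reads (flag bit 0x4), and takes the end coordinate from the CIGAR operations that consume the reference (M, D, N, =, X). BED starts are 0-based, so POS is shifted by one.

```shell
# Hypothetical helper: print one BED6 line per mapped SAM record.
# Columns: chrom, 0-based start, end, read name, MAPQ, strand.
sam_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    /^@/ { next }                       # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }       # flag bit 0x4 set: read unmapped
    {
        cigar = $6; reflen = 0
        # Walk the CIGAR string one operation at a time.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            n  = substr(cigar, 1, RLENGTH - 1) + 0
            op = substr(cigar, RLENGTH, 1)
            if (op ~ /[MDN=X]/) reflen += n   # ops that consume reference
            cigar = substr(cigar, RLENGTH + 1)
        }
        # Flag bit 0x10 set means the read maps to the reverse strand.
        strand = (int($2 / 16) % 2 == 1) ? "-" : "+"
        print $3, $4 - 1, $4 - 1 + reflen, $1, $5, strand
    }' "$@"
}

# Example use (file names are placeholders):
#   sam_to_bed alignement_29.sam > alignement_29.bed
```

With the first record shown in the question (flag 99, POS 4080132, CIGAR 25M), this gives the interval 4080131 to 4080156 on the + strand. Soft-clipped bases (S) are deliberately not counted, since they do not cover the reference.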