I'm constructing an artificial "genome" to do alignments against, and there are several segments of it that I'd like to keep visually distinct, just for my own reference later (e.g. I use "BC" for "barcode" instead of the usual [chr]omosome). My "genome" looks like this:
    >BC1
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    GGGACCGGT
    CGAGGTGGTTGAAGGTCCTATAATGTCGCCCTCTCCTTCAT
    CAGACCAGTAGACCGATTAGGATAGAAAGGCTTAAAACTTA
    GGAGTGTGGTTTGTAATTAGGATAGAAAGGCTTAAAACTTA
    CGAGGTGGTTGAAGGTCCTATAATGTCGCCCTCTCCTTCAT
    CAGACCAGTAGACCGATTAGGATAGAAAGGCTTAAAACTTA
    GGAGTGTGGTTTGTAATTAGGATAGAAAGGCTTAAAACTTA
    GGTATAGTTATC
    CCTAG
    AAAAAAAAAAAAAAAAAAA
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >BC2
    ...(similar to above)
Each "chromosome" that I'm defining (e.g. BC2, etc.) has a few segments corresponding to restriction sequences, the poly-A tail, etc. There is no actual biological segmentation within each >BC block; I'm separating the parts with newlines only so that I can quickly come back and visually distinguish them later. Can this create any problems? Are there any indexing or genome-conversion packages that assume fixed line lengths? I'm just wondering if this is bad practice.
*Edit:* I should add that I'm planning on using minimap2 and samtools for indexing and alignment.
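
To make the line-length concern concrete, here is a minimal Python sketch of how such a file could be checked and, if needed, rewrapped to a fixed width while the hand-segmented copy is kept for reference. It assumes, as far as I understand the `.fai` index layout used by `samtools faidx`, that every sequence line in a record except the last must have the same length. The function names (`parse_fasta`, `has_uniform_lines`, `rewrap`) and the default width of 60 are my own illustrative choices, not part of any tool.

```python
def parse_fasta(text):
    """Yield (header, sequence_lines) for each record in FASTA text.

    Blank lines are skipped here for simplicity; real tools may treat
    them differently.
    """
    name, lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, lines
            name, lines = line[1:].strip(), []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, lines


def has_uniform_lines(lines):
    """True if all lines but the last share one length and the last is no longer."""
    if len(lines) <= 1:
        return True
    width = len(lines[0])
    return all(len(l) == width for l in lines[:-1]) and len(lines[-1]) <= width


def rewrap(text, width=60):
    """Return FASTA text with each record's sequence rewrapped to `width` columns."""
    out = []
    for name, lines in parse_fasta(text):
        seq = "".join(lines)
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Shortened, made-up stand-in for the segmented layout shown above.
    example = ">BC1\nNNNN\nGGG\nCGAGGTGG\n>BC2\nACGT\nACGT\nAC\n"
    for header, seq_lines in parse_fasta(example):
        print(header, "uniform" if has_uniform_lines(seq_lines) else "ragged")
    print(rewrap(example, width=5), end="")
```

Rewrapping only changes where the line breaks fall, so the concatenated sequence of each record, and therefore the alignment coordinates, stay the same.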