samtools idxstats unplaced contigs
14 months ago
LacquerHed ▴ 20

I'm trying to figure out what the last line of the samtools idxstats output means.

Here are the last few lines:

GL456368.1  20208   266 0
JH584292.1  14945   8   0
JH584295.1  1976    31  0
*   0   0   33800462


Is this an additional unmapped region? I tried to extract these reads into a BAM file but couldn't; I'm not sure of the proper notation for samtools view:

samtools view -b /Users/possorted_genome.bam *  >  asterisk.bam


The basic goal is to find a randomly integrated human transgene in mouse snRNA-seq data aligned to the mouse assembly.

Also, could a transgene potentially integrate within an unplaced contig like the ones above? I was able to find the mouse version of the gene by BLASTing a database built from the reads mapped to chromosome 13, but I can't find the human version even though I know it's there.

Thanks.

contig STAR samtools

If you are looking for a genomic insertion site, it may be better to follow the protocol described in this answer: Identification of the sequence insertion site in the genome

14 months ago
ATpoint 62k

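The * line counts unmapped reads that have no reference position. Unmapped reads placed next to a mapped mate would instead be counted in the fourth column of that contig's line, which is 0 in the lines shown, so here it is effectively all of your unmapped reads. If you want to align against the transgene, why not add its sequence as an extra "chromosome" to the reference genome?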
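A minimal sketch for reading idxstats output: the columns are reference name, sequence length, mapped read segments and unmapped read segments, and the * line holds unmapped reads with no reference position. The function name summarize_idxstats is hypothetical, and the sample lines are the ones from the question. The commented extraction command assumes samtools is installed and that possorted_genome.bam has its .bai index next to it; the * is quoted so the shell does not expand it into file names, which is likely why the unquoted command in the question failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `samtools idxstats` output read from stdin.
# Columns: reference name, sequence length, mapped read segments, unmapped read segments.
summarize_idxstats() {
  awk '
    $1 == "*" { unplaced += $4; next }        # "*" line: unmapped reads with no reference position
    { mapped += $3; placed_unmapped += $4 }   # per-contig counts
    END {
      printf "mapped: %d\n", mapped
      printf "unmapped, placed on a contig: %d\n", placed_unmapped
      printf "unmapped, no position (*): %d\n", unplaced
    }'
}

# Sample: the last lines from the question (not the full file).
printf '%s\n' \
  'GL456368.1  20208   266 0' \
  'JH584292.1  14945   8   0' \
  'JH584295.1  1976    31  0' \
  '*   0   0   33800462' | summarize_idxstats

# On the real file (assumes samtools is installed):
#   samtools idxstats possorted_genome.bam | summarize_idxstats
# Extract only the "*" reads; this needs the .bai index, and the quotes stop
# the shell from expanding * into file names:
#   samtools view -b -o unplaced_unmapped.bam possorted_genome.bam '*'
```

For the sample lines this reports 305 mapped reads, 0 placed unmapped reads and 33800462 unmapped reads without a position.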


Is it possible to just extract all unmapped reads to a BAM file, convert that to FASTA, and BLAST it for the transgene? When I tried with samtools view I couldn't figure out the right notation. I'm guessing the transgene reads should be there, and this seems less involved than augmenting the assembly. Thanks!


To get unmapped reads from a BAM file you can use:

samtools view -f 4 file.bam


You can then BLAST the sequences.
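A minimal sketch of that route, assuming samtools and the NCBI BLAST+ tools (makeblastdb, blastn) are on the PATH; the function name, output prefix and E-value cutoff are placeholders, not settings from this thread. It builds a BLAST database from the unmapped reads and searches the transgene against it, which is usually faster than running tens of millions of reads as separate queries. Note that it only searches reads flagged as unmapped (0x4).

```shell
#!/usr/bin/env bash
# Hypothetical helper: unmapped reads -> FASTA -> BLAST database -> transgene search.
# Assumes samtools and BLAST+ (makeblastdb, blastn) are installed.
blast_transgene_vs_unmapped() {
  local bam="$1" transgene_fa="$2" prefix="${3:-unmapped}"
  # -f 4: keep only reads with the "segment unmapped" flag set.
  samtools fasta -f 4 "$bam" > "${prefix}.fa" || return 1
  # Build a nucleotide BLAST database from the unmapped reads.
  makeblastdb -in "${prefix}.fa" -dbtype nucl -out "${prefix}_db" || return 1
  # Search the transgene against the reads; tabular output (-outfmt 6), one row per hit.
  # The default limit of 500 target sequences is raised so more matching reads are reported.
  blastn -query "$transgene_fa" -db "${prefix}_db" \
    -outfmt 6 -evalue 1e-10 -max_target_seqs 100000 \
    > "${prefix}_vs_transgene.tsv"
}

# Usage (file names are placeholders):
#   blast_transgene_vs_unmapped possorted_genome.bam human_transgene.fa unmapped
#   cut -f2 unmapped_vs_transgene.tsv | sort -u | wc -l   # distinct reads with a hit
```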

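A more proper approach would be to add your transgene to your reference, as suggested by ATpoint, because there could be reads that map to both the mouse and the human gene (possibly with some mismatches). Such reads would already have been aligned to the mouse gene, so they would not appear among the unmapped reads.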
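If you go the reference route, the usual steps are: append the transgene FASTA to the genome FASTA, add a matching GTF record, and rebuild the index. Below is a minimal sketch; the helper name make_transgene_gtf is hypothetical, it assumes a single-sequence transgene FASTA, and the commented Cell Ranger step assumes the data were processed with Cell Ranger (the possorted_genome.bam name suggests so). With plain STAR you would rebuild the index with STAR --runMode genomeGenerate instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a one-exon GTF record spanning an entire transgene FASTA,
# so the transgene can be appended to the reference as an extra "chromosome".
make_transgene_gtf() {
  local fasta="$1" name len
  # Sequence name = first word of the FASTA header; it must match the GTF seqname.
  name=$(awk 'NR == 1 { sub(/^>/, ""); print $1; exit }' "$fasta")
  # Sequence length = total number of bases on the non-header lines.
  len=$(awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$fasta")
  printf '%s\tunknown\texon\t1\t%s\t.\t+\t.\tgene_id "%s"; transcript_id "%s"; gene_name "%s"; gene_biotype "protein_coding";\n' \
    "$name" "$len" "$name" "$name" "$name"
}

# Usage sketch (file names are placeholders; assumes a reference built with Cell Ranger):
#   make_transgene_gtf human_transgene.fa > human_transgene.gtf
#   cat genome.fa human_transgene.fa > genome_plus_transgene.fa
#   cat genes.gtf human_transgene.gtf > genes_plus_transgene.gtf
#   cellranger mkref --genome=mouse_plus_transgene \
#     --fasta=genome_plus_transgene.fa --genes=genes_plus_transgene.gtf
```

After realigning, reads from the transgene should show up as counts for the new gene, and reads that match both species can be inspected directly.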