Closed: Extract unmapped reads from salmon output
5.3 years ago
vaushev ▴ 20

I ran salmon with the --writeUnmappedNames option, and now I want to extract and analyze the unmapped reads. What is the best way to do that?
First, which tool should I use for the extraction? Currently I run something like seqtk subseq $fN1z $fnLst > $fnOut1 (where $fN1z is the FASTQ reads file and $fnLst is aux_info/unmapped_names.txt from the salmon output). This works but is a bit slow, so I am wondering whether there is a better tool than seqtk for this task.
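For reference, here is a minimal Python sketch of what the extraction step amounts to: load the read names into a set, then stream through the FASTQ file once and keep the matching records. It is not claimed to be faster than seqtk. The function names (load_names, filter_fastq, open_maybe_gzip) are hypothetical, and the sketch assumes 4-line FASTQ records and that each name in unmapped_names.txt matches the first token of the FASTQ header exactly.

```python
import gzip
import io
import sys


def open_maybe_gzip(path):
    """Open a text file, transparently handling a .gz suffix (assumed naming)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def load_names(lines):
    """Collect read names: the first whitespace-separated field of each line."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def filter_fastq(fastq, names, out):
    """Copy 4-line FASTQ records whose read name is in `names`.

    Returns the number of records written.
    """
    kept = 0
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        seq = fastq.readline()
        plus = fastq.readline()
        qual = fastq.readline()
        # Header looks like "@name optional comment"; drop "@" and the comment.
        parts = header[1:].split()
        name = parts[0] if parts else ""
        if name in names:
            out.write(header + seq + plus + qual)
            kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python extract.py reads_1.fq.gz unmapped_names.txt
    with open(sys.argv[2]) as name_file:
        wanted = load_names(name_file)
    with open_maybe_gzip(sys.argv[1]) as reads:
        filter_fastq(reads, wanted, sys.stdout)
```

Because membership tests on a set are constant time on average, the cost is dominated by reading and decompressing the FASTQ file, which is likely also the bottleneck for any other tool.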
Second, I want to make sure I correctly understand the output for paired-end reads. The manual lists different cases of unmapped reads, such as "u = entire pair", "m1 = left orphan", etc. As I understand it, I should extract from each FASTQ file separately: from *_1.fq the reads marked u and m1, and from *_2.fq the reads marked u and m2 (ignoring the m12 case for now). Is that correct?
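Whichever flag-to-mate assignment turns out to be right, the selection itself can be sketched as below. This assumes each line of unmapped_names.txt is a read name followed by a flag; the function names are hypothetical, and the flag sets are left to the caller on purpose. One thing worth checking against the manual for your salmon version: as far as I know, it describes m1 as the case where mappings were found for the left read but not the right, which would make the right mate the unmapped one for m1 (and the left mate for m2).

```python
from collections import defaultdict


def names_by_flag(lines):
    """Group read names by their flag (e.g. u, m1, m2, m12).

    Each line is assumed to be "<read name> <flag>"; malformed lines are skipped.
    """
    groups = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            groups[fields[1]].add(fields[0])
    return groups


def select_names(groups, flags):
    """Return the union of read names carrying any of the given flags."""
    selected = set()
    for flag in flags:
        selected |= groups.get(flag, set())
    return selected


if __name__ == "__main__":
    example = ["read1 u", "read2 m1", "read3 m2", "read4 m12"]
    groups = names_by_flag(example)
    # The flag sets below are placeholders; choose them after checking the manual.
    print(sorted(select_names(groups, {"u", "m2"})))
    print(sorted(select_names(groups, {"u", "m1"})))
```

The resulting name sets can then be written to two separate list files, one per FASTQ file, and passed to whatever extraction tool is used.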

P.S. Lastly, a slightly off-topic (purely Unix) question: what would be the fastest way to sort the extracted reads by abundance? I run the following: sed -n 'n;p;n;n' $fnOut1 | sort | uniq -c | sort -k1 -r -n > $fnSort. It does the job but takes quite some time.
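One possible alternative to the sort | uniq -c pipeline is to count sequences in a hash table, so that only the distinct sequences need sorting at the end. The sketch below uses Python's collections.Counter; the function names are hypothetical, it assumes 4-line FASTQ records, and it holds all distinct sequences in memory, so whether it is actually faster depends on the data.

```python
from collections import Counter
from itertools import islice
import sys


def count_sequences(fastq_lines):
    """Count identical sequences; the sequence is line 2 of every 4-line record."""
    seq_lines = islice(fastq_lines, 1, None, 4)
    return Counter(line.rstrip("\n") for line in seq_lines)


def write_by_abundance(counts, out):
    """Write "count<TAB>sequence" lines, most abundant first."""
    for seq, n in counts.most_common():
        out.write(f"{n}\t{seq}\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python abundance.py extracted.fq > sorted.txt
    with open(sys.argv[1]) as fastq:
        write_by_abundance(count_sequences(fastq), sys.stdout)
```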

RNA-Seq rna-seq salmon • 265 views