Removing uncovered transcripts from multi FASTA reference file
0
0
Entering edit mode
27 days ago
ella • 0

Hi everyone :)

for RNASeq analyses, I have a reference file, containing multiple transcript sequences (it´s a subset of the NCBI human hg38 transcriptome). I found, that some of the transcripts are not even covered by a single read (especially if there are several transcript variants) and would like to exclude them from the file. Is there a way how I could filter those sub-FASTAS of my FASTA, that are covered by less than X reads?

I tried to search for an answer, but didn´t find any helpful posting, yet.

Thanks a lot in advance and have a nice day :)

RNASeq NGS Mapping reference FASTA • 216 views
ADD COMMENT
1
Entering edit mode

that some of the transcripts are not even covered by a single read

What did you use to align the data and how did you handle multi-mapping reads?

You should consider using a program like salmon (LINK) that will distribute reads across a set of transcripts. It is fast and is the appropriate way of handling data that came from a set of alternately spliced transcripts.

ADD REPLY
1
Entering edit mode

Seconding this. It will produce a table with the transcript name and the counts for that transcript. From there it is just parsing out the transcripts that have a zero and then remove them from the original fasta file, either with grep, samtools faidx or similar approaches. Does that make sense to you? Or you load the file into R and use the Biostrings package for the subsetting.

ADD REPLY
0
Entering edit mode

Thank you as well ATpoint, sounds like this was exactly what I was looking for :) I will try that!

ADD REPLY
0
Entering edit mode

Thanks for your fast answer.

I´m using BBMap for mapping. As I did not use the "ambig=" option, it should by default map the multi-mapping reads to the first encountered best site.

Great, thanks for the suggestion! I will have a closer look to salmon :)

ADD REPLY

Login before adding your answer.

Traffic: 1700 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6