How can I count reads aligned to features and extract poly-T stretch information from the Read ID?
6.4 years ago
biplab ▴ 110

I modified the R1 read IDs by adding the length of the poly-T stretch found in the R2 reads, then aligned the R1 reads to the genome, so I now have SAM files. I can use HTSeq (htseq-count) to count the reads aligned to each feature. In addition to the per-feature read counts, I would like to get the average poly-T length per feature. Here is an example read from one of the SAM files:

J00113:322:0001:30:0:GT 16  IV  1450285 22  150M    *   0   0 ACCAGTAGTGTGTCTTCTCTTTGCCTTGGCAGCCCAGTTGTGAGATCTAGTCTTAGCGGATGGGTAACCACAAGAGGAACAGGTCTTCTTTTGAACATGGAAAGAACGACGACCAC
ATCTGTTACACAAGGTGTGAGATTTGATTCTCCG  7-<-FJFJJFJJFFJJ<JJFJAF7JFFAAFJFF<FAJAJFFA-7JJ7F<AAFJJ<JFFFJJFJJFJF7<7-F77<-7JF7A-AJFJFJJJJJF<FJFJJFA7FF<--7-<--F-F77J<JFJF-<<<FFJJJJFF<JFJJJJFJA-AA-A  AS:i:-23    X

The 30 in the read ID (the fourth colon-separated field in this example) is the length of the poly-T stretch in the R2 read, which was not used for alignment. Thanks in advance for any ideas on how to extract the average poly-T length together with the counts for each feature.

next-gen rna-seq • 1.3k views
6.4 years ago

Have htseq-count (or featureCounts, which is much faster) output a BAM/SAM file in which each alignment is labeled with the gene (if any) it was assigned to (the -o option in htseq-count). Then parse that file to read the auxiliary tag the tool appends, which holds the gene to which the read was assigned. After parsing the poly-T length out of the read ID, you can increment a per-gene read count and a running length total in a hash; dividing the total by the count gives the average poly-T length for that gene.
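
The parsing step could be sketched roughly as below. This is only an illustration under some assumptions: the annotated file is plain-text SAM, the gene assignment is in an `XF:Z:` tag (what htseq-count writes with -o; if your tool uses a different tag, change the `tag` argument), unassigned reads carry a label starting with `__`, and the poly-T length is the fourth colon-separated field of the read ID as in the example read. The function name `summarize_polyt` and its parameters are made up for this sketch.

```python
from collections import defaultdict


def summarize_polyt(sam_lines, tag="XF:Z:", length_field=3):
    """Return {gene: (read_count, mean_polyT_length)} from annotated SAM lines.

    tag          -- prefix of the optional field holding the assigned gene
                    (assumed to be XF:Z:, as written by htseq-count -o)
    length_field -- 0-based index of the poly-T length within the
                    colon-separated read ID (assumed from the example read)
    """
    counts = defaultdict(int)   # reads assigned to each gene
    totals = defaultdict(int)   # summed poly-T lengths for each gene

    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        gene = next(
            (f[len(tag):] for f in fields[11:] if f.startswith(tag)), None
        )
        # Skip reads without the tag and special labels such as __no_feature.
        if gene is None or gene.startswith("__"):
            continue
        polyt_len = int(fields[0].split(":")[length_field])
        counts[gene] += 1
        totals[gene] += polyt_len

    return {g: (counts[g], totals[g] / counts[g]) for g in counts}
```

Passing an open file handle for the annotated SAM file to `summarize_polyt` returns, for each gene, the number of assigned reads and the mean poly-T length, which can then be written out as a table. For BAM output you would need to convert to SAM first (for example with samtools view) or read it with a BAM library instead.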

Thank you so much.
