Polyadenylation Site Prediction






The link below provides four datasets of Arabidopsis thaliana sequences. One dataset contains positive examples, namely mRNA 3′ end sequences that contain a polyadenylation site, whereas the other three contain negative examples (intronic, 5′ UTR, and coding sequences). These data have been used in previous studies [1, 2]. Every sequence in every dataset is 400 nt long. Each positive sequence has an EST-supported polyadenylation site at position 301. The sequences of the positive dataset have undergone pairwise global alignment against every other sequence [2] in order to minimize bias due to sequence similarity.

Arabidopsis thaliana Dataset
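
As a rough illustration of how the layout described above might be used, the sketch below reads sequences, checks the 400 nt length, and splits a positive sequence around the polyadenylation site. The page does not state the file format, so FASTA is an assumption here, as is treating position 301 as 1-based. The function names (read_fasta, split_at_site, label_examples) are hypothetical and not part of the dataset or of the cited studies.

```python
from typing import Dict, Iterable, List, Tuple

SEQ_LEN = 400    # sequence length stated in the dataset description
POLYA_POS = 301  # polyadenylation site position; assumed to be 1-based


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-style lines into {header: sequence}.

    The FASTA format is an assumption; the description does not name it.
    """
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def split_at_site(seq: str, site: int = POLYA_POS) -> Tuple[str, str, str]:
    """Return (upstream, site nucleotide, downstream) for one sequence."""
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nt, got {len(seq)}")
    i = site - 1  # convert the assumed 1-based position to a 0-based index
    return seq[:i], seq[i], seq[i + 1:]


def label_examples(positives: Dict[str, str],
                   negatives: Dict[str, str]) -> List[Tuple[str, int]]:
    """Pair each sequence with a class label: 1 = positive, 0 = negative."""
    labelled = [(s, 1) for s in positives.values()]
    labelled += [(s, 0) for s in negatives.values()]
    return labelled


if __name__ == "__main__":
    # Toy record standing in for a real entry; not taken from the dataset.
    toy = read_fasta([">toy_positive", "A" * 300, "G" + "T" * 99])
    upstream, site_nt, downstream = split_at_site(toy["toy_positive"])
    print(len(upstream), site_nt, len(downstream))
```

Under these assumptions, each positive sequence yields 300 nt upstream of the site and 99 nt downstream of it. The three negative datasets could be merged into one dictionary, or labelled separately, before calling label_examples.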


1. G. Ji, J. Zheng, Y. Shen, X. Wu, R. Jiang, Y. Lin, J. Loke, K. Davis, G. Reese, Q. Li: Predictive modeling of plant messenger RNA polyadenylation sites. BMC Bioinformatics 2007, 8:43.

2. C.H. Koh, L. Wong: Recognition of polyadenylation sites from Arabidopsis genomic sequences. In Proceedings of the 18th International Conference on Genome Informatics, pages 73-82, Singapore, 2007.

