Datasets
The link below provides four datasets of Arabidopsis thaliana sequences. One dataset contains positive
examples, namely mRNA 3′ end sequences that contain a polyadenylation site; the other three contain
negative examples (intronic, 5′ UTR, and coding sequences). These data have been used in previous studies
[1, 2]. Every sequence in every dataset is 400 nt long. Each positive
sequence has an EST-supported polyadenylation site at position 301. The sequences of the positive dataset
have undergone pairwise global alignment against every other sequence [2] in order to minimize bias
due to sequence similarity.
Arabidopsis Thaliana Dataset
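As a quick sanity check after downloading, the properties described above (400 nt per sequence, polyadenylation site at position 301 in the positive set) could be verified with a short script. The sketch below is only illustrative: it assumes the files are in FASTA format and that position 301 is counted from 1, neither of which is stated on this page, and the helper names (parse_fasta, check_lengths, site_window) are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

SEQ_LEN = 400    # sequence length stated in the dataset description
POLYA_POS = 301  # poly(A) site position; ASSUMED here to be 1-based


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text (format is an assumption)."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def check_lengths(records: Iterable[Tuple[str, str]],
                  expected: int = SEQ_LEN) -> List[str]:
    """Return the headers of records whose length differs from `expected`."""
    return [name for name, seq in records if len(seq) != expected]


def site_window(seq: str, upstream: int = 50, downstream: int = 20,
                site: int = POLYA_POS) -> str:
    """Return the subsequence around the 1-based `site`, clipped to the sequence ends."""
    idx = site - 1  # convert to a 0-based index
    start = max(0, idx - upstream)
    end = min(len(seq), idx + downstream + 1)
    return seq[start:end]
```

For a positive sequence, site_window(seq) would return the 50 nt upstream of the site, the site itself, and 20 nt downstream, which is a convenient region for inspecting signals near the cleavage site. The window sizes are arbitrary defaults, not values taken from [1] or [2].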
References
1. G. Ji, J. Zheng, Y. Shen, X. Wu, R. Jiang, Y. Lin, J. Loke, K. Davis,
G. Reese, Q. Li: Predictive modeling of plant messenger RNA
polyadenylation sites. BMC Bioinformatics 2007, 8:43.
2. C.H. Koh, L. Wong: Recognition of polyadenylation sites from
Arabidopsis genomic sequences. In Proceedings of the 18th International
Conference on Genome Informatics, pages 73-82, Singapore, 2007.