Polyadenylation Site Prediction






At the link below you can find four datasets of Arabidopsis thaliana sequences. One dataset contains positive examples, namely mRNA 3′ end sequences that contain a polyadenylation site, whereas the other three contain negative examples (intronic, 5′ UTR, and coding sequences). These data have been used in previous studies [1, 2]. Every sequence in every dataset is 400 nt long. Each positive sequence has an EST-supported polyadenylation site at position 301. The sequences of the positive dataset have undergone pairwise global alignment against every other sequence [2] in order to minimize bias due to sequence similarity.

Arabidopsis thaliana Dataset
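
The layout described above (400 nt per sequence, polyadenylation site at position 301 in the positive examples) can be checked with a few lines of Python before any modelling. This is only a minimal sketch under stated assumptions: the file format is assumed to be FASTA, position 301 is assumed to be 1-based, and the function names `parse_fasta`, `check_lengths`, and `split_at_site` are hypothetical helpers introduced here for illustration, not part of the dataset or of the cited studies.

```python
from typing import Dict, Iterable, Tuple

SEQ_LEN = 400      # stated length of every sequence, in nucleotides
POLYA_SITE = 301   # stated site position in positive sequences (assumed 1-based)


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}.

    Assumption: the datasets are distributed as FASTA; adapt if they are not.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines; normalise to upper case.
            records[header] += line.upper()
    return records


def check_lengths(records: Dict[str, str], expected: int = SEQ_LEN) -> None:
    """Raise ValueError if any sequence does not have the expected length."""
    bad = [name for name, seq in records.items() if len(seq) != expected]
    if bad:
        raise ValueError(f"{len(bad)} sequence(s) are not {expected} nt long: {bad[:5]}")


def split_at_site(seq: str, site: int = POLYA_SITE) -> Tuple[str, str]:
    """Split a sequence into the part before the site and the part from the site on.

    With a 1-based site of 301, the first part holds positions 1-300 and the
    second part holds positions 301-400.
    """
    idx = site - 1  # convert the 1-based position to a 0-based index
    return seq[:idx], seq[idx:]
```

Under these assumptions, a positive sequence splits into 300 nt upstream of the site and 100 nt starting at the site. Placing the nucleotide at position 301 in the second part is a convention chosen for this sketch; the source does not say on which side of the cleavage point it falls.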


