Prediction of Proteins Structural Class in Two States Using the Hybrid Neural-Logistic Model

Abstract

The objective of this study is to predict the structural classes of proteins in two states (all-α and all-β). We used a two-stage hybrid model built from an artificial neural network (ANN) and a logistic regression model (LRM). The LRM was first used to extract the effective variables (n=7) from the generated structural variables (n=662), in order to simplify the structure of the ANN, which was then used to predict the structural classes of proteins. Prediction was performed on one data set of non-homologous, mono-domain globular proteins (n=104). According to the LRM results, among the 20 evaluated single amino acid composition frequencies, the Valine and Glycine frequencies were statistically significant (P<.05). Similarly, among the 400 evaluated dipeptide composition frequencies, the Lysine-Proline, Glutamine-Proline, Isoleucine-Serine and Serine-Glutamine frequencies were significant. Among the 22 evaluated tripeptide frequencies, only the Asparagine-Leucine-Aspartic acid composition frequency was significant. Prediction of the two-state structural classes (all-α and all-β) reached 88% using only the seven significant structural variables out of 642 structural variables. Both threshold-dependent and threshold-independent (ROC) measures were used to evaluate the performance of the hybrid model.
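
As a rough illustration of the kind of structural variables described above, the following sketch computes single amino acid and dipeptide composition frequencies for one sequence. The function name `composition_features`, the toy sequence, and the normalization (count divided by the number of valid windows) are assumptions made for illustration only; the study's exact feature definitions and data are not reproduced here.

```python
from itertools import product

# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence, k):
    """Return the frequency of every possible k-mer over the 20 standard residues.

    Assumption: frequency = count of the k-mer / number of valid k-length windows.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing non-standard residues
            counts[window] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical toy sequence, not taken from the study's data set.
toy = "MKVGVGKPSQ"
single = composition_features(toy, 1)  # 20 single amino acid frequencies
dipep = composition_features(toy, 2)   # 400 dipeptide frequencies

print(len(single), len(dipep))
print(single["V"], single["G"], dipep["KP"])
```

In a pipeline like the one summarized in the abstract, feature vectors of this kind would be the candidates screened by the logistic regression stage, with only the significant ones passed on to the neural network.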

Keywords