The endoplasmic reticulum (ER) plays an important role in the folding and processing of newly synthesized proteins. It is also the site of degradation of misfolded proteins. In addition, it acts as a multifunctional organelle that controls a wide range of signaling mechanisms important for cellular processes. The ER is therefore one of the most important subcellular compartments, and the prediction of ER proteins is one of the major challenges in bioinformatics. This study describes a novel method, ERPred, developed for the prediction of ER proteins. We first used amino acid composition, pseudo-amino acid composition and dipeptide composition as input to a support vector machine (SVM) and obtained maximum accuracies of 73.34%, 74.85% and 72.28%, respectively. We then used split amino acid composition with three ways of splitting the sequence: C-terminal and remaining residues; N-terminal and remaining residues; and C-terminal, N-terminal and remaining residues. These gave accuracies of 71.07%, 81.19% and 81.42%, respectively. The three-part split (C-terminal, N-terminal and remaining residues) thus gave better accuracy than any other feature set tested. ERPred is therefore based on this three-part split amino acid composition and an SVM, and it predicts ER proteins with a maximum accuracy of 81.42% and a Matthews correlation coefficient (MCC) of 0.42.
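
As an illustration of the feature described above, the following is a minimal sketch of split amino acid composition, not the ERPred implementation. The function names `aac` and `split_aac`, the terminal lengths `n_len` and `c_len` (25 residues each), and the ordering of the three segments in the output vector are assumptions made for this example; the abstract does not state them.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Return the percent composition of the 20 standard amino acids.

    Non-standard characters are ignored; an empty segment yields all zeros.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[a] / total for a in AMINO_ACIDS]


def split_aac(seq, n_len=25, c_len=25):
    """Return a 60-dimensional split amino acid composition vector.

    The sequence is cut into an N-terminal segment, a C-terminal segment
    and the remaining residues; the composition of each part is computed
    separately and concatenated as [N-terminal, remaining, C-terminal].
    n_len and c_len are hypothetical values chosen for illustration.
    """
    n_term = seq[:n_len]
    rest = seq[n_len:]
    c_term = rest[-c_len:] if c_len > 0 else ""
    middle = rest[:-c_len] if c_len > 0 else rest
    return aac(n_term) + aac(middle) + aac(c_term)


if __name__ == "__main__":
    # Toy sequence: 25 alanines, 10 cysteines, 25 aspartates.
    example = "A" * 25 + "C" * 10 + "D" * 25
    features = split_aac(example)
    print(len(features))
```

Each protein then becomes one fixed-length vector, which could be passed as a row of the input matrix to any SVM implementation. Computing the composition per segment keeps signals that are concentrated at the termini from being averaged out over the whole sequence.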

Please cite: Kumar R, Kumari B, Kumar M. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine. PeerJ. 2017;5:e3561. doi:10.7717/peerj.3561