Protein Subcellular and Secreted Localization Prediction Using Deep Learning

Hamza Zidoum, Mennatollah Magdy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Predicting protein structure and inferring a protein's function from its location in the cell is crucial for understanding the cellular translocation process and has direct applications in drug discovery. Computational prediction of protein localization is an alternative to time-consuming experimental approaches. We use a deep learning approach to improve prediction accuracy while reducing the time needed to predict the localization site of uncharacterized protein sequences. Our approach is based on general biological features of the protein sequence and compartment-specific features, to which we added physico-chemical sequence features. We collected the protein sequences from UniProt/SWISS-PROT and then collected the features for each protein. We consider five locations in the dataset, namely cytoplasm (CP), inner membrane (IM), outer membrane (OM), periplasm (PE) and secreted (SEC). We selected protein sequences with a minimum length of 100 amino acids and a maximum length of 1000 amino acids. Each location contains 500 protein sequences. We propose a deep learning prediction method for the bacteria taxonomy that combines one-versus-one and one-versus-all models with feature selection using linear SVM ranking and deep auto-encoders to initialize the weights. The method achieves an overall accuracy of 97.81% using 10-fold cross-validation on our data. Our approach outperforms current state-of-the-art computational methods for protein subcellular localization on the selected dataset.
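
The dataset constraints and evaluation setup stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the record format (pairs of sequence and location code), the function names, and the round-robin fold assignment are assumptions made for illustration. Only the five location codes, the 100 to 1000 residue limits, the 500 sequences per location, and the 10 folds come from the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Values taken from the abstract.
LOCATIONS = ["CP", "IM", "OM", "PE", "SEC"]
MIN_LEN, MAX_LEN, PER_CLASS = 100, 1000, 500


def select_sequences(records, per_class=PER_CLASS):
    """Keep sequences of 100..1000 residues, at most per_class per location.

    `records` is a hypothetical iterable of (sequence, location) pairs.
    """
    kept = defaultdict(list)
    for seq, loc in records:
        if (loc in LOCATIONS
                and MIN_LEN <= len(seq) <= MAX_LEN
                and len(kept[loc]) < per_class):
            kept[loc].append(seq)
    return dict(kept)


def binary_tasks(labels):
    """Enumerate the binary problems behind the two multiclass schemes."""
    one_vs_one = list(combinations(labels, 2))      # every unordered pair of classes
    one_vs_all = [(lab, "rest") for lab in labels]  # each class against all others
    return one_vs_one, one_vs_all


def stratified_folds(selected, k=10):
    """Deal each location's sequences round-robin into k folds (assumed scheme)."""
    folds = [[] for _ in range(k)]
    for loc, seqs in selected.items():
        for i, seq in enumerate(seqs):
            folds[i % k].append((seq, loc))
    return folds
```

With five locations, `binary_tasks(LOCATIONS)` yields 10 one-versus-one problems and 5 one-versus-all problems, and a balanced set of 5 × 500 sequences splits into 10 folds of 250 sequences each.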

Original language: English
Title of host publication: 2018 International Conference on Computing Sciences and Engineering, ICCSE 2018 - Proceedings
Editors: Hazem Raafat, Mostafa Abd-El-Barr, Muhammad Sarfraz, Paul Manuel
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538646809
DOIs: https://doi.org/10.1109/ICCSE1.2018.8374220
Publication status: Published - Jun 5 2018
Event: 2nd International Conference on Computing Sciences and Engineering, ICCSE 2018 - Kuwait City, Kuwait
Duration: Mar 11 2018 to Mar 13 2018

Other

Other: 2nd International Conference on Computing Sciences and Engineering, ICCSE 2018
Country: Kuwait
City: Kuwait City
Period: 3/11/18 to 3/13/18

Fingerprint

Protein Sequence
Proteins
Protein
Prediction
Amino Acids
Membrane
Drug Discovery
Translocation
Protein Structure
Encoder
Taxonomy
Cross-validation
Amino acids
Computational Methods
Bacteria
Ranking
Fold
Choose
Membranes
Learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Modelling and Simulation

Cite this

Zidoum, H., & Magdy, M. (2018). Protein Subcellular and Secreted Localization Prediction Using Deep Learning. In H. Raafat, M. Abd-El-Barr, M. Sarfraz, & P. Manuel (Eds.), 2018 International Conference on Computing Sciences and Engineering, ICCSE 2018 - Proceedings (pp. 1-6). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCSE1.2018.8374220

@inproceedings{2cfafac74a754c58ad48ace529d1cfc1,
title = "Protein Subcellular and Secreted Localization Prediction Using Deep Learning",
abstract = "Predicting protein structure and inferring a protein's function from its location in the cell is crucial for understanding the cellular translocation process and has direct applications in drug discovery. Computational prediction of protein localization is an alternative to time-consuming experimental approaches. We use a deep learning approach to improve prediction accuracy while reducing the time needed to predict the localization site of uncharacterized protein sequences. Our approach is based on general biological features of the protein sequence and compartment-specific features, to which we added physico-chemical sequence features. We collected the protein sequences from UniProt/SWISS-PROT and then collected the features for each protein. We consider five locations in the dataset, namely cytoplasm (CP), inner membrane (IM), outer membrane (OM), periplasm (PE) and secreted (SEC). We selected protein sequences with a minimum length of 100 amino acids and a maximum length of 1000 amino acids. Each location contains 500 protein sequences. We propose a deep learning prediction method for the bacteria taxonomy that combines one-versus-one and one-versus-all models with feature selection using linear SVM ranking and deep auto-encoders to initialize the weights. The method achieves an overall accuracy of 97.81{\%} using 10-fold cross-validation on our data. Our approach outperforms current state-of-the-art computational methods for protein subcellular localization on the selected dataset.",
author = "Hamza Zidoum and Mennatollah Magdy",
year = "2018",
month = "6",
day = "5",
doi = "10.1109/ICCSE1.2018.8374220",
language = "English",
pages = "1--6",
editor = "Hazem Raafat and Mostafa Abd-El-Barr and Muhammad Sarfraz and Paul Manuel",
booktitle = "2018 International Conference on Computing Sciences and Engineering, ICCSE 2018 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Protein Subcellular and Secreted Localization Prediction Using Deep Learning

AU - Zidoum, Hamza

AU - Magdy, Mennatollah

PY - 2018/6/5

Y1 - 2018/6/5

N2 - Predicting protein structure and inferring a protein's function from its location in the cell is crucial for understanding the cellular translocation process and has direct applications in drug discovery. Computational prediction of protein localization is an alternative to time-consuming experimental approaches. We use a deep learning approach to improve prediction accuracy while reducing the time needed to predict the localization site of uncharacterized protein sequences. Our approach is based on general biological features of the protein sequence and compartment-specific features, to which we added physico-chemical sequence features. We collected the protein sequences from UniProt/SWISS-PROT and then collected the features for each protein. We consider five locations in the dataset, namely cytoplasm (CP), inner membrane (IM), outer membrane (OM), periplasm (PE) and secreted (SEC). We selected protein sequences with a minimum length of 100 amino acids and a maximum length of 1000 amino acids. Each location contains 500 protein sequences. We propose a deep learning prediction method for the bacteria taxonomy that combines one-versus-one and one-versus-all models with feature selection using linear SVM ranking and deep auto-encoders to initialize the weights. The method achieves an overall accuracy of 97.81% using 10-fold cross-validation on our data. Our approach outperforms current state-of-the-art computational methods for protein subcellular localization on the selected dataset.

AB - Predicting protein structure and inferring a protein's function from its location in the cell is crucial for understanding the cellular translocation process and has direct applications in drug discovery. Computational prediction of protein localization is an alternative to time-consuming experimental approaches. We use a deep learning approach to improve prediction accuracy while reducing the time needed to predict the localization site of uncharacterized protein sequences. Our approach is based on general biological features of the protein sequence and compartment-specific features, to which we added physico-chemical sequence features. We collected the protein sequences from UniProt/SWISS-PROT and then collected the features for each protein. We consider five locations in the dataset, namely cytoplasm (CP), inner membrane (IM), outer membrane (OM), periplasm (PE) and secreted (SEC). We selected protein sequences with a minimum length of 100 amino acids and a maximum length of 1000 amino acids. Each location contains 500 protein sequences. We propose a deep learning prediction method for the bacteria taxonomy that combines one-versus-one and one-versus-all models with feature selection using linear SVM ranking and deep auto-encoders to initialize the weights. The method achieves an overall accuracy of 97.81% using 10-fold cross-validation on our data. Our approach outperforms current state-of-the-art computational methods for protein subcellular localization on the selected dataset.

UR - http://www.scopus.com/inward/record.url?scp=85049371576&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049371576&partnerID=8YFLogxK

U2 - 10.1109/ICCSE1.2018.8374220

DO - 10.1109/ICCSE1.2018.8374220

M3 - Conference contribution

AN - SCOPUS:85049371576

SP - 1

EP - 6

BT - 2018 International Conference on Computing Sciences and Engineering, ICCSE 2018 - Proceedings

A2 - Raafat, Hazem

A2 - Abd-El-Barr, Mostafa

A2 - Sarfraz, Muhammad

A2 - Manuel, Paul

PB - Institute of Electrical and Electronics Engineers Inc.

ER -