State of the Art Artificial Neural Network, Deep Learning, and the Future Generation
DOI: https://doi.org/10.17010/ijcs/2017/v2/i6/120440

Keywords: Artificial Neural Network, Deep Learning, Machine Learning, Neural Network

Manuscript received September 29, 2017; revised October 14, 2017; accepted October 15, 2017. Date of publication November 6, 2017.

Abstract
The use of neural networks, artificial intelligence, and deep learning in the broadest and most controversial sense has been a tumultuous journey, spanning three distinct hype cycles and a history dating back to the 1960s. Resurgent interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can serve as an auxiliary component of applications, enhancing them or enabling new types of computation such as approximate computing and automatic parallelization. In this view, machine learning is not the underlying application but a ubiquitous component of applications. In recent years, deep learning in artificial neural networks (ANNs) has won numerous contests in pattern recognition and machine learning; this view calls for a different approach to the deployment of ANNs and deep learning.