Sepp Hochreiter

From Wikipedia, the free encyclopedia
Born: February 14, 1967
Residence: Austria
Nationality: German
Alma mater: Technische Universität München
Fields: Machine learning, bioinformatics
Institutions: Johannes Kepler University Linz

Sepp Hochreiter (born Josef Hochreiter in 1967) is a German computer scientist. Since 2018 he has headed the Institute for Machine Learning at the Johannes Kepler University Linz, after leading the Institute of Bioinformatics there from 2006 to 2018. Since 2017 he has also headed the Linz Institute of Technology (LIT) AI Lab, which focuses on advancing research in artificial intelligence. Previously, he was at the Technical University of Berlin, at the University of Colorado at Boulder, and at the Technical University of Munich.

Sepp Hochreiter has made numerous contributions in the fields of machine learning, deep learning and bioinformatics. He developed the long short-term memory (LSTM), for which the first results were reported in his diploma thesis in 1991.[1] The main LSTM paper appeared in 1997[2] and is regarded as a milestone in the history of machine learning. His analysis of the vanishing and exploding gradient laid foundations of deep learning.[1][3][4] He contributed to meta-learning[5] and proposed flat minima[6] as preferable solutions when training artificial neural networks, to ensure a low generalization error. He developed new activation functions for neural networks, such as exponential linear units (ELUs)[7] and scaled ELUs (SELUs),[8][9] to improve learning. He contributed to reinforcement learning via actor-critic approaches[10] and his RUDDER method.[11] He applied biclustering methods to drug discovery and toxicology. He extended support vector machines to handle kernels that are not positive definite with the "Potential Support Vector Machine" (PSVM) model, and applied this model to feature selection, especially gene selection for microarray data.[12] Also in biotechnology, he developed "Factor Analysis for Robust Microarray Summarization" (FARMS).[13]

In addition to his research contributions, Sepp Hochreiter is broadly active within his field: he launched the Bioinformatics Working Group at the Austrian Computer Society; he is a founding board member of several bioinformatics start-up companies; he was program chair of the conference Bioinformatics Research and Development;[14] he is a conference chair of the conference Critical Assessment of Massive Data Analysis (CAMDA); and he is an editor, program committee member, and reviewer for international journals and conferences. As a faculty member at the Johannes Kepler University Linz, he founded the bachelor's program in bioinformatics, a cross-border, double-degree study program run together with the University of South Bohemia in České Budějovice (Budweis), Czech Republic. He also established the master's program in bioinformatics and serves as dean of both study programs.

Scientific contributions

Long short-term memory (LSTM)

Sepp Hochreiter developed the long short-term memory (LSTM), for which the first results were reported in his diploma thesis in 1991.[1] The main LSTM paper appeared in 1997[2] and is regarded as a milestone in the history of machine learning. LSTM overcomes the tendency of recurrent neural networks (RNNs) and deep networks to forget information over time or, equivalently, through layers (the vanishing or exploding gradient).[1][3][4] LSTM learns from training sequences to process new sequences in order to produce an output (sequence classification) or to generate an output sequence (sequence-to-sequence mapping). Neural networks with LSTM cells have solved numerous tasks in biological sequence analysis, drug design, automatic music composition, machine translation, speech recognition, reinforcement learning, and robotics. LSTM with an optimized architecture was successfully applied to very fast protein homology detection without requiring a sequence alignment.[15] LSTM has also been used to learn a learning algorithm: the LSTM serves as a Turing machine, that is, a computer on which a learning algorithm is executed. Since this Turing machine is itself a neural network, it can develop novel learning algorithms by training on learning problems, and the learned learning techniques have proven superior to those designed by humans.[16] LSTM networks are used in Google Voice transcription,[17] Google voice search,[18] and Google's Allo[19] as core technology for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices. Apple has also used LSTM in its "Quicktype" function since iOS 10.[20][21]
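The gating mechanism at the heart of LSTM can be sketched in a few lines. The following is a minimal NumPy illustration of one step of a standard LSTM cell; the weight layout and variable names are illustrative, not taken from the original papers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    x      : input vector, shape (d,)
    h_prev : previous hidden state, shape (n,)
    c_prev : previous cell state, shape (n,)
    W      : stacked weights for the four gates, shape (4n, d + n)
    b      : stacked biases, shape (4n,)
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:4 * n])  # candidate cell update
    c = f * c_prev + i * g       # cell state: the "constant error carousel"
    h = o * np.tanh(c)           # hidden state / output
    return h, c

# Process a toy sequence with a 3-unit cell and 2-dimensional inputs.
rng = np.random.default_rng(0)
d, n = 2, 3
W = rng.normal(scale=0.1, size=(4 * n, d + n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):
    h, c = lstm_step(x, h, c, W, b)
```

The additive cell-state update `c = f * c_prev + i * g` is what lets error signals flow over long delays without vanishing, in contrast to the purely multiplicative dynamics of a plain RNN.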

Deep learning and learning representations

Neural networks are simplified mathematical models of biological neural networks such as those in human brains. In feedforward neural networks (NNs), information moves in only one direction: from the input layer, which receives information from the environment, through the hidden layers to the output layer, which supplies the information to the environment. Unlike feedforward NNs, recurrent neural networks (RNNs) can use their internal memory to process arbitrary sequences of inputs. If data mining is based on neural networks, overfitting reduces the network's ability to correctly process future data. To avoid overfitting, Sepp Hochreiter developed algorithms for finding low-complexity neural networks like "Flat Minimum Search" (FMS),[6] which searches for a "flat" minimum, a large connected region in parameter space where the network function is constant. The network parameters can then be given with low precision, which yields a low-complexity network that avoids overfitting. Low-complexity neural networks are well suited for deep learning because they control the complexity in each network layer and therefore learn hierarchical representations of the input.[22][23]

Sepp Hochreiter's group introduced "exponential linear units" (ELUs), which speed up learning in deep neural networks and lead to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs), and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity for positive values. However, ELUs have improved learning characteristics compared to ReLUs, because their negative values push mean unit activations closer to zero. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient because of a reduced bias shift effect.[24]

Sepp Hochreiter introduced self-normalizing neural networks (SNNs), which allow feedforward networks to learn abstract representations of the input on different levels. SNNs avoid problems of batch normalization since the activations across samples automatically converge to mean zero and variance one. SNNs are an enabling technology to (1) train very deep networks, that is, networks with many layers, (2) use novel regularization strategies, and (3) learn very robustly across many layers.[8][9]

In unsupervised deep learning, generative adversarial networks (GANs) are very popular since they create new images that are more realistic than those obtained from other generative approaches. Sepp Hochreiter proposed a two time-scale update rule (TTUR) for learning GANs with stochastic gradient descent on any differentiable loss function. Methods from stochastic approximation were used to prove that the TTUR converges to a stationary local Nash equilibrium; this is the first proof of the convergence of GANs in a general setting. Another contribution is the introduction of the "Fréchet Inception Distance" (FID), a more appropriate quality measure for GANs than the previously used Inception Score.[25][26]

He developed rectified factor networks (RFNs)[27][28] to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method, which enforces non-negative and normalized posterior means. RFNs were very successfully applied in bioinformatics and genetics.[29]
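The two activation functions mentioned above can be stated directly. The following sketch follows the published definitions of ELU and SELU; the SELU constants below are the commonly quoted values from the self-normalizing networks paper:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, saturates to -alpha below."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# SELU constants from Klambauer et al. (2017); they make zero mean and
# unit variance a fixed point of the activation statistics across layers.
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805

def selu(x):
    """Scaled ELU used in self-normalizing neural networks (SNNs)."""
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 1.5])
y_elu, y_selu = elu(x), selu(x)
```

Note that ELU is bounded below by `-alpha`, which is what pulls mean activations toward zero, while SELU additionally scales the output by `lambda > 1` so that the forward dynamics neither shrink nor blow up.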

Reinforcement learning

Sepp Hochreiter worked in the field of reinforcement learning on actor-critic systems that learn by "backpropagation through a model".[10][30] This approach has major drawbacks stemming from sensitivity analysis, such as local minima, various instabilities when learning online, exploding or vanishing gradients of the world model, and the fact that neither contribution nor relevance to the reward is assigned to actions. Sepp Hochreiter introduced "RUDDER: Return Decomposition for Delayed Rewards", which is designed to learn optimal policies for Markov decision processes (MDPs) with highly delayed rewards. For delayed rewards, he proved that the biases of action-value estimates learned by temporal difference (TD) learning are corrected only exponentially slowly in the number of delay steps. Furthermore, he proved that the variance of an action-value estimate learned via Monte Carlo (MC) methods is increased by other estimation variances, the number of which can grow exponentially with the number of delay steps. RUDDER solves both the exponentially slow bias correction of TD and the exponentially many increasing variances of MC by a return decomposition. A new, RUDDER-constructed MDP has the same return for each episode and policy as the original MDP, but the rewards are redistributed along the episode. The redistribution greatly reduces the delays of the rewards; in the optimal case, the new MDP has no delayed rewards and TD is unbiased. The redistributed rewards aim to track Q-values in order to keep the expected future reward always at zero: an action that increases the expected return receives a positive reward, and an action that decreases the expected return receives a negative reward. RUDDER consists of (I) a safe exploration strategy, (II) a lessons replay buffer, and (III) an LSTM-based reward redistribution method via return decomposition and backward contribution analysis.[11] Both source code and demonstration videos are available. The exploration can be improved by active exploration strategies that maximize the information gain of future episodes, which is often associated with curiosity.[31]
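The return-decomposition idea can be illustrated with a toy redistribution step. The predictor below is a hypothetical stand-in for RUDDER's LSTM contribution analysis, so this is only a sketch of the return-equivalence property, not the RUDDER implementation:

```python
import numpy as np

def redistribute_rewards(predicted_returns, original_rewards):
    """Return-equivalent reward redistribution in the spirit of RUDDER.

    predicted_returns : a model's prediction of the episode return after each
                        state-action pair (a hypothetical stand-in for the
                        LSTM-based backward contribution analysis).
    original_rewards  : the original, possibly delayed, per-step rewards.

    The redistributed reward at step t is the difference of consecutive
    return predictions, so the sum over the episode (the return) is preserved.
    """
    g = np.asarray(predicted_returns, dtype=float)
    diffs = np.diff(np.concatenate([[0.0], g]))
    # Absorb any prediction error into the last step so the episode return
    # matches the original return exactly (return equivalence).
    diffs[-1] += np.sum(original_rewards) - np.sum(diffs)
    return diffs

# Toy episode: the only reward arrives at the final step (highly delayed).
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
# A hypothetical predictor that already "knows" after step 1 that the
# return will be 1, i.e. the decisive action happened early.
preds = np.array([0.0, 1.0, 1.0, 1.0, 1.0])
new_rewards = redistribute_rewards(preds, rewards)
```

In this toy episode the redistributed reward arrives at step 1, where the decisive action was taken, instead of at the end of the episode, which is exactly the delay reduction that makes TD unbiased in the optimal case.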

Drug discovery, target prediction, and toxicology

The pharma industry sees many chemical compounds (drug candidates) fail in late phases of the drug development pipeline. These failures are caused by insufficient efficacy on the biomolecular target (on-target effect), undesired interactions with other biomolecules (off-target or side effects), or unpredicted toxic effects. The deep learning and biclustering methods developed by Sepp Hochreiter identified novel on- and off-target effects in various drug design projects.[32] In 2013 Sepp Hochreiter's group won the DREAM subchallenge of predicting the average toxicity of compounds.[33] In 2014 this success with deep learning continued with winning the "Tox21 Data Challenge" of NIH, FDA and NCATS.[34][35] The goal of the Tox21 Data Challenge was to correctly predict the off-target and toxic effects of environmental chemicals in nutrients, household products and drugs. These successes indicate that deep learning may be superior to other virtual screening methods.[36][37] Furthermore, Hochreiter's group worked on identifying synergistic effects of drug combinations.[38]

Biclustering

Sepp Hochreiter developed "Factor Analysis for Bicluster Acquisition" (FABIA)[39] for biclustering, that is, the simultaneous clustering of the rows and columns of a matrix. A bicluster in transcriptomic data is a pair of a gene set and a sample set for which the genes are similar to each other on the samples and vice versa. In drug design, for example, the effects of compounds may be similar only on a subgroup of genes. FABIA is a multiplicative model that assumes realistic non-Gaussian signal distributions with heavy tails and utilizes well-understood model selection techniques such as a variational approach in the Bayesian framework. FABIA supplies the information content of each bicluster to separate spurious biclusters from true biclusters. Sepp Hochreiter edited the reference book on biclustering, which presents the most relevant biclustering algorithms, typical applications of biclustering, visualization and evaluation of biclusters, and software in R.[40]
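The multiplicative bicluster model can be illustrated with a planted rank-one pattern. The SVD-based recovery below is a simple stand-in for FABIA's variational inference, so this sketch shows the model class rather than the FABIA algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_samples = 100, 40

# FABIA models the data matrix as a sum of outer products of sparse factors:
# each bicluster is a gene vector lambda times a sample vector z (FABIA itself
# places heavy-tailed Laplacian priors on both to enforce sparsity).
# Here we plant a single bicluster in Gaussian noise.
lam = np.zeros(n_genes)
lam[10:20] = 2.0                 # genes belonging to the bicluster
z = np.zeros(n_samples)
z[5:15] = 1.5                    # samples belonging to the bicluster
X = np.outer(lam, z) + rng.normal(scale=0.1, size=(n_genes, n_samples))

# The planted bicluster is a rank-one multiplicative pattern, so the leading
# singular vectors recover the gene and sample sets.
u, s, vt = np.linalg.svd(X, full_matrices=False)
gene_set = np.flatnonzero(np.abs(u[:, 0]) > 0.1)
sample_set = np.flatnonzero(np.abs(vt[0]) > 0.1)
```

The heavy-tailed priors matter in practice: with a Gaussian prior the factors would not be sparse, and spurious "biclusters" covering the whole matrix would be preferred over the small, dense ones sought here.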

Support vector machines

Support vector machines (SVMs) are supervised learning methods used for classification and regression analysis by recognizing patterns and regularities in the data. Standard SVMs require a positive definite kernel to generate a square kernel matrix from the data. Sepp Hochreiter proposed the "Potential Support Vector Machine" (PSVM),[41] which can be applied to non-square kernel matrices and can be used with kernels that are not positive definite. For PSVM model selection he developed an efficient sequential minimal optimization algorithm.[42] The PSVM minimizes a new objective which ensures theoretical bounds on the generalization error and automatically selects the features used for classification or regression.

Feature selection

Sepp Hochreiter applied the PSVM to feature selection, especially to gene selection for microarray data.[12][43][44] The PSVM and standard support vector machines were applied to extract features that are indicative of coiled-coil oligomerization.[45]

Genetics

Sepp Hochreiter developed "HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data"[46] for detecting short segments of identity by descent. A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common ancestor, that is, the segment has the same ancestral origin in these individuals. HapFABIA identifies IBD segments 100 times shorter than those detected by current state-of-the-art methods: 10 kbp for HapFABIA versus 1 Mbp for state-of-the-art methods. HapFABIA is tailored to next-generation sequencing data and utilizes rare variants for IBD detection, but also works for microarray genotyping data. HapFABIA enables advances in evolutionary biology, population genetics, and association studies because it decomposes the genome into short IBD segments, describing the genome at very high resolution. HapFABIA was used to analyze the IBD sharing between humans, Neandertals (Neanderthals), and Denisovans.[47]
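The IBS notion above can be illustrated with a toy pairwise scan. HapFABIA itself detects IBD via biclustering of rare variants across many individuals, so the following is only a sketch of what "identical by state in a segment" means for two haplotypes:

```python
def shared_ibs_segments(seq_a, seq_b, min_len=4):
    """Find maximal runs where two haplotype strings are identical by state.

    Returns half-open (start, end) index pairs of runs of length >= min_len.
    This is a toy illustration of IBS; whether such a segment is also IBD
    (inherited from a common ancestor) cannot be decided from the pair alone.
    """
    segments, start = [], None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(seq_a) - start >= min_len:
        segments.append((start, len(seq_a)))
    return segments

# Two toy haplotypes differing at a single position.
segs = shared_ibs_segments("ACGTACGTAA", "ACGTTCGTAA", min_len=4)
```

Rare variants are what turn IBS into evidence for IBD: a long run of shared common alleles arises easily by chance, whereas shared rare variants inside the run make a common ancestral origin far more likely.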

Next-generation sequencing

Sepp Hochreiter's research group is a member of the SEQC/MAQC-III consortium, coordinated by the US Food and Drug Administration. This consortium examined the Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites regarding RNA sequencing (RNA-seq) performance.[48] Within this project, standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments were defined.[49] For analyzing the structural variation of DNA, Sepp Hochreiter's research group proposed "cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate".[50] cn.MOPS estimates the local DNA copy number, is suited for both whole-genome and exome sequencing, and can be applied to diploid and haploid as well as polyploid genomes. For identifying differentially expressed transcripts in RNA-seq data, Sepp Hochreiter's group suggested "DEXUS: Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions".[51] In contrast to other RNA-seq methods, DEXUS can detect differential expression in RNA-seq data for which the sample conditions are unknown and for which biological replicates are not available. In Sepp Hochreiter's group, sequencing data was also analyzed to gain insights into chromatin remodeling: the reorganization of the cell's chromatin structure was determined via next-generation sequencing of resting and activated T cells, and the analyses of these T cell chromatin sequencing data identified GC-rich long nucleosome-free regions that are hot spots of chromatin remodeling.[52] For targeted next-generation sequencing panels in clinical diagnostics, in particular for cancer, Hochreiter's group developed panelcn.MOPS.[53]

Microarray preprocessing and summarization

Sepp Hochreiter developed "Factor Analysis for Robust Microarray Summarization" (FARMS).[13] FARMS is designed for preprocessing and summarizing high-density oligonucleotide DNA microarrays at probe level to analyze RNA gene expression. FARMS is based on a factor analysis model which is optimized in a Bayesian framework by maximizing the posterior probability. On Affymetrix spiked-in and other benchmark data, FARMS outperformed all other methods. A highly relevant feature of FARMS is its informative/non-informative (I/NI) calls.[54] The I/NI call is a Bayesian filtering technique which separates signal variance from noise variance. The I/NI call offers a solution to the main problem of high dimensionality when analyzing microarray data by selecting genes which are measured with high quality.[55][56] FARMS has been extended to cn.FARMS[57] for detecting DNA structural variants such as copy number variations with a low false discovery rate.
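The factor-analysis view of probe-set summarization can be sketched as follows. The principal-component projection used here is a simple surrogate for FARMS's Bayesian factor estimate, and all names in this simulation are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_probes, n_arrays = 11, 20

# FARMS models the log intensities of the probes in one probe set as
# x = lambda * z + noise: a single hidden factor z (the gene's expression
# across arrays) loaded onto each probe with probe-specific loadings lambda.
z = rng.normal(size=n_arrays)                  # true expression per array
lam = rng.uniform(0.5, 1.5, size=n_probes)     # probe-specific loadings
X = np.outer(lam, z) + rng.normal(scale=0.3, size=(n_probes, n_arrays))

# A simple surrogate for the factor estimate: project the centered
# probe-level data onto its leading principal direction.
Xc = X - X.mean(axis=1, keepdims=True)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)
z_hat = vt[0] * np.sign(np.corrcoef(vt[0], z)[0, 1])  # fix the SVD sign

# The I/NI idea: compare the variance captured by the factor with the
# residual (noise) variance; informative probe sets have a high ratio.
signal_ratio = s[0] ** 2 / np.sum(s ** 2)
```

A probe set whose intensities are pure noise would have no dominant factor, its `signal_ratio` would be low, and the I/NI call would filter the gene out before downstream analysis.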

References

  1. ^ a b c d Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (diploma thesis). Technical University Munich, Institute of Computer Science.
  2. ^ a b Hochreiter, S.; Schmidhuber, J. (1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
  3. ^ a b Hochreiter, S. (1998). "The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions". International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 06 (02): 107–116. doi:10.1142/S0218488598000094. ISSN 0218-4885.
  4. ^ a b Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2000). Kolen, J. F.; Kremer, S. C., eds. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Networks. New York City: IEEE Press. pp. 237–244.
  5. ^ Hochreiter, S.; Younger, A. S.; Conwell, P. R. (2001). "Learning to Learn Using Gradient Descent" (PDF). Lecture Notes in Computer Science - ICANN 2001: 87–94. doi:10.1007/3-540-44668-0_13. ISSN 0302-9743.
  6. ^ a b Hochreiter, S.; Schmidhuber, J. (1997). "Flat Minima". Neural Computation. 9 (1): 1–42. doi:10.1162/neco.1997.9.1.1. PMID 9117894.
  7. ^ Clevert, D.-A.; Unterthiner, T.; Hochreiter, S. (2016). "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) [published as a conference paper at ICLR 2016]". arXiv:1511.07289v5 [cs.LG].
  8. ^ a b Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. (2017). "Self-Normalizing Neural Networks". arXiv:1706.02515 [cs.LG].
  9. ^ a b Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. (2017). Self-Normalizing Neural Networks. Advances in Neural Information Processing Systems 31.
  10. ^ a b Hochreiter, S. (1991). Implementierung und Anwendung eines neuronalen Echtzeit-Lernalgorithmus für reaktive Umgebungen (PDF) (Report). Technical University Munich, Institute of Computer Science.
  11. ^ a b Arjona-Medina, J. A.; Gillhofer, M.; Widrich, M.; Unterthiner, T.; Hochreiter, S. (2018). "RUDDER: Return Decomposition for Delayed Rewards". arXiv:1806.07857 [cs.LG].
  12. ^ a b Hochreiter, S.; Obermayer, K. (2006). "Nonlinear Feature Selection with the Potential Support Vector Machine". Feature Extraction, Studies in Fuzziness and Soft Computing: 419–438. doi:10.1007/978-3-540-35488-8_20. ISBN 978-3-540-35487-1.
  13. ^ a b Hochreiter, S.; Clevert, D.-A.; Obermayer, K. (2006). "A new summarization method for affymetrix probe level data". Bioinformatics. 22 (8): 943–949. doi:10.1093/bioinformatics/btl033. PMID 16473874.
  14. ^ Hochreiter, S.; Wagner, R. (2007). "Bioinformatics Research and Development". doi:10.1007/978-3-540-71233-6. ISSN 0302-9743.
  15. ^ Hochreiter, S.; Heusel, M.; Obermayer, K. (2007). "Fast model-based protein homology detection without alignment". Bioinformatics. 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.
  16. ^ Hochreiter, S.; Younger, A. S.; Conwell, P. R. (2001). "Learning to Learn Using Gradient Descent" (PDF). Lecture Notes in Computer Science - ICANN 2001: 87–94. doi:10.1007/3-540-44668-0_13. ISSN 0302-9743.
  17. ^ "The neural networks behind Google Voice transcription".
  18. ^ "Google voice search: faster and more accurate".
  19. ^ "Chat Smarter with Allo".
  20. ^ "Apple's Machines Can Learn Too". The Information.
  21. ^ Ranger, Steve. "iPhone, AI and big data: Here's how Apple plans to protect your privacy - ZDNet".
  22. ^ Hochreiter, S.; Schmidhuber, J. (1999). "Feature Extraction Through LOCOCODE". Neural Computation. 11 (3): 679–714. doi:10.1162/089976699300016629. ISSN 0899-7667.
  23. ^ Hochreiter, S.; Schmidhuber, J. (1999). Source Separation as a By-product of Regularization. Advances in Neural Information Processing Systems 12. pp. 459–465.
  24. ^ Clevert, D.-A.; Unterthiner, T.; Hochreiter, S. (2016). "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) [published as a conference paper at ICLR 2016]". arXiv:1511.07289v5 [cs.LG].
  25. ^ Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Klambauer, G.; Hochreiter, S. (2017). "GANs Trained by a Two Time-Scale Update Rule Converge to a local Nash Equilibrium". arXiv:1706.08500 [cs.LG].
  26. ^ Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Klambauer, G.; Hochreiter, S. (2017). GANs Trained by a Two Time-Scale Update Rule Converge to a local Nash Equilibrium. Advances in Neural Information Processing Systems 31.
  27. ^ Clevert, D.-A.; Mayr, A.; Unterthiner, T.; Hochreiter, S. (2015). "Rectified Factor Networks". arXiv:1502.06464v2 [cs.LG].
  28. ^ Clevert, D.-A.; Mayr, A.; Unterthiner, T.; Hochreiter, S. (2015). Rectified Factor Networks. Advances in Neural Information Processing Systems 29.
  29. ^ Clevert, D.-A.; Unterthiner, T.; Povysil, G.; Hochreiter, S. (2017). "Rectified factor networks for biclustering of omics data". Bioinformatics. 33 (14): i59–i66. doi:10.1093/bioinformatics/btx226.
  30. ^ Schmidhuber, J. (1990). Making the world differentiable: On Using Fully Recurrent Self-Supervised Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments (PDF) (Technical report). Technical University Munich, Institute of Computer Science. FKI-126-90 (revised).
  31. ^ Storck, J.; Hochreiter, S.; Schmidhuber, J. (1995). Reinforcement driven information acquisition in non-deterministic environments (PDF). International Conference on Artificial Neural Networks. pp. 159–164.
  32. ^ Verbist, B.; Klambauer, G.; Vervoort, L.; Talloen, W.; Shkedy, Z.; Thas, O.; Bender, A.; Göhlmann, H.W.H.; Hochreiter, S. (2015). "Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project". Drug Discovery Today. 20 (5): 505–513. doi:10.1016/j.drudis.2014.12.014. ISSN 1359-6446. PMID 25582842.
  33. ^ Eduati, F.; Mangravite, L. M.; Wang, T.; ...; Hochreiter, S.; ...; Stolovitzky, G.; Xie, Y.; Saez-Rodriguez, J. (2015). "Prediction of human population responses to toxic compounds by a collaborative competition". Nature Biotechnology. 33 (9): 933–940. doi:10.1038/nbt.3299. ISSN 1087-0156.
  34. ^ "Toxicology in the 21st century Data Challenge".
  35. ^ Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. (2016). "DeepTox: Toxicity Prediction using Deep Learning". Frontiers in Environmental Science. 3 (80). doi:10.3389/fenvs.2015.00080.
  36. ^ Unterthiner, T.; Mayr, A.; Klambauer, G.; Steijaert, M.; Ceulemans, H.; Wegner, J. K.; Hochreiter, S. (2014). "Deep Learning as an Opportunity in Virtual Screening". Workshop on Deep Learning and Representation Learning (NIPS 2014).
  37. ^ Unterthiner, T.; Mayr, A.; Klambauer, G.; Hochreiter, S. (2015). "Toxicity Prediction using Deep Learning". arXiv, 2015.
  38. ^ Preuer, K.; Lewis, R. P. I.; Hochreiter, S.; Bender, A.; Bulusu, K. C.; Klambauer, G. (2017). "DeepSynergy: predicting anti-cancer drug synergy with Deep Learning". Bioinformatics. doi:10.1093/bioinformatics/btx806.
  39. ^ Hochreiter, S.; Bodenhofer, U.; Heusel, M.; Mayr, A.; Mitterecker, A.; Kasim, A.; Khamiakova, T.; Van Sanden, S.; Lin, D.; Talloen, W.; Bijnens, L.; Göhlmann, H. W. H.; Shkedy, Z.; Clevert, D.-A. (2010). "FABIA: Factor analysis for bicluster acquisition". Bioinformatics. 26 (12): 1520–1527. doi:10.1093/bioinformatics/btq227. PMC 2881408. PMID 20418340.
  40. ^ Kasim, A.; Shkedy, Z.; Kaiser, S.; Hochreiter, S.; Talloen, W. (2016). Applied Biclustering Methods for Big and High-Dimensional Data Using R. Chapman & Hall/CRC Biostatistics Series. New York: Taylor & Francis Group, Chapman & Hall. ISBN 9781482208238.
  41. ^ Hochreiter, S.; Obermayer, K. (2006). "Support Vector Machines for Dyadic Data". Neural Computation. 18 (6): 1472–1510. doi:10.1162/neco.2006.18.6.1472. PMID 16764511.
  42. ^ Knebel, T.; Hochreiter, S.; Obermayer, K. (2008). "An SMO Algorithm for the Potential Support Vector Machine". Neural Computation. 20 (1): 271–287. doi:10.1162/neco.2008.20.1.271. PMID 18045009.
  43. ^ Hochreiter, S.; Obermayer, K. (2003). "Classification and Feature Selection on Matrix Data with Application to Gene-Expression Analysis". 54th Session of the International Statistical Institute. Archived from the original on 2012-03-25.
  44. ^ Hochreiter, S.; Obermayer, K. (2004). "Gene Selection for Microarray Data". Kernel Methods in Computational Biology. MIT Press: 319–355. Archived from the original on 2012-03-25.
  45. ^ Mahrenholz, C. C.; Abfalter, I. G.; Bodenhofer, U.; Volkmer, R.; Hochreiter, S. (2011). "Complex Networks Govern Coiled-Coil Oligomerization - Predicting and Profiling by Means of a Machine Learning Approach". Molecular & Cellular Proteomics. 10 (5): M110.004994–M110.004994. doi:10.1074/mcp.M110.004994. PMC 3098589. PMID 21311038.
  46. ^ Hochreiter, S. (2013). "HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data". Nucleic Acids Research. 41 (22): e202. doi:10.1093/nar/gkt1013. PMC 3905877. PMID 24174545.
  47. ^ Povysil, G.; Hochreiter, S. (2014). "Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans". bioRxiv 003988.
  48. ^ "A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium". Nature Biotechnology. 32 (9): 903–914. September 2014. doi:10.1038/nbt.2957. PMC 4321899. PMID 25150838.
  49. ^ S. A. Munro, S. P. Lund, P. S. Pine, H. Binder, D.-A. Clevert, A. Conesa, J. Dopazo, M. Fasold, S. Hochreiter, H. Hong, N. Jafari, D. P. Kreil, P. P. Labaj, S. Li, Y. Liao, S. M. Lin, J. Meehan, C. E. Mason, J. Santoyo-Lopez, R. A. Setterquist, L. Shi, W. Shi, G. K. Smyth, N. Stralis-Pavese, Z. Su, W. Tong, C. Wang, J. Wang, J. Xu, Z. Ye, Y. Yang, Y. Yu & M. Salit (2014). "Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures". Nature Communications. 5: 5125. arXiv:1406.4893. Bibcode:2014NatCo...5E5125M. doi:10.1038/ncomms6125. PMID 25254650.CS1 maint: Multiple names: authors list (link)
  50. ^ Klambauer, G.; Schwarzbauer, K.; Mayr, A.; Clevert, D.-A.; Mitterecker, A.; Bodenhofer, U.; Hochreiter, S. (2012). "Cn.MOPS: Mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate". Nucleic Acids Research. 40 (9): e69. doi:10.1093/nar/gks003. PMC 3351174. PMID 22302147.
  51. ^ Klambauer, G.; Unterthiner, T.; Hochreiter, S. (2013). "DEXUS: Identifying differential expression in RNA-Seq studies with unknown conditions". Nucleic Acids Research. 41 (21): e198. doi:10.1093/nar/gkt834. PMC 3834838. PMID 24049071.
  52. ^ Schwarzbauer, K.; Bodenhofer, U.; Hochreiter, S. (2012). Campbell, Moray, ed. "Genome-wide chromatin remodeling identified at GC-rich long nucleosome-free regions". PLOS ONE. 7 (11): e47924. Bibcode:2012PLoSO...747924S. doi:10.1371/journal.pone.0047924. PMC 3489898. PMID 23144837.
  53. ^ Povysil, G.; Tzika, A.; Vogt, J.; Haunschmid, V.; Haunschmid, L.; Zschocke, J.; Klambauer, G.; Hochreiter, S.; Wimmer, K. "panelcn.MOPS: Copy number detection in targeted NGS panel data for clinical diagnostics". Human Mutation. 38 (7): 889–897. doi:10.1002/humu.23237.
  54. ^ Talloen, W.; Clevert, D.-A.; Hochreiter, S.; Amaratunga, D.; Bijnens, L.; Kass, S.; Gohlmann, H. W. H. (2007). "I/NI-calls for the exclusion of non-informative genes: A highly effective filtering tool for microarray data". Bioinformatics. 23 (21): 2897–2902. doi:10.1093/bioinformatics/btm478. PMID 17921172.
  55. ^ Talloen, W.; Hochreiter, S.; Bijnens, L.; Kasim, A.; Shkedy, Z.; Amaratunga, D.; Gohlmann, H. (2010). "Filtering data from high-throughput experiments based on measurement reliability". Proceedings of the National Academy of Sciences. 107 (46): E173–E174. Bibcode:2010PNAS..107E.173T. doi:10.1073/pnas.1010604107. PMC 2993399. PMID 21059952.
  56. ^ Kasim, A.; Lin, D.; Van Sanden, S.; Clevert, D.-A.; Bijnens, L.; Göhlmann, H.; Amaratunga, D.; Hochreiter, S.; Shkedy, Z.; Talloen, W. (2010). "Informative or Noninformative Calls for Gene Expression: A Latent Variable Approach". Statistical Applications in Genetics and Molecular Biology. 9. doi:10.2202/1544-6115.1460.
  57. ^ Clevert, D.-A.; Mitterecker, A.; Mayr, A.; Klambauer, G.; Tuefferd, M.; De Bondt, A. D.; Talloen, W.; Göhlmann, H.; Hochreiter, S. (2011). "Cn.FARMS: A latent variable model to detect copy number variations in microarray data with a low false discovery rate". Nucleic Acids Research. 39 (12): e79. doi:10.1093/nar/gkr197. PMC 3130288. PMID 21486749.
