Christoph Dieterich
Max Planck Institute for the Biology of Ageing, Cologne
"Modeling Ribosome Profiling Data with Bayesian Hidden Markov Models"
Abstract
Ribosome profiling via high-throughput sequencing (ribo-seq) is a promising new technique for characterizing the occupancy of ribosomes on messenger RNA (mRNA) at base-pair resolution. The ribosome translates mRNA into protein, so occupancy information offers a detailed view of ribosome density and position, which can be used to discover new upstream open reading frames, alternative start codons and new isoforms. Furthermore, these data allow the study of translational dynamics, such as decoding speed and ribosome pausing. Despite the wealth of information offered by ribo-seq, current analysis techniques have focused on coarse, gene-level statistics. In this work, we propose a hidden Markov model (HMM) approach to predict ribosome occupancy and translation at base-pair resolution. We use state-of-the-art learning algorithms to fit the parameters of our model, which correspond to biologically meaningful quantities such as expected ribosome occupancy. We further extend the model with Bayesian hyperparameters to quantify the uncertainty of the learned parameters. Preliminary evaluation shows that, in identifying proteomics-verified coding regions, the HMM achieves a much higher true positive rate and a higher overall AUC than the raw profile alone.
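
To make the idea of per-position state prediction concrete, the following is a minimal sketch of Viterbi decoding on a toy read-count profile. It is not the model described in the abstract: the two states ("untranslated" and "translated"), the Poisson emission model, the rates, the transition probabilities and the toy counts are all assumptions chosen for illustration, and the parameters are fixed by hand rather than learned.

```python
import math

def poisson_logpmf(k, rate):
    """Log-probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def viterbi(counts, rates, log_trans, log_start):
    """Return the most probable state path for a sequence of read counts.

    counts    : observed read counts, one per position
    rates     : assumed Poisson rate for each hidden state
    log_trans : log_trans[r][s] = log P(next state s | current state r)
    log_start : log P(first state)
    """
    n_states = len(rates)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + poisson_logpmf(k, rates[s]))
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy profile: low background counts around a covered region.
counts = [0, 0, 1, 0, 9, 12, 8, 11, 10, 0, 1, 0]
rates = [0.5, 10.0]  # assumed: state 0 = untranslated, state 1 = translated
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_start = [math.log(0.5), math.log(0.5)]

path = viterbi(counts, rates, log_trans, log_start)
print(path)
```

In this sketch each position is assigned to a state, so the decoded path marks a contiguous covered region instead of summarizing the whole gene with a single number. A model fit to real data would estimate the rates and transitions from the profiles, and a Bayesian treatment would place priors on them to express uncertainty.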