Postdoctoral Research Associate
HomeOffshore.org Project, University of Manchester
I am a Postdoctoral Researcher at Manchester University’s School of Computer Science. My research focus is on deriving knowledge from large data sets through the use of Machine Learning.
My knowledge extends to a wide array of dimension reduction techniques and unsupervised and supervised learning algorithms as applied to finance, healthcare and engineering. I have published several academic papers, reviewed for high-impact conferences and journals, and coordinated teams of master's students pursuing degrees in data analytics.
HomeOffshore.org Project, University of Manchester
MDSAS Project, University of Manchester
Data Engineering Course, University of Manchester
Ph.D. Machine Learning
University of Manchester, United Kingdom
Master of Science,
Johannes Kepler University, Austria
West University Timisoara, Romania
The falling cost and ubiquity of sensors facilitate real-time collection of data from wind farms, data that can be characterized by large volume (many sensors), high velocity (high measurement rate), high variety (image, sensor time-series, text reports), and potential issues of veracity (missing/out-of-range data).
Our aim is to develop state-of-the-art data models and infrastructure for future wind farms. Our next-generation Condition Monitoring systems, based on Machine Learning and robotics, will learn to predict failures before they happen and will significantly reduce the cost of operations and maintenance.
This paper reviews the recent literature on machine learning (ML) models that have been used for condition monitoring in wind turbines (e.g. blade fault detection or generator temperature monitoring). We classify these models by typical ML steps, including data sources, feature selection and extraction, model selection (classification, regression), validation and decision-making. Our findings show that most models use SCADA or simulated data, with almost two-thirds of methods using classification and the rest relying on regression. Neural networks, support vector machines and decision trees are most commonly used. We conclude with a discussion of the main areas for future work in this domain.
Fuzzy C-means has been utilized successfully in a wide range of applications, extending the clustering capability of K-means to datasets that are uncertain, vague and otherwise hard to cluster. This paper introduces the Fuzzy C-means++ algorithm which, by utilizing the seeding mechanism of the K-means++ algorithm, improves the effectiveness and speed of Fuzzy C-means. By careful seeding that disperses the initial cluster centers through the data space, the resulting Fuzzy C-means++ approach samples starting cluster representatives during the initialization phase. The cluster representatives are well spread in the input space, resulting in both faster convergence times and higher quality solutions. Implementations in R of standard Fuzzy C-means and Fuzzy C-means++ are evaluated on various data sets. We investigate the cluster quality and iteration count as we vary the spreading factor on a series of synthetic data sets. We run the algorithm on real-world data sets and, to account for the non-determinism inherent in these algorithms, we record multiple runs while choosing different k parameter values. The results show that the proposed method gives a significant improvement in convergence times (the number of iterations) of up to 40 times (2.1 times on average) over the standard algorithm on synthetic datasets and, in general, an associated lower cost function value and Xie–Beni value. A proof sketch of the logarithmically bounded expected cost function value is given.
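As a rough illustration of the seeding idea described above, here is a minimal Python sketch. It is not the published implementation (which is in R). The function names `fcmpp_seed` and `fuzzy_c_means`, the spreading exponent `p`, the fuzzifier `m = 2` and the stopping tolerance are all assumptions made for the example: each new center is drawn with probability proportional to its distance from the nearest already-chosen center raised to the power `p`, and the seeds are then handed to a standard Fuzzy C-means loop.

```python
import numpy as np

def fcmpp_seed(X, k, p=2.0, rng=None):
    """Pick k initial centers from the rows of X, k-means++ style.

    p is a hypothetical spreading exponent: larger values push the
    seeds further apart. rng may be None, an int seed, or a Generator.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    first = rng.integers(n)                      # first center: uniform draw
    centers = [X[first]]
    # Distance from every point to its nearest chosen center so far.
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(1, k):
        w = d ** p
        total = w.sum()
        # Fall back to a uniform draw if all points coincide with a center.
        probs = w / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d = np.minimum(d, np.linalg.norm(X - X[idx], axis=1))
    return np.array(centers)

def fuzzy_c_means(X, centers, m=2.0, tol=1e-5, max_iter=300):
    """Standard Fuzzy C-means updates starting from the given centers.

    Returns (centers, memberships U, number of iterations used).
    """
    for it in range(1, max_iter + 1):
        # Pairwise point-to-center distances, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)              # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
        Um = U ** m
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U, it
```

A typical call would be `centers, U, iters = fuzzy_c_means(X, fcmpp_seed(X, k=3))`; comparing `iters` against a run started from uniformly random centers gives a feel for the convergence effect the abstract reports.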
Every company listed on the London Stock Exchange is classified into an industry sector based on its primary activity. However, it may be both more interesting and more valuable to group similarly performing companies based on their historical stock price record over a long period of time. Using fuzzy clustering analysis with a correlation-based metric, we obtain a more insightful categorization of the companies into groups with fuzzy boundaries, giving an arguably more realistic and detailed view of their relationships. Once cluster analysis is performed, we analyze the behaviour of the discovered groups in terms of the volatility of their returns using both standard deviation and exponentially weighted moving average. This approach has the potential to be of practical relevance in the context of diversified portfolio construction, as it can detect fuzzy clusters of correlated stocks that have lower inter-cluster correlation, analyze their volatility and sample potentially less risky combinations of assets.
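The two quantitative ingredients mentioned above can be sketched in a few lines of Python. The abstract does not state the exact formulas, so the choices here are assumptions: the distance d_ij = sqrt(2(1 - rho_ij)) is one common correlation-based metric, and the decay factor `lam = 0.94` is a conventional default for daily exponentially weighted volatility, not a value taken from the paper. The function names are hypothetical.

```python
import numpy as np

def correlation_distance(returns):
    """Pairwise distance between assets from a (T, N) matrix of returns.

    Assumed metric: d_ij = sqrt(2 * (1 - rho_ij)), where rho_ij is the
    Pearson correlation of the return series of assets i and j.
    """
    rho = np.corrcoef(returns, rowvar=False)     # (N, N) correlation matrix
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))

def ewma_volatility(r, lam=0.94):
    """Exponentially weighted moving-average volatility of one return series.

    Recursion: var[t] = lam * var[t-1] + (1 - lam) * r[t]**2,
    initialised with var[0] = r[0]**2 (an assumption for the sketch).
    """
    r = np.asarray(r, dtype=float)
    var = np.empty_like(r)
    var[0] = r[0] ** 2
    for t in range(1, len(r)):
        var[t] = lam * var[t - 1] + (1.0 - lam) * r[t] ** 2
    return np.sqrt(var)
```

The distance matrix would be the input to the fuzzy clustering step, while `ewma_volatility` could be applied to the averaged returns of each discovered cluster alongside the plain standard deviation.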
The HOME Offshore project will develop an intelligent decision support process, where experts will co-operate with the condition monitoring algorithms to identify key signals and actionable patterns to detect developing failures across components and subcomponents. The process will integrate time-stamped event data collected by various components, and then select, pre-process and transform target signals, to be analysed to infer integrated data-driven models. Such models will aim to reduce the quantity of monitored and analysed data by identifying diagnostic patterns and signals that can indicate a developing fault with minimal data, triggering dynamic requests for extra data capture as/when needed (e.g. via additional robot-assisted inspections of a sub-system; or analysis of low-granularity sensing data). We will combine these models with knowledge-driven models (e.g. existing reliability models and multi-signal component/turbine granular approaches) to identify actionable decisions.
The project aims to optimize the management of patient therapy through modeling the pharmacokinetics of the medicine (concentration in blood as a function of time). Home treatments, where users log their times, dosages and medicine taken through smartphones, laptops, etc. (think FitBit), are becoming very popular, especially for chronic diseases such as Hemophilia. We collaborate with Haemtrack (a highly successful online national system for Hemophilia patients), which provides us with a comprehensive dataset of patient background information, medicine logs and adverse effects. A parametrized model was implemented which takes as input a discrete set of temporal treatments from a database and produces a time series of pharmacokinetic curves representing drug concentration in the body. These pharmacokinetic curves allow modelling based on patient-specific parameters such as half-life of a medicine, volume of distribution (e.g. plasma volume), etc.
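To make the idea of turning a treatment log into a concentration curve concrete, here is a minimal Python sketch. It assumes the simplest textbook case, a one-compartment model with instantaneous (bolus) dosing and first-order elimination, which may well differ from the model used in the project. The function name `concentration_curve` and its parameters are hypothetical; only half-life and volume of distribution come from the description above.

```python
import numpy as np

def concentration_curve(dose_times, doses, half_life, volume, t_grid):
    """Drug concentration over t_grid for a log of discrete doses.

    Assumptions (not from the project): one-compartment model, bolus
    dosing, first-order elimination. Each dose contributes
    (dose / volume) * exp(-k * (t - t_dose)) for t >= t_dose, with
    k = ln(2) / half_life, and contributions add up (superposition).
    Times and half_life must share one unit, e.g. hours.
    """
    k = np.log(2.0) / half_life                  # elimination rate constant
    t_grid = np.asarray(t_grid, dtype=float)
    conc = np.zeros_like(t_grid)
    for t0, dose in zip(dose_times, doses):
        elapsed = t_grid - t0
        active = elapsed >= 0                    # a dose has no effect before it is taken
        conc[active] += (dose / volume) * np.exp(-k * elapsed[active])
    return conc
```

For example, `concentration_curve([0, 48], [1000, 1000], half_life=12, volume=50, t_grid=np.arange(0, 96))` would give an hourly curve for two doses taken two days apart; fitting `half_life` and `volume` per patient is where the patient-specific modelling comes in.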
I would be happy to talk to you about projects involving data mining.
You can find me at office 2.58 located in the Kilburn Building at the University of Manchester.
I am in my office every day from 9:00 to 17:00, but you may wish to call ahead to arrange an appointment.