I am currently a postgraduate research student in the School of Computer Science at the University of Manchester, UK, where I am studying for a Ph.D. under the supervision of Mr Graham Riley and Dr Antoniu Pop, both lecturers affiliated with the School's Advanced Processor Technologies Research Group. My research is focused on computer architecture design, parallel algorithms and scheduling on heterogeneous multicore architectures.
I also hold a master's degree in Physics Engineering, awarded in 2016 by the University of Coimbra, Portugal. This included a one-year research period on metabolic imaging techniques, in which I ran ab initio simulations, within a Density-Functional Theory framework, of the fluorescence properties of the coenzymes NADH and FAD.
Heterogeneous multi-core processors are becoming increasingly common. In fact, you are in all likelihood carrying one in your pocket right now. Together with the rising interest in accelerator-based computing, e.g. using graphics cards, they are rapidly increasing the variability of the resources available inside computing systems. This raises the question: which parts of our applications should we run on each resource for the best performance or energy efficiency? Those parts are usually called tasks and are generally seen as the vertices of a graph whose edges represent inter-task dependences. The problem can then be thought of as an (NP-hard) optimisation problem: mapping a directed acyclic graph onto the available resources whilst ensuring that the precedence requirements of the tasks are satisfied. The issue is that the programming models that try to tackle this problem, e.g. OpenStream and OmpSs, usually require users to explicitly delimit tasks and specify their dependences, which burdens the user's coding process. The aim of my research is to develop techniques that simplify the usage of such programming models.
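To make the mapping problem concrete, here is a minimal sketch of a greedy heuristic that places each task of a small dependence graph on whichever resource lets it finish earliest. It is an illustration only, not the approach taken in my research or in OpenStream or OmpSs, and it gives no optimality guarantee. The task names, the "cpu" and "gpu" resources and the execution times are all made up for the example, and communication costs are ignored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule(deps, cost):
    """Greedy earliest-finish-time mapping of a task DAG onto resources.

    deps: task -> set of tasks it depends on
    cost: task -> {resource: execution time on that resource}
    Returns task -> (resource, start, finish).
    """
    resources = {r for per_task in cost.values() for r in per_task}
    free_at = {r: 0.0 for r in resources}  # when each resource becomes idle
    placement = {}
    # static_order() yields every task after all of its dependences.
    for task in TopologicalSorter(deps).static_order():
        # A task may only start once all of its dependences have finished.
        ready = max((placement[d][2] for d in deps.get(task, ())), default=0.0)
        best = None
        for r in sorted(cost[task]):  # sorted for deterministic tie-breaking
            start = max(ready, free_at[r])
            finish = start + cost[task][r]
            if best is None or finish < best[2]:
                best = (r, start, finish)
        placement[task] = best
        free_at[best[0]] = best[2]
    return placement


# Hypothetical four-task graph: "filter" and "fft" both need "load",
# and "merge" needs both of them.
deps = {"load": set(), "filter": {"load"}, "fft": {"load"}, "merge": {"filter", "fft"}}
cost = {
    "load": {"cpu": 2.0, "gpu": 5.0},
    "filter": {"cpu": 4.0, "gpu": 1.0},
    "fft": {"cpu": 3.0, "gpu": 2.0},
    "merge": {"cpu": 1.0, "gpu": 3.0},
}
plan = schedule(deps, cost)
makespan = max(finish for _, _, finish in plan.values())
```

Note that the dependences in `deps` are written out by hand, which is precisely the burden on the user described above.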
Taster project (February 2017 - April 2017)
In my first few (research-driven) months in Manchester, I undertook a short research taster project in which we delved into the hardware architecture of Qualcomm's Snapdragon 820 processor. We developed some experimental techniques to determine the number of cache levels and their sizes on the Snapdragon's CPU, which should (hopefully) extend to other CPUs. In addition, some performance measurements gathered using the YOLO object detection system seem to support using clang rather than gcc when targeting the CPU.