Jose Costa Pereira


Hello, and welcome to my webpage!

I’m an Assistant Professor at the University of Porto, Portugal. You can find me at the School of Engineering (FEUP), office D106.
If you’ve landed here, chances are you’re interested in my research – below is a brief overview. Feel free to reach out!

Before joining the University of Porto, I spent six years as a Senior Research Scientist at Huawei Technologies in London, UK. At the Noah’s Ark Lab, part of Huawei’s R&D division, I focused on computational photography, particularly image-to-image restoration and No-Reference Image Quality Assessment (NR-IQA) from a perceptual standpoint. Many solutions have been proposed to tackle these challenges – some with great success – yet building a robust NR-IQA model that reliably mimics the human visual system (HVS) remains an open problem, especially for images in the wild. IQA research aims to close the gap between computational metrics and human perception, but we are still far from a perfect solution.
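To give a flavour of what ‘no-reference’ means in practice, here is a toy Python sketch: a single sharpness cue (variance of the Laplacian) computed without any pristine reference image. It is purely illustrative – real NR-IQA models such as BRISQUE or NIQE aggregate many perceptual statistics – and the file name is a placeholder.

```python
import cv2

def sharpness_score(path: str) -> float:
    """Toy no-reference quality cue: variance of the Laplacian.

    Unlike full-reference metrics (e.g. PSNR, SSIM), this needs no
    pristine reference image -- the defining trait of NR-IQA.
    """
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # High-frequency content (edges) raises the variance; blur lowers it.
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# print(sharpness_score("photo.jpg"))  # "photo.jpg" is a placeholder path
```

Of course, a single statistic like this says nothing about noise, colour, or composition – which is exactly why perceptual NR-IQA that tracks the HVS remains hard.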

During my time at INESC TEC, I collaborated with students and researchers on Computer-Aided Detection/Diagnosis (CADe/CADx) tools for breast cancer screening. The widespread availability of annotated digital images in medical practice, combined with the success of deep learning in visual recognition, has driven the development of many practical medical imaging applications.

More recently, advances in large language models (LLMs) have ushered in a new era of AI – one where we have yet to define the boundaries of what’s possible. Generative AI is another area I follow closely: the now-ubiquitous Transformer architecture with its attention mechanism, and diffusion models such as Stable Diffusion for generating realistic images (and videos). More broadly, I enjoy exploring ‘Intelligent Systems’. From automating everyday tasks (e.g. copilot-style assistants) to expert-level knowledge (e.g. AI medical diagnosis), the rise of AI systems is transforming our daily lives in ways that seemed impossible just a few years ago.
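For the curious reader, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside the Transformer. The shapes and random values are illustrative only; this is the textbook formula, not any particular production implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy self-attention: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 4)
```

Stack this operation across multiple heads and layers, with learned projections producing Q, K, and V, and you have the backbone shared by LLMs and, in its cross-attention form, by text-conditioned diffusion models.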

Before the recent deep learning revolution, extracting meaningful, machine-friendly image representations was a challenging problem. Unlike text, which could be handled reasonably well with a simple bag-of-words, images lacked a similarly effective descriptor. That changed in 2012 with Alex Krizhevsky et al.’s groundbreaking paper, “ImageNet Classification with Deep Convolutional Neural Networks” – a turning point for computer vision. Initially, I saw CNNs as powerful feature extractors but felt they lacked the semantic richness of text embeddings. Today, however, the search for unified multimodal representations – for text, images, audio, and video – is more active than ever. Finding accurate, descriptive, and robust representations is crucial for any intelligent system, making this an exciting area of research.
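As a concrete reminder of how simple that text baseline was, here is a bag-of-words representation in a few lines of scikit-learn; the two toy sentences are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Each document becomes a vector of word counts over a shared vocabulary.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.toarray())                         # one count vector per document
```

Crude as it is, this descriptor supported retrieval and classification surprisingly well; images had no comparably cheap and effective counterpart until learned CNN features arrived.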

If any of these topics interests you, feel free to check out my Google Scholar profile for more details on my publications. And don’t hesitate to reach out — my email is below!