Now I am become Doctor

10 Jan 2025

This is just a quick post to announce that I successfully defended my Ph.D. thesis, titled Guiding AI Attention for Driving and Creative Generation, this past December 5th. The final grade I obtained was Excel·lent (A), which also earned me the Cum Laude distinction.

This was a long road, but I’m glad that I took it. I’ll be forever grateful to my advisor Antonio López, especially for seeing the value in adding the creative usage of Neural Networks to different fields in Machine Learning. Likewise, a special thanks to the thesis examination panel: Prof. Jose María Armingol, José Manuel Álvarez, and Fernando Vilariño. Their feedback and questions were both reaffirming and encouraging, regarding the work done during my Ph.D. as well as the work still to be done in the fields of autonomous driving and creative applications of ML to the arts.

All in all, a Ph.D. was a much different path than the one I expected coming in, in particular regarding personal news: my wife Samantha and I had two kids, Santiago and Olivia, which on the one hand made it harder to stick to paper deadlines, but on the other made all the effort far more rewarding. Indeed, a Ph.D. is a hard path for interpersonal relationships, but having my wife and kids kept me sane most of the time. As Joseph J. Rotman puts it in his dedication in An Introduction to Algebraic Topology (jokingly, I hope):

To my wife Marganit and my children Ella Rose and Daniel Adam without whom this book would have been completed two years earlier

On the Thesis Manuscript

The main lesson I’ve learnt is that during your Ph.D. you will have many ideas that lead nowhere, and others that cannot be pursued due to a lack of compute, knowledge, or manpower/time. If you are by some chance thinking of doing a Ph.D., I recommend that you both learn from and work with those surrounding you, as you can only get so far by yourself. Likewise, try to read more and do crazy things, as science is mostly asking and answering the how, with the why coming after discovering a phenomenon and delving deep into it.

A key aspect here is to not only read scientific papers, which is a trap I see most students fall into: they only read the end product (the papers in prestigious conferences), without realizing that it’s the culmination of a lot of iteration, work, and storytelling that were mostly borrowed not just from other scientific work, but from life in general: literature, visual arts, music, architecture, etc. Storytelling is especially important, as your audience needs to be sold on your idea in an effective manner. No amount of mathematical jargon will cure a poorly written paper, so read, read, and read!

You can find the recording of the thesis defense here. While the official link to access the PDF file of the thesis is this one, it is under an embargo until December 2026 due to unpublished work. Thus, I can only share the recording above and the accompanying slides here:

Guiding AI Attention for Driving and Creative Generation - Thesis Defense by Diego Porres

In any case, I quickly summarize the main chapters as follows, with Chapters 1 and 5 being the Introduction and Conclusions/Future Work, respectively. Note that each chapter is meant to be self-contained:

Future Work

Chapter 5 summarizes the conclusions of the thesis and proposes future work:

Artwork

Not surprisingly, I’ve managed to produce some artwork that has been accepted in various exhibitions and galleries. The main ones became the cover of each chapter, namely:

Hidden Clergy - Wikiart

Hidden Clergy - MetFaces

Next Steps

I’ll continue to be at the Computer Vision Center (CVC), where I’ll join the BERTHA project as a Postdoctoral Researcher. The main aim of this project is to develop a safer and more human-like autonomous vehicle. Specifically, we want to imbue it with certain characteristics that current models are missing, such as object permanence/memory, key object selection/attention, and data efficiency. I’ll be working on the last two parts in particular, as the Attention Loss we proposed in Chapter 3 of my thesis has these characteristics: by guiding where an end-to-end driving model should look, you can also let it know which objects are important and train it with less data than usual. The project brings together 14 research centers in total, so expect interesting results in the coming months, both in the form of datasets and new models (and of course research papers).

In the meantime, I’ll be uploading here a lot of the work, tricks, and small studies that I’ve accumulated throughout these past five years of my Ph.D. journey. Some will be generally useful, others more geared towards specific fields such as autonomous driving and creative practices regarding AI, the arts, and how we interact with these models.

Cheers!