## Image restoration
Many problems in computer vision involve conditional image generation, acquisition, and display. Deep learning has greatly advanced the quality of generated images in recent years; however, challenges remain. These problems often come with paired training data, which makes the image generation supervised.
We develop innovative [image restoration techniques](https://arxiv.org/pdf/1709.03749) using deep neural networks. These techniques address problems such as image denoising, demosaicing, super-resolution, and inpainting. We also work on [video frame interpolation](https://people.compute.dtu.dk/jerf/papers/vfi_cft_arf.pdf), which increases the temporal resolution of a video.
For further information see:
- [Arjomand Bigdeli, Siavash, Matthias Zwicker, Paolo Favaro, and Meiguang Jin. "Deep mean-shift priors for image restoration." Advances in Neural Information Processing Systems 30 (2017).](https://arxiv.org/pdf/1709.03749)
- [Hannemose, Morten, Janus Nørtoft Jensen, Gudmundur Einarsson, Jakob Wilm, Anders Bjorholm Dahl, and Jeppe Revall Frisvad. "Video frame interpolation via cyclic fine-tuning and asymmetric reverse flow." In Image Analysis: 21st Scandinavian Conference, SCIA 2019, Norrköping, Sweden, June 11–13, 2019, Proceedings 21, pp. 311-323. Springer International Publishing, 2019.](https://people.compute.dtu.dk/jerf/papers/vfi_cft_arf.pdf)
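Many restoration problems can be posed as inverse problems: recover an image x from a degraded observation y by minimizing a data term plus a prior (the papers above learn the prior with a deep network). A minimal sketch of this idea, with a hand-crafted quadratic smoothness prior standing in for a learned one and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy degradation model: y = x + n (denoising).
x_true = np.sin(np.linspace(0, 4 * np.pi, 200))
y = x_true + 0.3 * rng.standard_normal(x_true.shape)

# Minimize 0.5*||x - y||^2 + 0.5*lam*||Dx||^2 by gradient descent,
# where D is a finite-difference operator (a simple smoothness prior).
lam = 2.0  # prior weight (illustrative value)
x = y.copy()
for _ in range(500):
    smooth_grad = np.convolve(x, [-1.0, 2.0, -1.0], mode="same")  # D^T D x
    x -= 0.05 * ((x - y) + lam * smooth_grad)

# The restored estimate should have lower error than the noisy input.
print(np.mean((x - x_true) ** 2) < np.mean((y - x_true) ** 2))
```

Replacing the hand-crafted smoothness gradient with the gradient of a learned image prior turns this loop into the kind of restoration scheme described above.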
## Generative models
We develop [deep learning models for data density estimation](https://arxiv.org/pdf/2001.02728) with applications in out-of-distribution detection and compression.
We also build generative models for various visual data such as [face images](https://arxiv.org/pdf/1804.08972.pdf) and [3D textures](https://arxiv.org/pdf/2006.16112.pdf).
For further information see:
- [Bigdeli, Siavash A., Geng Lin, Tiziano Portenier, L. Andrea Dunbar, and Matthias Zwicker. "Learning generative models using denoising density estimators." arXiv preprint arXiv:2001.02728 (2020).](https://arxiv.org/pdf/2001.02728)
- [Portenier, Tiziano, Qiyang Hu, Attila Szabo, Siavash Arjomand, Paolo Favaro, and Matthias Zwicker. "FaceShop: Deep Sketch-based Image Editing." ACM Transactions on Graphics 37, no. 4 (2018): 1-13.](https://arxiv.org/pdf/1804.08972.pdf)
- [Portenier, Tiziano, Siavash Arjomand Bigdeli, and Orcun Goksel. "GramGAN: Deep 3D texture synthesis from 2D exemplars." Advances in Neural Information Processing Systems 33 (2020): 6994-7004.](https://arxiv.org/pdf/2006.16112.pdf)
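A learned density estimator gives access to the score (the gradient of the log-density), from which samples can be drawn with Langevin dynamics. A minimal sketch that substitutes the analytic score of a standard normal for a score derived from a learned denoising density estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

def score(x):
    # Analytic score of a standard normal, grad log p(x) = -x, standing in
    # for the score of a learned density model.
    return -x

# Langevin dynamics: noisy gradient ascent on the log-density.
x = rng.uniform(-4.0, 4.0, size=5000)  # arbitrary initial samples
eps = 0.05  # step size (illustrative value)
for _ in range(2000):
    x += 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)

print(x.mean(), x.std())  # samples approach N(0, 1)
```

With a trained model in place of the analytic score, the same update rule draws samples from the learned data distribution.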
## Data compression
We are investigating [deep learning-based AI techniques](https://arxiv.org/pdf/2001.02728) to address data compression challenges.
We work with high-dimensional datasets as well as 2D natural images.
We also investigate [innovative neural network approaches](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9746375) that reduce model size while preserving performance.
For further information see:
- [Bigdeli, Siavash A., Geng Lin, Tiziano Portenier, L. Andrea Dunbar, and Matthias Zwicker. "Learning generative models using denoising density estimators." arXiv preprint arXiv:2001.02728 (2020).](https://arxiv.org/pdf/2001.02728)
- [Narduzzi, Simon, Siavash A. Bigdeli, Shih-Chii Liu, and L. Andrea Dunbar. "Optimizing the consumption of spiking neural networks with activity regularization." In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 61-65. IEEE, 2022.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9746375)
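One standard way to trade network size against performance is magnitude pruning: zero out the smallest weights and keep accuracy roughly intact. This is an illustrative technique from the compression literature, not the specific method of the papers above:

```python
import numpy as np

rng = np.random.default_rng(2)

def prune(weights, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    w = weights.copy()
    k = int(sparsity * w.size)
    if k > 0:
        # k-th smallest absolute value over the flattened array
        threshold = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    return w

w = rng.standard_normal((64, 64))   # stand-in for a trained weight matrix
w_pruned = prune(w, 0.9)
print((w_pruned == 0).mean())       # fraction of zeroed weights, close to 0.9
```

The zeroed weights can then be stored in a sparse format, shrinking the model at a small cost in accuracy that is typically recovered by fine-tuning.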
## Learning with minimal supervision
A fundamental component of most visual recognition systems is the learning procedure. Like humans, a machine must be trained beforehand on many examples to learn visual models and make predictions on unseen data. Obtaining such data typically requires human annotation, which is tedious, expensive, and time-consuming. To alleviate this problem, we focus on developing efficient techniques for learning visual models with minimal supervision for the tasks of object detection and image segmentation. We have proposed weakly supervised models with efficient human interaction ([eye-tracking](https://calvin-vision.net/datasets/poet-dataset/), [center clicking](https://calvin-vision.net/datasets/center-click-annotations/), [extreme clicking](https://openaccess.thecvf.com/content_ICCV_2017/papers/Papadopoulos_Extreme_Clicking_for_ICCV_2017_paper.pdf)), human-in-the-loop schemes ([human verification](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Papadopoulos_We_Dont_Need_CVPR_2016_paper.pdf)), and label propagation techniques ([scaling anno](http://scaling-anno.csail.mit.edu/)) for efficient image annotation.
For further information see:
- [Papadopoulos, Dim P., Alasdair DF Clarke, Frank Keller, and Vittorio Ferrari. "Training object class detectors from eye tracking data." In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 361-376. Springer International Publishing, 2014.](https://calvin-vision.net/datasets/poet-dataset/)
- [Papadopoulos, Dim P., Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. "Training object class detectors with click supervision." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6374-6383. 2017.](https://calvin-vision.net/datasets/center-click-annotations/)
- [Papadopoulos, Dim P., Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. "Extreme clicking for efficient object annotation." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4930-4939. 2017.](https://openaccess.thecvf.com/content_ICCV_2017/papers/Papadopoulos_Extreme_Clicking_for_ICCV_2017_paper.pdf)
- [Papadopoulos, Dim P., Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. "We don't need no bounding-boxes: Training object class detectors using only human verification." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 854-863. 2016.](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Papadopoulos_We_Dont_Need_CVPR_2016_paper.pdf)
- [Papadopoulos, Dim P., Ethan Weber, and Antonio Torralba. "Scaling up instance annotation via label propagation." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15364-15373. 2021.](http://scaling-anno.csail.mit.edu/)
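For example, in extreme clicking the annotator marks the object's four extreme points (top-, bottom-, left-, and right-most), and the bounding box follows directly from those clicks:

```python
def box_from_extreme_clicks(points):
    """Four (x, y) extreme clicks -> (x_min, y_min, x_max, y_max) box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# top, bottom, left, and right clicks on a hypothetical object
clicks = [(40, 10), (45, 90), (5, 50), (80, 55)]
print(box_from_extreme_clicks(clicks))  # (5, 10, 80, 90)
```

This yields a full bounding box from four clicks, without the annotator ever dragging and adjusting a rectangle.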

## Multimodal learning
Humans and even animals combine inputs from different senses to recognize and understand their surroundings and to make decisions accordingly. Similarly, machines in real-world applications often use multiple sensors that produce multi-modal data. We have developed multimodal systems that combine vision and language for several domains, such as artificial intelligence for social good ([incidents1M](http://incidentsdataset.csail.mit.edu/)) and food recognition ([cooking programs](http://cookingprograms.csail.mit.edu/), [pizzaGAN](http://pizzagan.csail.mit.edu/)).
For further information see:
- [Weber, Ethan, Nuria Marzo, Dim P. Papadopoulos, Aritro Biswas, Agata Lapedriza, Ferda Ofli, Muhammad Imran, and Antonio Torralba. "Detecting natural disasters, damage, and incidents in the wild." In European Conference on Computer Vision, pp. 331-350. Springer, Cham, 2020.](http://incidentsdataset.csail.mit.edu/)
- [Papadopoulos, Dim P., Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, and Antonio Torralba. "Learning Program Representations for Food Images and Cooking Recipes." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16559-16569. 2022.](http://cookingprograms.csail.mit.edu/)
- [Papadopoulos, Dim P., Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, and Antonio Torralba. "How to make a pizza: Learning a compositional layer-based GAN model." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8002-8011. 2019.](http://pizzagan.csail.mit.edu/)
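A common way to combine vision and language is to embed both modalities in a shared space and match them by similarity. A minimal sketch of cross-modal retrieval with made-up embedding vectors (in practice they come from trained image and text encoders):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-D embeddings; real systems use trained encoders that map
# images and text into the same high-dimensional space.
image_emb = np.array([0.9, 0.1, 0.0])
captions = {
    "a flooded street": np.array([0.8, 0.2, 0.1]),
    "a plate of pasta": np.array([0.0, 0.1, 0.9]),
}

# Retrieve the caption whose embedding best matches the image embedding.
best = max(captions, key=lambda c: cosine(image_emb, captions[c]))
print(best)
```

Training the two encoders so that matching image-text pairs end up close in this space is what makes such retrieval work at scale.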
## Camera calibration
Having an accurately calibrated camera setup (intrinsics and extrinsics) is essential for many computer vision problems such as 3D scanning. We develop practical methods for [highly accurate camera calibration](https://doi.org/10.1117/12.2531769).
For further information see:
- [Hannemose, Morten, Jakob Wilm, and Jeppe Revall Frisvad. "Superaccurate camera calibration via inverse rendering." In Modeling Aspects in Optical Metrology VII, vol. 11057. SPIE, 2019.](https://doi.org/10.1117/12.2531769)
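Calibration estimates the intrinsic matrix K (together with lens distortion and per-view extrinsics) so that known 3D points project onto their observed pixel locations. A minimal sketch of the pinhole projection that a calibrated camera model must reproduce, with illustrative values:

```python
import numpy as np

# Illustrative intrinsic matrix: focal lengths fx, fy and principal
# point cx, cy (these are the quantities calibration recovers).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(points_3d, K):
    """Project Nx3 camera-frame points to Nx2 pixel coordinates."""
    p = (K @ points_3d.T).T       # homogeneous image coordinates
    return p[:, :2] / p[:, 2:3]   # perspective divide

pts = np.array([[0.0, 0.0, 2.0], [0.1, -0.05, 2.0]])
print(project(pts, K))  # pixel coordinates of the two points
```

Calibration then amounts to adjusting K (and the distortion and pose parameters) until these predicted projections match the detected calibration-target points as closely as possible.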