Anca Ciurte
Head of AI
There was a lot of talk about AI and ChatGPT last year, as 2022 brought some of the greatest AI achievements so far. From the famous case of the Google engineer claiming that the company's chatbot is sentient to the launch of ChatGPT by OpenAI, 2022 was an interesting year, to say the least.
It's time for a wrap-up, which is why I have prepared a list of some of the most impactful AI papers published in 2022. They were a great source of inspiration for me as an AI & Machine Learning Engineer, so I'm sure they'll be an interesting read for anyone who wants to keep up to date with the subject:
You have probably already seen several examples of dialogs with ChatGPT and have an idea of how knowledgeable and fluent it is. Besides one-on-one interaction, ChatGPT can also write original stories from scratch, or even write and debug code. Several educational institutions have already banned access to it in order to prevent the "death" of the college essay.
Text-to-image models like DALL-E or Stable Diffusion are already very powerful and innovative at generating original images from a prompt of just a few words. What this work by Tel Aviv University and NVIDIA researchers adds is a new kind of constraint: a few input images of a desired object. The embeddings of this object go along with the input text to render that object in another style, such as turning your cat into Batman, or simply to place it in another scene.
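To picture how an object's embedding can "go along with" the input text, here is a toy sketch in Python. It is not the authors' implementation: the placeholder word `<my-cat>`, the three-dimensional vectors and the helper `embed_prompt` are all made up for illustration, and the learned vector is simply assumed to have been optimized from the example images beforehand.

```python
# Toy illustration only: real text encoders use learned embeddings
# with hundreds of dimensions, not hand-written 3D vectors.
PLACEHOLDER = "<my-cat>"  # hypothetical pseudo-word standing for the object

# Pretend these are the frozen word embeddings of the text encoder.
vocab_embeddings = {
    "a": [0.1, 0.0, 0.2],
    "painting": [0.7, 0.3, 0.1],
    "of": [0.0, 0.1, 0.0],
    "in": [0.2, 0.2, 0.0],
    "space": [0.4, 0.9, 0.5],
}

# Assumed to be the one vector learned from the input images of the object.
learned_embedding = [0.9, 0.8, 0.3]


def embed_prompt(prompt, vocab, placeholder, placeholder_vector):
    """Map each word to its embedding, using the learned vector for the placeholder."""
    vectors = []
    for word in prompt.split():
        if word == placeholder:
            vectors.append(placeholder_vector)
        else:
            vectors.append(vocab[word])
    return vectors


# The object now travels with the text: one vector per word of the prompt.
prompt_vectors = embed_prompt(
    "a painting of <my-cat> in space", vocab_embeddings, PLACEHOLDER, learned_embedding
)
```

The rest of the prompt is free to change, so the same learned vector can be reused to put the object into a new style or a new scene.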
Building on the progress in text-to-image generation, Meta AI released a new text-to-video model that is able to generate coherent videos from a text prompt. Make-A-Video sets a new state of the art in text-to-video generation, as determined by both qualitative and quantitative measures.
Today scientific knowledge is accessed through search engines, but Galactica goes a step further, towards web decentralization, by organizing information drawn from scientific knowledge only. Galactica is a large language model similar to GPT-3, but trained on scientific papers. In addition to information retrieval, Galactica can write whitepapers, reviews, Wikipedia pages and code, and it also knows how to cite sources and write equations.
Snapchat and the University of Southern California present a method to reconstruct a 3D representation of an object from an online image collection. What's great about this model is its ability to capture high-quality geometry and material properties of arbitrary objects from photographs acquired with any type of camera.
OpenAI released Whisper, a multilingual, multitask, open-source AI model for transcription and translation. It can accurately understand what you say in more than 96 languages and write it down for you.
Google researchers have proposed SpeechPainter, a model that fills gaps of up to one second in speech samples with the help of an auxiliary textual input. The cool part about this model is that it maintains speaker identity and the recording environment conditions, and it can generalize to unseen speakers.
Similar to image inpainting, where the aim is to remove unwanted objects and reconstruct the background, speech inpainting takes the spectral representation of the audio signal and uses the same approach to train the model. Impressive results were obtained this time as well.
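The analogy with image inpainting can be made concrete with a small Python sketch: just as a region of pixels is hidden in an image, a run of time frames is hidden in the spectrogram, and a model would then be trained to fill it back in. This is only an illustration of the masking step under my own assumptions, not SpeechPainter's actual pipeline; the function `mask_gap`, the 10 ms frame hop and the fake spectrogram are hypothetical.

```python
def mask_gap(spectrogram, gap_start_s, gap_length_s, hop_s=0.01, max_gap_s=1.0):
    """Return a copy of the spectrogram with the gap frames zeroed, plus the mask.

    spectrogram: list of frames, each frame a list of frequency-bin values.
    hop_s: assumed time step between consecutive frames, in seconds.
    """
    if gap_length_s > max_gap_s:
        raise ValueError("gap longer than the supported maximum")
    # Convert the gap boundaries from seconds to frame indices.
    first = int(round(gap_start_s / hop_s))
    last = min(len(spectrogram), int(round((gap_start_s + gap_length_s) / hop_s)))
    mask = [first <= i < last for i in range(len(spectrogram))]
    masked = [
        [0.0] * len(frame) if hidden else list(frame)
        for frame, hidden in zip(spectrogram, mask)
    ]
    return masked, mask


# Two seconds of fake audio: 200 frames with 4 frequency bins each.
spectrogram = [[1.0, 2.0, 3.0, 4.0] for _ in range(200)]
masked, mask = mask_gap(spectrogram, gap_start_s=0.5, gap_length_s=0.3)
```

A training setup would feed `masked` (together with the transcript) to the model and compare its output against the original frames where `mask` is true.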
The authors present a solution for real-time synthesis of a talking person based on dynamic neural radiance fields (NeRF). The solution requires a single video of the talking person as input and works with any audio track. While NeRFs have demonstrated success in high-fidelity 3D modeling of talking portraits, running them in real time has remained a challenge; the authors propose a module for audio-spatial decomposition into low-dimensional feature grids that achieves real-time performance while preserving high quality in the resulting video.
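A quick back-of-the-envelope calculation in Python shows why splitting one high-dimensional grid into low-dimensional ones can pay off. The numbers are my own assumptions for illustration (a resolution of 64 per axis, 3 spatial dimensions and 2 audio dimensions), and the two helper functions are hypothetical, not taken from the paper.

```python
def joint_grid_cells(resolution, spatial_dims, audio_dims):
    """Cells in a single grid that covers space and audio together."""
    return resolution ** (spatial_dims + audio_dims)


def decomposed_grid_cells(resolution, spatial_dims, audio_dims):
    """Cells in two separate grids: one for space, one for audio."""
    return resolution ** spatial_dims + resolution ** audio_dims


resolution = 64  # assumed number of cells along each axis
joint = joint_grid_cells(resolution, spatial_dims=3, audio_dims=2)
split = decomposed_grid_cells(resolution, spatial_dims=3, audio_dims=2)

print(f"joint grid: {joint:,} cells")
print(f"decomposed grids: {split:,} cells")
```

Under these assumptions the decomposed grids need several thousand times fewer cells than the joint one, which is the kind of saving that makes real-time rendering plausible.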
🙂 Hope you enjoyed the read. Feel free to share with us what your favorite AI findings of 2022 were and what your thoughts are on the future of AI.