Artificial Intelligence
13 Jan 2023
There was a lot of talk about AI and ChatGPT last year, as 2022 brought some of the greatest AI achievements so far. From the famous case of a Google engineer claiming that the company's chatbot is sentient, to the launch of ChatGPT by OpenAI, 2022 was an interesting year, to say the least.
It's time for a wrap-up, which is why I have prepared a list of some of the most impactful AI papers published in 2022. They were a great source of inspiration for me, as an AI & Machine Learning Engineer, so I'm sure they'll be an interesting read for anyone who wants to keep up to date with this subject:
1. ChatGPT: Optimizing Language Models for Dialogue
You have probably already seen several examples of dialogues with ChatGPT and have an idea of how knowledgeable and fluent it is. Besides one-on-one conversation, ChatGPT can also write original stories from scratch, and even write or debug code. Several educational institutions have already banned access to it in order to prevent the "death" of the college essay.
References
OpenAI, 2022 - ChatGPT: Optimizing Language Models for Dialogue
Tony Wan - GPT and a New Generation of AI for Education
2. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Text-to-image models like DALL-E or Stable Diffusion are already powerful and innovative: they generate original images from a prompt of just a few words. What this work by Tel Aviv University and NVIDIA researchers adds is the ability to constrain generation with a few input images of a desired object. The embedding learned for this object is then used alongside the input text to render the object in another style, such as turning your cat into Batman, or simply to place it in another scene.
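To make the idea of learning an embedding for a new object more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: a small fixed linear map stands in for the frozen text-to-image model, and the names `learn_pseudo_word` and `reconstruction_loss` are hypothetical. The only point it illustrates is that the model weights stay untouched while a single new embedding vector is optimized to reproduce a target.

```python
import random


def dot(u, v):
    """Dot product of two equally sized lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def reconstruction_loss(frozen_weights, embedding, target):
    """Half squared error between the frozen model's output and the target."""
    output = [dot(row, embedding) for row in frozen_weights]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def learn_pseudo_word(frozen_weights, target, steps=500, lr=0.05, seed=0):
    """Optimize only a new embedding; the 'model' weights are never modified.

    Toy stand-in (an assumption for illustration): the frozen model is a
    linear map, and the target plays the role of the example images.
    """
    rng = random.Random(seed)
    dim = len(frozen_weights[0])
    embedding = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        output = [dot(row, embedding) for row in frozen_weights]
        error = [o - t for o, t in zip(output, target)]
        # Gradient of the loss with respect to the embedding is W^T * error.
        grad = [
            sum(frozen_weights[i][j] * error[i] for i in range(len(error)))
            for j in range(dim)
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


if __name__ == "__main__":
    weights = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]  # frozen "model"
    target = [1.0, -1.0]                          # stands in for the object images
    learned = learn_pseudo_word(weights, target)
    print("learned embedding:", learned)
    print("final loss:", reconstruction_loss(weights, learned, target))
```

In the real method the frozen model is a full text-to-image network and the loss comes from reconstructing the example images, but the division of labor is the same: only the new embedding is trained.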
References
Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A.H., Chechik, G. and Cohen-Or, D., 2022 - An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
3. Make-A-Video: Text-to-Video Generation without Text-Video Data
Building on the progress in text-to-image generation, Meta AI released a new text-to-video model that is able to generate coherent videos from a text prompt. Make-A-Video sets a new state of the art in text-to-video generation, as determined by both qualitative and quantitative measures.
References
Singer et al., 2022 - Make-A-Video: Text-to-Video Generation without Text-Video Data
4. Galactica: A Large Language Model for Science
Today scientific knowledge is mostly accessed through search engines, but search engines cannot organize that knowledge on their own. Galactica goes a step further: it is a large language model similar to GPT-3, but trained on scientific papers, so that it can store and combine scientific knowledge. In addition to information retrieval, Galactica can write whitepapers, reviews, Wikipedia pages and code, and it also knows how to cite sources and write equations.
References
Taylor et al., 2022 - Galactica: A Large Language Model for Science
5. NeROIC: Neural Rendering of Objects from Online Image Collections
Snapchat and the University of Southern California present a method to reconstruct the 3D representations of an object from an online image collection. What's great about this model is its ability to capture high-quality geometry and material properties of arbitrary objects from photographs acquired with any type of camera.
References
Kuang, Z., Olszewski, K., Chai, M., Huang, Z., Achlioptas, P. and Tulyakov, S., 2022 - NeROIC: Neural Rendering of Objects from Online Image Collections
6. Robust Speech Recognition via Large-Scale Weak Supervision
OpenAI released Whisper, an open-source, multilingual and multitask model for speech transcription and translation. It can accurately understand what you say in more than 96 languages and write it down for you.
References
Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C. and Sutskever, I., 2022 - Robust Speech Recognition via Large-Scale Weak Supervision
7. SpeechPainter: Text-conditioned Speech Inpainting
Google researchers have proposed SpeechPainter, a model for filling gaps of up to one second in speech samples with the help of an auxiliary text input. The cool part about this model is that it maintains speaker identity and the recording environment conditions, and it can generalize to unseen speakers.
Similar to image inpainting, where the aim is to remove unwanted objects and reconstruct the background, speech inpainting takes the spectral representation of the audio signal and uses the same approach to train the model. The results are impressive here as well.
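As a rough illustration of what a gap in the spectral representation looks like, the Python sketch below blanks out a contiguous run of frames in a toy spectrogram, stored as a list of frames with one list of frequency bins each. It is only an assumption about how such an inpainting input could be prepared, not the authors' pipeline: the helper names `seconds_to_frames` and `mask_gap` and the frame rate used in the example are hypothetical.

```python
def seconds_to_frames(seconds, frames_per_second):
    """Convert a duration in seconds to a whole number of spectrogram frames."""
    return int(round(seconds * frames_per_second))


def mask_gap(spectrogram, start_frame, gap_frames):
    """Return a masked copy of the spectrogram plus a per-frame boolean mask.

    Frames inside the gap are zeroed in the copy; the input is left untouched.
    The gap is clipped at the end of the recording.
    """
    if start_frame < 0 or gap_frames < 0:
        raise ValueError("start_frame and gap_frames must be non-negative")
    end_frame = min(start_frame + gap_frames, len(spectrogram))
    masked = [frame[:] for frame in spectrogram]
    mask = [start_frame <= i < end_frame for i in range(len(spectrogram))]
    for i in range(start_frame, end_frame):
        masked[i] = [0.0] * len(masked[i])
    return masked, mask


if __name__ == "__main__":
    # Toy spectrogram: 10 frames with 2 frequency bins each (hypothetical sizes).
    toy = [[1.0, 1.0] for _ in range(10)]
    gap = seconds_to_frames(1.0, frames_per_second=4)  # one-second gap
    masked, mask = mask_gap(toy, start_frame=3, gap_frames=gap)
    for frame, hidden in zip(masked, mask):
        print(frame, "<- to be filled in" if hidden else "")
```

A model would then be asked to reconstruct the zeroed frames from the surrounding audio and the text, in the same way an image inpainting model fills in a masked region from the rest of the picture.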
References
Borsos, Z., Sharifi, M. and Tagliasacchi, M., 2022 - SpeechPainter: Text-conditioned Speech Inpainting
8. Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
The authors present a solution for synthesizing a talking person in real time, based on neural radiance fields (NeRF). The solution requires a single video of the person talking as input, and it works for any audio track. While NeRFs have demonstrated success in high-fidelity 3D modeling of talking portraits, they are typically too slow for real-time use. The authors propose an audio-spatial decomposition module that works with low-dimensional feature grids, which achieves real-time performance while preserving the quality of the resulting video.
References
Tang, J., Wang, K., Zhou, H., Chen, X., He, D., Hu, T., Liu, J., Zeng, G. and Wang, J., 2022 - Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
I hope you enjoyed the read. Feel free to share your favorite AI findings of 2022 and your thoughts on the future of AI.
Written by
Head of AI