ARTICLE

must-read AI articles from 2022

8-minute read

Written by Anca Ciurte

Head of AI

context

There was a lot of talk about AI and ChatGPT last year, as 2022 brought some of the greatest AI achievements so far. From the widely publicized case of the Google engineer who claimed that the company's LaMDA chatbot was sentient, to the launch of ChatGPT by OpenAI, 2022 was an interesting year, to say the least.

It's time for a wrap-up, so I have prepared a list of some of the most impactful AI papers published in 2022. They were a great source of inspiration for me as an AI & Machine Learning Engineer, and I'm sure they'll be an interesting read for anyone who wants to keep up to date with the field:

01

ChatGPT: optimizing language models for dialogue

You have probably already seen several examples of dialogues with ChatGPT and have an idea of how knowledgeable and fluent it is. Besides one-on-one conversation, ChatGPT can also write original stories from scratch, or even write and debug code. Several educational institutions have already blocked access to it in order to prevent the "death" of the college essay.

References

OpenAI, 2022 - ChatGPT: Optimizing Language Models for Dialogue

Tony Wan - GPT and a New Generation of AI for Education

02

an image is worth one word: personalizing text-to-image generation using textual inversion

Text-to-image models like DALL-E or Stable Diffusion are already very powerful at generating original images from a short text prompt. This work by researchers from Tel Aviv University and NVIDIA goes further by adding a new constraint: a few input images of a specific object (the paper uses 3 to 5). The model learns a new embedding, a pseudo-word, that represents this object, and that embedding is then used together with the input text to render the object in another style, such as turning your cat into Batman, or simply placing it in another scene.
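To make the idea concrete, here is a toy sketch of textual inversion in Python with NumPy. It is not the authors' code: a frozen linear map `W` stands in for the pretrained text-to-image model, a four-word dictionary `vocab` stands in for its vocabulary, and the "photos" are synthetic vectors built from a hypothetical `hidden_concept`. What it does illustrate is the core trick: everything stays frozen except the embedding `v` of one new pseudo-word (`S*`), which gradient descent fits to the example images and which can then be reused in new prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, PIXELS = 8, 16  # toy sizes, chosen arbitrarily for illustration

# Frozen components (stand-ins for the pretrained text encoder and image model).
vocab = {word: rng.normal(size=DIM) for word in ["a", "photo", "painting", "of"]}
W, _ = np.linalg.qr(rng.normal(size=(PIXELS, DIM)))  # frozen linear "generator"

def generate(token_embeddings):
    """Toy generator: average the prompt's token embeddings and map them to pixels."""
    return W @ np.mean(token_embeddings, axis=0)

# A handful of "photos" of the user's object, synthesized from a hidden concept vector.
hidden_concept = rng.normal(size=DIM)
prompt = [vocab["a"], vocab["photo"], vocab["of"]]  # the prompt "a photo of S*"
photos = [generate(prompt + [hidden_concept]) + 0.01 * rng.normal(size=PIXELS)
          for _ in range(4)]
target = np.mean(photos, axis=0)

# Textual inversion: optimize ONLY the embedding v of the new pseudo-word S*.
v = np.zeros(DIM)
n_tokens = len(prompt) + 1
for _ in range(500):
    residual = generate(prompt + [v]) - target
    # Gradient of the average squared error over the photos w.r.t. v;
    # W and vocab never receive updates.
    grad = 2.0 * W.T @ residual / n_tokens
    v -= 1.0 * grad

# The learned pseudo-word can now be combined with other words in new prompts.
painting_of_object = generate([vocab["a"], vocab["painting"], vocab["of"], v])
```

Because `W` has orthonormal columns in this toy, the problem is well conditioned and plain gradient descent recovers `hidden_concept` up to the small noise added to the photos. The actual method optimizes the diffusion model's denoising loss instead, which is far more expensive but follows the same "freeze everything, learn one embedding" principle.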

References

Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A.H., Chechik, G. and Cohen-Or, D., 2022 - An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

03

make-a-video: text-to-video generation without text-video data

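Building on the recent progress in text-to-image generation, Meta AI released Make-A-Video, a text-to-video model that generates coherent videos from a text prompt. As the title suggests, it does not need paired text-video data: it learns what the world looks like and how it is described from text-image pairs, and how the world moves from unlabeled video footage. According to the authors, Make-A-Video sets a new state of the art in text-to-video generation on both qualitative and quantitative measures.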

References

Singer et al., 2022 - Make-A-Video: Text-to-Video Generation without Text-Video Data

04

galactica: a large language model for science

Today, scientific knowledge is mostly accessed through search engines. Galactica, from Meta AI, aims to go a step further by organizing information around scientific knowledge itself. It is a large language model similar to GPT-3, but trained on a scientific corpus of papers, reference material and knowledge bases. In addition to information retrieval, Galactica can write whitepapers, reviews, Wikipedia-style articles and code, and it also knows how to cite sources and write equations.

References

Taylor et al., 2022 - Galactica: A Large Language Model for Science

05

NeROIC: neural rendering of objects from online image collections

Researchers from Snap and the University of Southern California present a method to reconstruct a 3D representation of an object from an online image collection. What's great about this model is its ability to capture high-quality geometry and material properties of arbitrary objects from photographs taken with different cameras, under varying lighting and against varying backgrounds.

References

Kuang, Z., Olszewski, K., Chai, M., Huang, Z., Achlioptas, P. and Tulyakov, S., 2022 - NeROIC: Neural Rendering of Objects from Online Image Collections

06

robust speech recognition via large-scale weak supervision

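OpenAI released Whisper, an open-source, multilingual and multitask model for speech transcription and translation, trained on 680,000 hours of weakly supervised audio. It can write down what you say in English and in dozens of other languages (its training data covers 96 languages besides English), and it can also translate speech into English.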

References

Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C. and Sutskever, I., 2022 - Robust Speech Recognition via Large-Scale Weak Supervision

07

SpeechPainter: text-conditioned speech inpainting

Google researchers have proposed SpeechPainter, a model that fills gaps of up to one second in speech recordings, guided by an auxiliary text input. The cool part about this model is that it maintains speaker identity, prosody and recording environment conditions, and it generalizes to unseen speakers.

Similar to image inpainting, where the aim is to remove unwanted objects and reconstruct the background, speech inpainting works on a spectral representation of the audio signal (a spectrogram) and follows the same approach: part of it is masked out and the model is trained to reconstruct it. Impressive results were obtained this time as well.
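The masking idea can be sketched in a few lines of Python with NumPy. This only illustrates how a training example for spectrogram inpainting might be prepared, not SpeechPainter itself: we assume 16 kHz audio, a plain STFT magnitude spectrogram with 25 ms Hann windows and a 10 ms hop, and a synthetic tone in place of real speech. The helper names `magnitude_spectrogram` and `mask_gap` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumed sampling rate (Hz)
FRAME_LEN = 400       # 25 ms analysis window at 16 kHz
HOP = 160             # 10 ms hop at 16 kHz

def magnitude_spectrogram(signal):
    """STFT magnitudes with a Hann window, shaped (frames, frequency bins)."""
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    frames = np.stack([signal[i * HOP : i * HOP + FRAME_LEN] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def mask_gap(spec, start_s, gap_s):
    """Zero the frames that start inside [start_s, start_s + gap_s).

    Returns the masked spectrogram and a boolean mask of the hidden frames,
    i.e. the region a model would be trained to reconstruct.
    """
    if gap_s > 1.0:
        raise ValueError("SpeechPainter targets gaps of up to one second")
    first = int(np.ceil(start_s * SAMPLE_RATE / HOP))
    last = min(spec.shape[0], int(np.ceil((start_s + gap_s) * SAMPLE_RATE / HOP)))
    mask = np.zeros(spec.shape[0], dtype=bool)
    mask[first:last] = True
    masked = spec.copy()
    masked[mask] = 0.0
    return masked, mask

# Three seconds of a 220 Hz tone stand in for a real speech recording.
t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 220.0 * t)

spec = magnitude_spectrogram(audio)
masked_spec, gap_mask = mask_gap(spec, start_s=1.0, gap_s=0.5)
```

A model would then be trained to predict `spec[gap_mask]` from `masked_spec`, plus, in SpeechPainter's case, the text of the missing words.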

References

Borsos, Z., Sharifi, M. and Tagliasacchi, M., 2022 - SpeechPainter: Text-conditioned Speech Inpainting

08

real-time neural radiance talking portrait synthesis via audio-spatial decomposition

The authors present a method for synthesizing a talking person in real time using neural radiance fields (NeRF). It requires only a single video of the person talking as training input, and the result can then be driven by any audio track. NeRFs have already shown success in high-fidelity 3D modeling of talking portraits, but they are slow to train and render. To address this, the authors propose an audio-spatial decomposition module that represents the talking head with low-dimensional feature grids (a 3D spatial grid and a 2D audio grid), which achieves real-time rendering while preserving high quality in the resulting video.
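Whatever network predicts density and color at each 3D point (in this paper, one built on the decomposed feature grids), NeRF-based methods render a pixel by compositing samples along the camera ray. Below is a minimal NumPy sketch of the standard NeRF volume-rendering quadrature, assuming the per-sample densities `sigmas` and colors `colors` are already given. The function name `render_ray` and the toy ray are our own, and none of the paper's audio conditioning is modeled.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Volume-render one ray: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i.

    sigmas: (N,) densities, colors: (N, 3) RGB values, t_vals: (N,) sorted sample depths.
    Returns the pixel color and the per-sample weights.
    """
    # Distance between consecutive samples; the last segment is treated as infinitely long.
    deltas = np.append(np.diff(t_vals), 1e10)
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of light reaching sample i.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = transmittance * alphas
    return weights @ colors, weights

# Toy ray: empty space up to depth 4, then a dense red surface.
t_vals = np.linspace(2.0, 6.0, 64)
sigmas = np.where(t_vals > 4.0, 50.0, 0.0)
colors = np.tile([1.0, 0.0, 0.0], (64, 1))
rgb, weights = render_ray(sigmas, colors, t_vals)
```

Every pixel needs dozens of such point queries, which is why a fast per-point encoding, like the paper's low-dimensional grids, matters so much for reaching real-time frame rates.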

References

Tang, J., Wang, K., Zhou, H., Chen, X., He, D., Hu, T., Liu, J., Zeng, G. and Wang, J., 2022 - Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

🙂 I hope you enjoyed reading, and feel free to share with us your favorite AI findings of 2022 and your thoughts on the future of AI.
