Study Guide on Generative AI

What is Generative AI? – the fundamentals explained

Before jumping into the technical explanation, let's build an intuitive understanding of generative AI.

The name itself explains the very nature of generative AI: it creates something fresh out of learned patterns and behaviour, much like a human brain does when learning a new language or trying to draw. GenAI mimics aspects of human creation, producing content that resembles what a person might make.

While these systems approximate the thinking and creating capabilities of humans, they do not possess intuitive reasoning of their own, and that is what still separates machines from humans. That kind of sixth sense remains an exclusively human attribute.

Now, let us get into the technical aspects of generative AI and understand its fundamental characteristics.

What is Generative AI?

Generative AI is a subset of artificial intelligence systems that create new content whose qualities mirror the training data they were given. It can produce a variety of content forms, such as text, images, video, and audio.

These systems are machine-learning models trained on real-world datasets to replicate the attributes of real-world content. They find patterns in the training data through an iterative learning process over a diverse collection of examples and, using those learned patterns and relations, produce new and original content.

These are the familiar facts about generative AI that we all come across frequently. But what really matters about these systems? Let's take a closer look at how they work, how they help us, and what their implications are, in a simplified way.

How Does Generative AI Work?
The mechanism behind generative AI rests on three powerful model architectures: GANs, VAEs and Transformers. In simple terms, they are the engines that do the generating.
1. GANs (Generative Adversarial Networks)

GANs comprise two models, a generator and a discriminator, which are trained as rivals competing against each other.

Let's understand this with a small example: the generator produces a fake image that resembles a real one, and the discriminator tries to distinguish the fake image from the real ones.

Over time the generator improves at creating realistic images, and the two models push each other to do better until the output becomes convincing. That's the core idea of a GAN.
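
To make the idea concrete, here is a minimal sketch of one adversarial training step in PyTorch. The toy networks, layer sizes, and the random stand-in for "real" data are illustrative assumptions for readability, not a recipe from any particular system.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 2

    # Generator: maps random noise to a fake data sample.
    generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
    # Discriminator: outputs the probability that a sample is real.
    discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    real_batch = torch.randn(64, data_dim)   # stand-in for a batch of real training data
    noise = torch.randn(64, latent_dim)
    fake_batch = generator(noise)

    # Step 1: train the discriminator to tell real from fake.
    d_loss = (bce(discriminator(real_batch), torch.ones(64, 1))
              + bce(discriminator(fake_batch.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Step 2: train the generator to fool the discriminator.
    g_loss = bce(discriminator(fake_batch), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

Repeating these two steps over many batches is what pushes both models to improve together.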

2. VAEs (Variational Autoencoders)

VAEs are generative models that work by mapping input data into a compressed latent space and then generating new data from points in that space.

They consist of an encoder and a decoder. The encoder compresses the input data into the latent space; the decoder reconstructs data from samples drawn out of that space. By sampling slightly different points each time, the pair can turn existing data into fresh outputs.
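
As a rough illustration, here is a minimal VAE sketch in PyTorch. The single-layer encoder and decoder, the dimensions, and the simple loss weighting are simplifying assumptions made purely for readability.

    import torch
    import torch.nn as nn

    class TinyVAE(nn.Module):
        def __init__(self, data_dim=784, latent_dim=8):
            super().__init__()
            self.encoder = nn.Linear(data_dim, 2 * latent_dim)  # outputs mean and log-variance
            self.decoder = nn.Linear(latent_dim, data_dim)

        def forward(self, x):
            mu, logvar = self.encoder(x).chunk(2, dim=-1)
            # Reparameterisation trick: sample a latent point around the encoded mean.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.decoder(z), mu, logvar

    vae = TinyVAE()
    x = torch.rand(32, 784)                  # stand-in batch of flattened images
    recon, mu, logvar = vae(x)

    # Loss = reconstruction error + a KL term that keeps the latent space well behaved.
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + kl

Once trained, sampling fresh latent points and passing them through the decoder is what produces new, varied outputs.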

3. Transformers

Transformers are models that examine every part of the input and weigh how important each part is to the others, a mechanism known as attention. This helps them keep the significant features of the data intact.

Transformers are highly effective at contextual tasks like understanding prompts, summarisation, and translation.
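
The weighing described above is scaled dot-product self-attention. Below is a bare-bones PyTorch sketch of it; the sequence length, model width, and random projection weights are illustrative assumptions standing in for learned parameters.

    import torch
    import torch.nn.functional as F

    seq_len, d_model = 5, 64
    x = torch.randn(seq_len, d_model)            # embeddings for 5 input tokens

    # Learned projections would normally produce these; random weights stand in here.
    Wq, Wk, Wv = (torch.randn(d_model, d_model) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    scores = Q @ K.T / (d_model ** 0.5)          # how relevant each token is to every other token
    weights = F.softmax(scores, dim=-1)          # attention weights sum to 1 for each token
    output = weights @ V                         # context-aware representation of the input

Each row of `weights` shows how much one token attends to the others, which is how the model keeps the significant parts of the input in view.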

Together, these three architectures underpin most of today's generative AI. Each suits different kinds of data and tasks, and each, in its own way, turns learned patterns into new content that matches real data in quality and originality.
Types of Generative AI Models
Fundamentally, GenAI models are classified into three types based on the kind of data they deal with. They are:
  • Contextual models
  • Visual models
  • Audio generative models

Contextual models 

Everything produced in the form of text falls under contextual data: stories, articles, reports, code, and so on. Such outputs carry meaning and context that can be interpreted. ChatGPT is a prime example of a contextual model.
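
As a small illustration of using a contextual model programmatically, the sketch below calls a publicly available text-generation checkpoint through the Hugging Face transformers pipeline. It assumes the transformers library is installed and the small gpt2 model can be downloaded; both the model choice and the prompt are arbitrary examples.

    from transformers import pipeline

    # Load a small, publicly available text-generation model.
    generator = pipeline("text-generation", model="gpt2")

    # Ask the model to continue a prompt with up to 20 new tokens.
    result = generator("Generative AI is", max_new_tokens=20)
    print(result[0]["generated_text"])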

Visual models

Content that is visually represented comes under visual or imagery generation. It is produced by visual models such as Stable Diffusion, DALL-E, and Midjourney. These create imagery outputs like photographs, artworks, videos, illustrations, 3-D objects, etc.

Audio models

Similarly, models that produce acoustic outputs are audio-generative models. They create audio outputs like voice, music and sound effects. Some of the famous models include OpenAI’s Jukebox, MuseNet and Google’s WaveNet.

These three data formats can also be combined, giving rise to multimodal AI systems. Such systems move closer to human-like perception because they can perceive, process, and deliver data across all of these mediums.
How Generative AI serves us

The entire process of automation now fits in our pockets. AI hardly needs an introduction when nearly all of us already use it every day, and our routine tasks are more organised than ever since these systems arrived.

Creative professionals have also started to use generative AI, from making independent music videos to publishing e-books, and there are plenty of other applications. But do these tools truly deliver what we want? That depends on how effectively we employ them.

Guided by human inputs, these models generate many forms of content that are widely used in everyday applications; most commonly AI-generated images, videos and text. Beyond the usual, GenAI is also applied in niche areas like programming, data science, drug synthesis, and healthcare innovation.

Generative AI serves as a bridge between our ideas and their execution, because it greatly simplifies the process. Yet for all the attention it receives, it still lacks one essential quality: reliability.

What makes them unreliable?

Prominent figures in technology have repeatedly warned that artificial intelligence can be dangerous, and the concern still stands. The reasoning of these models is not transparent enough for us to fully understand their behaviour, and that opacity makes them hard to interpret. They can sometimes produce biased output that perpetuates racial, linguistic, religious, or ethnic prejudice.

Because of this, explainability and interpretability have become prominent research topics in the effort to make AI reliable and accurate. Many researchers now work on XAI (explainable AI), aiming to uncover how models reach each decision and conclusion.

Ethical Considerations in Generative AI

Generative AI raises serious ethical challenges, and eliminating possible violations is far from easy. The risks are sensitive in nature: intellectual property concerns, privacy breaches, potential bias, and misinformation.

Generative models work from patterns learned in their training data, so they can produce biased results if that data suffers from selection bias.

They can also disclose private information because of their unreliable nature. As noted earlier, the lack of genuine reasoning and judgement leaves AI systems prone to recurring errors.

Hence, the legal and ethical regulation around generative AI must be active enough to prevent significant harm.

The Future of Generative AI

Rather than treating generative AI as a replacement for people, using it to enhance human creativity and productivity is its best role. It can be efficient, but it cannot yet be called reliable.

As the technology advances rapidly, we will clearly see ever more sophisticated models, and it is crucial to address their pervasive consequences through thorough research.

Exploring these questions is where scholarly contribution lies. Researching them takes long hours of effort and real expertise to overcome the hurdles, but tackling them with the right approach will lead to breakthroughs.
