The GenAI Competition 2025

Place: To be announced
Date: April 26, 2025 (tentative)

Competition Areas


The areas for the 2025 GenAI competition consist of the following two major categories:

Category A: Types of Generative AI models
Generative AI or foundation models are designed to generate different types of content, such as text and chat, images, code, video, and embeddings. Researchers can adapt these models to specific domains and tasks by adjusting the generative AI’s learning algorithms or model structures. Category A is not limited to the model types listed below; other variant model types are also included.
Task-specific GAI (Generative AI) Model
1. Generative Adversarial Networks (GANs)
A GAN (Generative Adversarial Network) is a deep learning architecture consisting of two essential components: a generator and a discriminator. The generator’s primary function is to produce synthetic data that closely resembles real data, while the discriminator is responsible for distinguishing authentic data from fabricated data. Through adversarial training, the generator improves the realism of the data it produces, while the discriminator becomes better at judging whether data is real or synthetic (a minimal training sketch appears after the examples below).
1. DALL·E 2
It is an AI system that can take a simple description in natural language and turn it into a realistic image or work of art.
2. StyleGAN 3
An AI system that can generate photorealistic images of anything the user can imagine, from human faces to animals and cars. It also provides a remarkable degree of personalization, allowing users to manipulate the style, shape, and pose of the generated images.
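For illustration, here is a minimal adversarial training step in PyTorch. The layer sizes, data dimensions, and hyperparameters are assumptions chosen for brevity, not taken from DALL·E 2 or StyleGAN 3; the sketch only shows how the generator and discriminator are trained against each other.

```python
# Minimal GAN training sketch (illustrative assumptions, not a specific published model).
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g. flattened 28x28 images (assumed)

# Generator: maps random noise to synthetic data.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())

# Discriminator: outputs the probability that a sample is real.
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1) Train the discriminator to separate real data from generated data.
    z = torch.randn(batch_size, latent_dim)
    fake_batch = G(z).detach()
    loss_D = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Train the generator to fool the discriminator.
    z = torch.randn(batch_size, latent_dim)
    loss_G = bce(D(G(z)), real_labels)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

In practice, image GANs such as StyleGAN use convolutional architectures and additional regularization to keep this adversarial game stable.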
2. Diffusion model
Generative diffusion models can create new data using the data they were trained on. For instance, when trained on an assortment of human faces, a diffusion model can create new and lifelike faces with diverse features and expressions, even if they were not present in the original dataset.
The fundamental idea behind diffusion models is to transform a simple, easily sampled distribution into a more complex and meaningful data distribution. This transformation is learned as a series of reversible operations: once the model understands the transformation, it can generate new samples by starting from a point in the simple distribution and gradually mapping it toward the desired complex data distribution (see the sketch after the examples below).
-Stable Diffusion
It is a generative AI model that creates photorealistic images, videos, and animations from text and image prompts. It applies diffusion in a latent space, which reduces processing requirements and allows it to run on desktops or laptops with GPUs. With transfer learning, developers can fine-tune the model with as few as five images to meet their needs. It was launched in 2022.
-DALL·E 2
An innovative text-to-image model developed by OpenAI, it converts textual descriptions into striking images using an advanced diffusion model. It uses contrastive learning to relate text and images and to recognize the differences between similar images when creating new ones. It has practical applications in design, advertising, and content creation, making it a notable example of human-centered AI.
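For illustration, below is a minimal DDPM-style training sketch in PyTorch. The noise schedule, toy denoising network, and data shape are assumptions; systems such as Stable Diffusion apply the same idea to latent representations with far larger networks.

```python
# Minimal diffusion-model training sketch (illustrative assumptions only).
import torch
import torch.nn as nn

T = 1000                                  # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

data_dim = 784
# Toy denoising network: predicts the added noise from (noisy sample, timestep).
eps_model = nn.Sequential(nn.Linear(data_dim + 1, 256), nn.ReLU(),
                          nn.Linear(256, data_dim))

def diffusion_loss(x0):
    """Forward process: noise clean data x0 at a random step t, then train the
    network to predict that noise (the quantity reversed at sampling time)."""
    batch = x0.size(0)
    t = torch.randint(0, T, (batch,))
    a_bar = alphas_bar[t].unsqueeze(1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # noisy sample
    t_feat = (t.float() / T).unsqueeze(1)                  # crude timestep encoding
    pred = eps_model(torch.cat([x_t, t_feat], dim=1))
    return nn.functional.mse_loss(pred, noise)
```

Sampling then runs the learned denoiser in reverse, step by step, from pure noise back to a data-like sample.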
3. Variational Autoencoders (VAEs)
VAEs are generative models that combine the capabilities of autoencoders and probabilistic modeling to learn a compressed representation of data. VAEs encode input data into a lower-dimensional latent space, allowing new samples to be generated by sampling points from the learned distribution. With practical applications spanning image generation, data compression, anomaly detection, and drug discovery, VAEs are versatile across many domains.
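As a concrete illustration of the encode-sample-decode idea, here is a minimal VAE sketch in PyTorch; the dimensions and architecture are assumptions chosen for clarity.

```python
# Minimal VAE sketch (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

data_dim, latent_dim = 784, 16

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(data_dim, 256)
        self.mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, data_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term plus KL divergence to the standard-normal prior (the ELBO).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Generating new data after training: sample z from the prior and decode it.
# z = torch.randn(1, latent_dim); new_sample = trained_vae.dec(z)
```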
4. Flow model
Flow-based models are generative AI models that aim to learn the underlying structure of a given dataset. They achieve this by learning the probability distribution of the different values or events within the dataset. Once the model has learned this distribution, it can generate fresh data points that share the statistical properties and characteristics of the original dataset.
A key feature of flow-based models is that they apply an invertible transformation to the input data, one that can be reversed exactly. By starting from a simple initial distribution, such as random noise, and applying the transformation in reverse, the model can generate new samples quickly without requiring complex optimization. This makes flow-based models computationally efficient and often faster than other generative models.
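The sketch below illustrates the invertible-transformation idea with a single learnable elementwise affine layer in PyTorch; real flow models stack many such layers, so this toy setup is an illustrative assumption rather than a complete model.

```python
# Minimal normalizing-flow sketch: one invertible affine transform (assumed toy setup).
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        """Map data x to latent z and return the log-det Jacobian
        needed for the change-of-variables likelihood."""
        z = (x - self.shift) * torch.exp(-self.log_scale)
        log_det = -self.log_scale.sum()
        return z, log_det

    def inverse(self, z):
        """Exactly invert the transform: this is how new samples
        are generated from simple noise."""
        return z * torch.exp(self.log_scale) + self.shift

flow = AffineFlow(dim=2)
base = torch.distributions.Normal(torch.zeros(2), torch.ones(2))

def log_likelihood(x):
    # Exact log-likelihood via the change-of-variables formula.
    z, log_det = flow(x)
    return base.log_prob(z).sum(-1) + log_det

# Sampling: draw from the simple base distribution and apply the inverse transform.
samples = flow.inverse(base.sample((5,)))
```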
5. General GAI (Generative AI) Model
Advances in big data and data representation technologies have made it possible to generate human-readable language from the patterns and structures of input data, supporting objectives across a wide range of environments. General GAI models aim to go beyond the task-specific generation paradigm, which is restricted to adapting to the sample distribution of a single task.
1. The Generative Pre-Trained Transformer (GPT)
GPT initially showcased its potential for generating task-specific natural language through unsupervised pre-training followed by fine-tuning for downstream tasks. It uses transformer-decoder layers for next-word prediction and coherent text generation; fine-tuning then adapts the pre-trained model to a specific task.
2. GPT-2
It expands on its predecessor’s model structure and parameter count and is trained on various datasets beyond just web text. Despite achieving strong results with zero-shot learning, it still falls under task-specific GAI.
3. GPT-3
It is a language model that employs prompting to reduce dependence on large, supervised datasets. The model is pre-trained on a vast amount of text and predicts text probabilities, which allows it to perform few-shot or zero-shot learning. By defining a new prompt template, it can quickly adapt to new scenarios, even when there is little or no labelled data, as illustrated in the sketch below. This methodology is advantageous for tasks involving language comprehension and generation, as it minimizes the amount of data needed and enhances overall performance.
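To make the prompting idea concrete, the sketch below packs two labelled examples into a prompt template and lets a causal language model continue the text. GPT-3 itself is accessed through a paid API, so the openly released GPT-2 checkpoint (via the Hugging Face transformers library) is used here purely for illustration; the example reviews and the template wording are assumptions.

```python
# Few-shot prompting sketch with an open causal language model (GPT-2 as a stand-in).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prompt template: two worked examples followed by the new query to be completed.
prompt = (
    "Review: The film was a delight from start to finish.\nSentiment: positive\n\n"
    "Review: I walked out halfway through.\nSentiment: negative\n\n"
    "Review: The plot dragged, but the acting was superb.\nSentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)

# The model's continuation after the prompt is its answer to the new example.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```

No gradient updates happen here; the labelled examples live entirely in the prompt, which is what distinguishes few-shot prompting from fine-tuning.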
More recently, GPT-4, the latest model developed by OpenAI, was trained at an unprecedented scale of computation and data and achieved human-like performance on a wide range of tasks, significantly outperforming its predecessors. The introduction of GPT-4 represents a significant leap forward in the field of General Artificial Intelligence (GAI). Building upon the success of the previous GPT models, GPT-4 showcases remarkable advancements in its ability to perceive and generate multimodal data, including text and images. Its formidable capabilities in multimodal generation and conversational interactivity offer a promising outlook for research fields such as materials science.
4. LLaMA from Meta
Meta, formerly known as Facebook, announced a new LLM in 2023. The LLM is called LLaMA, which stands for Large Language Model Meta AI, and was released as a family of models ranging from 7 billion to 65 billion parameters. LLaMA has been trained on publicly available data sources, including web pages, books, scientific articles, and code. It was released primarily to the research community to support work on large language models, including studies of their robustness, bias, and responsible use.
5. PaLM 2 from Google
PaLM 2, the successor to Google’s PaLM model, was announced in 2023. It is an LLM trained on a large-scale multilingual dataset covering more than 100 languages, with improved capabilities in reasoning, coding, and translation. PaLM 2 can generalize to new tasks and domains without fine-tuning, thanks to its few-shot and zero-shot learning capabilities.
6. BLOOM
BLOOM generates text in 46 natural languages and dialects and 13 programming languages. It was trained on an enormous amount of data, totaling 1.6 terabytes, equivalent to about 320 copies of Shakespeare’s complete works. The 46 natural languages include French, Vietnamese, Mandarin, Indonesian, and Catalan, together with 13 Indic languages (including Hindi) and 20 African languages. Although just over 30% of the training data was in English, the model is proficient in all of the languages mentioned.
7. BERT from Google
One of Google’s most influential LLMs, released in 2018, is BERT. BERT is an abbreviation for Bidirectional Encoder Representations from Transformers; its large variant contains 340 million parameters. Built on the transformer framework, BERT leverages bidirectional self-attention to learn from extensive volumes of textual data. With these capabilities, BERT is proficient at diverse natural language tasks such as text classification, sentiment analysis, and named entity recognition. Additionally, BERT is widely used as a pre-trained model that is fine-tuned for specific downstream tasks and domains.
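As an illustration of how BERT is used as a pre-trained model for a downstream task, the sketch below fine-tunes a BERT checkpoint for two-class text classification with the Hugging Face transformers library; the toy data, label set, and training loop are assumptions.

```python
# Fine-tuning BERT for text classification (toy data and loop are assumed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # classification head added on top of BERT

texts = ["great product, would buy again", "terrible, broke after one day"]  # toy data
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps on the toy batch
    outputs = model(**batch, labels=labels)  # returns the cross-entropy loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: logits over the two classes for a new sentence.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("works perfectly", return_tensors="pt")).logits
```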