Expert Reference Vol. I — Foundations

Introduction to Generative AI & Large Language Models

A comprehensive guide to understanding how modern AI systems generate text, images, code, and more — from mathematical foundations to real-world deployment.

  • 175B+ parameters in GPT-3
  • ~1T tokens in training
  • 2017: the Transformer paper
01

What is Generative AI?

Generative AI refers to systems that can produce new content — text, images, audio, video, code — by learning statistical patterns from vast amounts of existing data.

Unlike traditional discriminative models that classify or predict labels for given inputs, generative models learn the underlying distribution of data and can sample novel outputs from that distribution. This fundamental difference unlocks creative and open-ended capabilities previously impossible with classical machine learning.

Text Generation

Writing, summarisation, translation, Q&A, code completion, dialogue.

Image Synthesis

Diffusion models and GANs create photorealistic and artistic imagery.

Audio & Music

Speech synthesis, music generation, voice cloning, sound design.

Multimodal

Unified models that reason across text, vision, audio simultaneously.

Key Insight

Generative AI doesn’t “understand” in the human sense — it models the statistical likelihood of tokens given context, with emergent behaviours that can appear remarkably human-like.

02

A Brief History

From perceptrons to foundation models — the trajectory of generative AI spans seven decades of research breakthroughs.

1950s
Early Neural Networks

Rosenblatt’s Perceptron (1958) introduced the first trainable neural model. Limited by single-layer architecture.

1986
Backpropagation

Rumelhart, Hinton & Williams demonstrate efficient gradient-based learning in multi-layer networks.

1997
LSTM Networks

Hochreiter & Schmidhuber introduce Long Short-Term Memory, enabling sequence modelling over longer contexts.

2014
GANs & Seq2Seq

Goodfellow introduces Generative Adversarial Networks. Seq2Seq models emerge for neural machine translation.

2017
“Attention Is All You Need”

Vaswani et al. at Google introduce the Transformer architecture — the foundation of virtually all modern LLMs.

2018–20
BERT, GPT-2 & GPT-3

OpenAI and Google scale up pre-trained language models, demonstrating few-shot and zero-shot capabilities.

2022+
ChatGPT & the Public Moment

Instruction-tuned models and RLHF make LLMs accessible. Stable Diffusion democratises image synthesis. The era of foundation models begins.

03

How LLMs Work

Large Language Models are autoregressive neural networks that predict the probability distribution of the next token given all preceding tokens.

Token Prediction Pipeline
Raw Text
Tokeniser
Embedding
Transformer Blocks
Logits
Softmax
Next Token

Tokens are subword units — typically 3–4 characters on average. A tokeniser like BPE (Byte Pair Encoding) converts raw text into integer IDs. Each ID maps to a learned vector (embedding) in high-dimensional space.
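
To make that concrete, here is a deliberately tiny sketch in Python: the vocabulary, IDs, and dimensions are made up, and the greedy longest-match lookup stands in for a real BPE tokeniser.

import numpy as np

# Toy vocabulary; real tokenisers use BPE with ~50k-100k entries.
vocab = {"<eos>": 0, "Gen": 1, "erative": 2, " AI": 3, " is": 4, " fun": 5}

def tokenise(text):
    # Greedy longest-match lookup, a stand-in for byte pair encoding.
    ids, rest = [], text
    while rest:
        match = max((t for t in vocab if rest.startswith(t)), key=len, default=None)
        if match is None:
            rest = rest[1:]          # skip unknown characters in this toy version
            continue
        ids.append(vocab[match])
        rest = rest[len(match):]
    return ids

d_model = 8                                   # real models use thousands of dimensions
embedding_table = np.random.randn(len(vocab), d_model)

ids = tokenise("Generative AI is fun")        # -> [1, 2, 3, 4, 5]
vectors = embedding_table[ids]                # one learned vector per token ID
print(ids, vectors.shape)                     # (5, 8)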

These embeddings pass through a stack of Transformer blocks, each performing multi-head self-attention and feed-forward operations. The final output is a probability distribution over the vocabulary. Sampling from this distribution yields the next token.

# Simplified autoregressive generation
def generate(model, prompt, max_tokens=200):
    tokens = tokenise(prompt)
    for _ in range(max_tokens):
        logits = model(tokens)              # Forward pass
        probs = softmax(logits[-1])         # Last token's distribution
        next_t = sample(probs, temp=0.8)    # Temperature sampling
        tokens.append(next_t)
        if next_t == EOS_TOKEN:
            break
    return detokenise(tokens)
04

The Transformer Architecture

The Transformer, introduced in “Attention Is All You Need” (2017), replaced recurrent networks with a fully attention-based architecture that parallelises training over entire sequences.

“Attention mechanisms allow the model to weigh the relevance of every token in context when encoding each position — enabling rich long-range dependencies.” — Vaswani et al., 2017

Self-Attention

  • Queries, Keys & Values
  • Scaled dot-product scoring
  • Multi-head parallelism
  • O(n²) time complexity
  • Absolute + rotary positional encodings

Feed-Forward Layers

  • Two linear projections
  • Non-linearity (GeLU/SiLU)
  • 4× hidden dimension expansion
  • Applied position-wise
  • Stores factual knowledge

Each block also includes residual connections and Layer Normalisation for training stability. Modern LLMs stack 32–96 such blocks. Context windows have expanded from 2,048 tokens (GPT-3) to over 1 million tokens in recent models.
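
The sketch below shows both core components at toy scale: a single attention head with no causal mask, and a position-wise feed-forward layer. Residual connections and normalisation are omitted for brevity, and the dimensions are illustrative rather than those of any real model.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product attention for a single head (no causal mask shown).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq, seq) relevance scores
    return softmax(scores) @ V                # weighted sum of value vectors

def feed_forward(X, W1, b1, W2, b2):
    # Position-wise MLP: expand to 4x the width, apply GELU, project back.
    h = X @ W1 + b1
    gelu = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return gelu @ W2 + b2

seq_len, d_model = 4, 16
X = np.random.randn(seq_len, d_model)
Wq = Wk = Wv = np.random.randn(d_model, d_model)
W1, b1 = np.random.randn(d_model, 4 * d_model), np.zeros(4 * d_model)
W2, b2 = np.random.randn(4 * d_model, d_model), np.zeros(d_model)

out = feed_forward(self_attention(X, Wq, Wk, Wv), W1, b1, W2, b2)
print(out.shape)   # (4, 16): same shape as the input, ready for the residual add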

05

The Training Process

Modern LLM training occurs in stages, each refining the model’s behaviour from raw pattern matching to helpful, aligned interaction.

Pre-Training

Self-supervised next-token prediction on trillions of tokens from web, books, code. Learns world knowledge and language.
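
Under the hood, the pre-training objective is simply cross-entropy on the next token. A toy calculation for a single position, with made-up logits and a five-token vocabulary:

import numpy as np

logits = np.array([2.0, 0.5, -1.0, 0.1, 0.3])   # model scores over the vocabulary
target_id = 0                                    # the token that actually came next

probs = np.exp(logits - logits.max())
probs /= probs.sum()
loss = -np.log(probs[target_id])                 # cross-entropy at this position
print(round(float(loss), 2))                     # ~0.47

# Pre-training averages this loss over every position of every sequence
# in the corpus and minimises it with stochastic gradient descent.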

Supervised Fine-Tuning

Human-written demonstrations of desired behaviours. Teaches instruction following, Q&A, formatting norms.

RLHF

Reinforcement Learning from Human Feedback. A reward model ranks outputs; PPO optimises toward human preferences.

Constitutional AI

AI-generated feedback based on explicit principles. Scales alignment supervision without exhaustive human labelling.

Scaling Laws

Chinchilla research (Hoffmann et al., 2022) showed model performance scales predictably with both parameter count and training token count — optimal training requires roughly 20× more tokens than parameters.
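
As a back-of-the-envelope sketch of that rule of thumb, here is a short calculation using the widely quoted C ≈ 6·N·D approximation for training compute (N parameters, D tokens); the exact constants vary from paper to paper.

def chinchilla_estimate(params):
    """Rough compute-optimal recipe: ~20 training tokens per parameter."""
    tokens = 20 * params
    flops = 6 * params * tokens      # common C ~= 6 * N * D approximation
    return tokens, flops

# e.g. Chinchilla itself: 70B parameters trained on ~1.4T tokens
for n in (1e9, 70e9):
    tokens, flops = chinchilla_estimate(n)
    print(f"{n/1e9:.0f}B params -> ~{tokens/1e9:.0f}B tokens, ~{flops:.1e} training FLOPs")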

06

The Art of Prompting

Prompt engineering is the practice of crafting inputs that elicit optimal model outputs — a skill that blends linguistic intuition with mechanistic understanding.

Technique | Description | Best for
Zero-Shot | No examples, just instructions | Simple tasks
Few-Shot | 2–8 input/output examples in context | Format learning
Chain-of-Thought | “Think step by step” elicits reasoning | Complex reasoning
System Prompt | Persona and context framing | Tone and role
RAG | Retrieved context injected into prompt | Knowledge grounding
Tool Use | Model calls external functions/APIs | Agentic workflows
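
In practice, several of these techniques are often combined into a single prompt string before the model is called. The sketch below is purely illustrative; the question, retrieved passage, and examples are invented for demonstration.

def build_prompt(question, retrieved_passages, examples):
    """Assemble a prompt mixing RAG context, few-shot examples,
    and a chain-of-thought cue."""
    parts = ["You are a careful assistant. Use only the context provided."]
    parts.append("Context:\n" + "\n".join(retrieved_passages))      # RAG grounding
    for q, a in examples:                                           # few-shot examples
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nLet's think step by step.\nA:")   # chain-of-thought
    return "\n\n".join(parts)

prompt = build_prompt(
    question="Which paper introduced the Transformer?",
    retrieved_passages=["Vaswani et al. (2017), 'Attention Is All You Need', "
                        "introduced the Transformer architecture."],
    examples=[("Who proposed the LSTM?", "Hochreiter & Schmidhuber (1997).")],
)
print(prompt)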
07

Notable Models

The landscape of foundation models has diversified rapidly, with closed and open-weight options spanning a wide range of capability and scale.

Model | Organisation | Release | Notable feature
GPT-4o | OpenAI | 2024 | Omni: text, vision, audio
Claude 3.7 Sonnet | Anthropic | 2025 | Extended thinking, 200K context
Gemini 2.0 Flash | Google DeepMind | 2024 | Natively multimodal
Llama 3.1 405B | Meta AI | 2024 | Open weights, 128K context
Mistral Large | Mistral AI | 2024 | European-built, multilingual
DeepSeek-V3 | DeepSeek | 2024 | MoE architecture, cost-efficient
08

Real-World Applications

Generative AI is reshaping industries at an extraordinary pace — from automating knowledge work to enabling entirely new categories of product.

💊
Drug Discovery

Protein structure prediction, molecule generation, clinical trial optimisation.

⚖️
Legal

Contract analysis, case research, document drafting and review automation.

💻
Software Dev

Code generation, debugging, documentation, test writing, refactoring.

🎨
Creative Media

Copywriting, image/video production, game asset generation, storyboarding.

📚
Education

Personalised tutoring, curriculum generation, instant feedback systems.

📊
Finance

Earnings analysis, fraud detection, report generation, risk narratives.

09

Challenges & Ethics

With remarkable capability comes significant responsibility. The field grapples with fundamental technical and societal challenges.

01
Hallucination

LLMs confidently produce plausible-sounding but factually incorrect content. Mitigations include RAG, fine-tuning, and chain-of-thought verification.

02
Bias & Fairness

Training data reflects societal biases. Models can perpetuate or amplify stereotypes across gender, race, culture, and ideology.

03
Safety & Alignment

Ensuring models behave in accordance with human values at scale remains an open research problem. Misuse, misalignment, and catastrophic risk all require active mitigation.

04
Environmental Cost

Training frontier models requires significant energy — GPT-3’s training emitted ~552 tonnes CO₂e. Inference at scale compounds this further.

05
Intellectual Property

Legal uncertainty around training on copyrighted data, model outputs, and attribution is actively contested across multiple jurisdictions.

10

Key Glossary

Token

A subword unit — the atomic element processed by LLMs. “Unbelievable” might tokenise as [“Un”, “believ”, “able”].

Embedding

A dense vector representation of a token in continuous space. Semantically similar tokens cluster together.
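
A tiny illustration of that clustering effect using cosine similarity; the three-dimensional vectors here are invented by hand, whereas real embeddings have thousands of learned dimensions.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat   = np.array([0.9, 0.1, 0.0])
dog   = np.array([0.8, 0.2, 0.1])
piano = np.array([0.0, 0.1, 0.9])

print(cosine(cat, dog))    # high similarity: related concepts sit close together
print(cosine(cat, piano))  # low similarity: unrelated concepts are far apart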

Attention

A mechanism that weights the relevance of all context tokens when computing a representation for each position.

Temperature

A sampling parameter (0–2) controlling output randomness. Lower = deterministic; higher = creative/chaotic.
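
A quick sketch of how temperature reshapes the distribution being sampled from, using made-up logits:

import numpy as np

def sample_distribution(logits, temperature):
    # Dividing logits by the temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = np.array(logits) / max(temperature, 1e-6)
    e = np.exp(scaled - scaled.max())
    return e / e.sum()

logits = [2.0, 1.0, 0.2]
for t in (0.2, 0.8, 1.5):
    print(t, np.round(sample_distribution(logits, t), 3))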

Context Window

The maximum number of tokens an LLM can “see” in one forward pass. Determines memory capacity.

Fine-Tuning

Continued training of a pre-trained model on a smaller, task-specific dataset to specialise its behaviour.

RLHF

Reinforcement Learning from Human Feedback. Aligns model outputs to human preferences via a reward model.

RAG

Retrieval-Augmented Generation. Grounds LLM responses in retrieved documents to reduce hallucination.

Inference

Running a trained model on new inputs to produce outputs — as opposed to training.

Perplexity

A measure of how well a language model predicts a text sample. Lower perplexity indicates better fit.
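
In code, perplexity is simply the exponential of the average negative log-probability the model assigned to each actual next token; the probabilities below are invented for illustration.

import numpy as np

# Probabilities the model assigned to each actual next token in a sample text.
token_probs = [0.40, 0.10, 0.65, 0.30]
perplexity = np.exp(-np.mean(np.log(token_probs)))
print(round(float(perplexity), 2))   # ~3.4: as uncertain as a uniform choice over ~3 tokens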

Further Reading

Attention Is All You Need (Vaswani et al., 2017) · Language Models are Few-Shot Learners (Brown et al., 2020) · Training Language Models to Follow Instructions with Human Feedback (Ouyang et al., 2022) · Scaling Laws for Neural Language Models (Kaplan et al., 2020)
