The Pretrained Models Shaping Modern AI
A visual guide to GPT, BERT, LLaMA, and Claude — the transformer-based architectures that redefined what language models can do.
The Generative Pre-trained Transformer family pioneered large-scale unsupervised pretraining on internet text followed by task-specific fine-tuning. GPT-3 (175 B parameters) demonstrated that scale alone unlocks emergent few-shot abilities; RLHF alignment arrived with InstructGPT and ChatGPT, and GPT-4 added multimodal reasoning.
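The training objective behind the whole GPT line is next-token prediction: at each position the model sees only earlier tokens and is scored by cross-entropy against the true next token. A minimal sketch of that loss, using a toy stand-in for the model (the `logits_fn` callable and the uniform predictor are illustrative, not any real GPT):

```python
import math

def causal_lm_loss(token_ids, logits_fn):
    """Average next-token cross-entropy: position t predicts token t+1,
    conditioned only on tokens 0..t (the causal constraint)."""
    total = 0.0
    for t in range(len(token_ids) - 1):
        context = token_ids[: t + 1]      # no peeking at future tokens
        logits = logits_fn(context)       # one score per vocab entry
        # log-sum-exp softmax normalizer, numerically stable
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[token_ids[t + 1]]
    return total / (len(token_ids) - 1)

# Toy "model": uniform logits over a 4-token vocabulary.
uniform = lambda ctx: [0.0, 0.0, 0.0, 0.0]
loss = causal_lm_loss([0, 1, 2, 3], uniform)
# A uniform predictor scores exactly ln(4) ≈ 1.386 nats per token.
```

Anything that lowers this loss — more data, more parameters, longer training — is what the scaling results on this page are measuring.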
Bidirectional Encoder Representations from Transformers changed NLP benchmarks overnight. By masking random tokens and training the model to predict them using left and right context simultaneously, BERT produced deeply contextual embeddings ideal for classification, named-entity recognition (NER), question answering (QA), and semantic search.
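The masking step can be sketched concretely. In the original BERT recipe, about 15% of positions become prediction targets; of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged so the model cannot rely on always seeing `[MASK]`. A minimal version (the `MASK_ID` value 103 matches the original BERT WordPiece vocab; -100 is the common "ignore this position" label convention):

```python
import random

MASK_ID = 103  # id of [MASK] in the original BERT vocabulary

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=0):
    """BERT-style masking: ~mask_prob of positions become targets;
    80% -> [MASK], 10% -> random token, 10% -> kept as-is."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)        # -100 = not a prediction target
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok                 # the model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)
            # else: leave the original token in place
    return inputs, labels
```

The loss is then computed only at the labeled positions, using both left and right context — the bidirectional part.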
Large Language Model Meta AI democratised foundation-model research by releasing competitive weights publicly. LLaMA 2 added grouped-query attention for efficiency; the LLaMA 3 family trained on over 15 T tokens, with LLaMA 3.1 extending context to 128 K tokens. Its open availability spurred thousands of fine-tunes and derivative models.
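Grouped-query attention's efficiency win comes from a simple idea: many query heads share one key/value head, shrinking the KV cache. The index math is just integer division, sketched below (the 8-query/2-KV numbers are illustrative, not LLaMA's actual head counts):

```python
def gqa_kv_head(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: query heads are split into n_kv_heads
    groups, and every head in a group reads the same key/value head."""
    group_size = n_query_heads // n_kv_heads   # query heads per KV head
    return query_head // group_size

# e.g. 8 query heads sharing 2 KV heads -> two groups of 4
mapping = [gqa_kv_head(q, 8, 2) for q in range(8)]
# mapping == [0, 0, 0, 0, 1, 1, 1, 1]
```

With `n_kv_heads = n_query_heads` this reduces to standard multi-head attention, and with `n_kv_heads = 1` to multi-query attention — GQA sits between the two.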
Built around Constitutional AI — a method that uses a set of principles to guide self-critique and revision — Claude prioritises helpfulness, harmlessness, and honesty. Claude 3 Opus matched or exceeded GPT-4 on many benchmarks; the Claude 3.5 and 4 families extended multimodal reasoning and tool use.
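The self-critique-and-revision idea can be sketched as a loop, though the real Constitutional AI pipeline is more involved (the revised outputs also feed preference training). Everything below is a hypothetical illustration: `model` is a placeholder callable (prompt in, text out), not Anthropic's API, and the prompt wording is invented:

```python
def constitutional_revision(model, draft, principles, rounds=1):
    """Sketch of a CAI-style loop: for each principle, ask the model to
    critique the current answer against it, then rewrite accordingly."""
    answer = draft
    for _ in range(rounds):
        for principle in principles:
            critique = model(
                f"Critique this reply against the principle: {principle}\n"
                f"Reply: {answer}"
            )
            answer = model(
                f"Rewrite the reply to address the critique.\n"
                f"Critique: {critique}\nReply: {answer}"
            )
    return answer
```

The point of the design is that the principles, not human labels on every example, steer the revisions — which is what lets the constitution scale.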
Architecture at a Glance
| Model | Architecture | Training objective | Best for |
|---|---|---|---|
| GPT | Decoder-only transformer | Next-token prediction (CLM) | Open-ended generation, chat, code |
| BERT | Encoder-only transformer | Masked LM + Next sentence pred. | Classification, NER, semantic search |
| LLaMA | Decoder-only (RoPE + GQA) | Next-token prediction (CLM) | Open research, fine-tuning, edge deploy |
| Claude | Decoder-only + Constitutional AI | RLHF + CAI self-critique | Long-context reasoning, safe assistants |
A Brief History
- 2017: Attention Is All You Need — Vaswani et al. introduce the Transformer, replacing recurrent nets with pure self-attention and laying the foundation for every model on this page.
- 2018: GPT-1 & BERT — OpenAI's GPT shows unsupervised pretraining + fine-tuning wins at NLU. Google's BERT simultaneously proves bidirectional context is king for understanding tasks.
- 2020: GPT-3 — 175 B parameters and in-context few-shot learning stun the research community. Scale, it turns out, is a feature.
- 2023: LLaMA 1 & Claude 1 — Meta opens the weights to researchers; Anthropic ships Constitutional AI-aligned Claude. The open/closed dichotomy defines a new era of LLM competition.
- 2024–25: Claude 3 / 4, LLaMA 3, GPT-4o — Multimodal reasoning, 128 K–1 M token contexts, tool use, and real-time voice. The frontier accelerates faster than ever.

