\( \def\cuberoot#1{\sqrt[3]{#1}} \def\fourthroot#1{\sqrt[4]{#1}} \def\abspartial#1#2#3#4{\left|\,{\partial(#1,#2)\over\partial(#3,#4)}\,\right|} \def\absdeltal#1#2#3#4{\left|\,{\d(#1,#2)\over\d(#3,#4)}\,\right|} \def\dispop#1#2{\disfrac{\partial #1}{\partial #2}} \def\definedas{\equiv} \def\bb{{\bf b}} \def\bB{{\bf B}} \def\bsigma{\boldsymbol{\sigma}} \def\bx{{\bf x}} \def\bu{{\bf u}} \def\Re{{\rm Re\hskip1pt}} \def\Reals{{\mathbb R\hskip1pt}} \def\Integers{{\mathbb Z\hskip1pt}} \def\Naturals{{\mathbb N\hskip1pt}} \def\Im{{\rm Im\hskip1pt}} \def\P{\mbox{P}} \def\half{{\textstyle{1\over 2}}} \def\third{{\textstyle{1\over3}}} \def\fourth{{\textstyle{1\over 4}}} \def\fifth{{\scriptstyle{1\over 5}}} \def\sixth{{\textstyle{1\over 6}}} \def\oA{\rlap{$A$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obA{\rlap{$A$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obX{\rlap{$X$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obY{\rlap{$Y$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obZ{\rlap{$Z$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obc{\rlap{$c$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obd{\rlap{$d$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obk{\rlap{$k$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\oba{\rlap{$a$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt} \def\obb{\rlap{$b$}\kern1pt\overline{\phantom{\dis{}t}}\kern.5pt} \def\obw{\rlap{$w$}\kern1pt\overline{\phantom{\dis{}t}}\kern.5pt} \def\obz{\overline{z}}\kern.5pt} \newcommand{\bx}{\boldsymbol{x}} \newcommand{\by}{\boldsymbol{y}} \newcommand{\br}{\boldsymbol{r}} \renewcommand{\bk}{\boldsymbol{k}} \def\cuberoot#1{\sqrt[3]{#1}} \def\fourthroot#1{\sqrt[4]{#1}} \def\fifthroot#1{\sqrt[5]{#1}} \def\eighthroot#1{\sqrt[8]{#1}} \def\twelfthroot#1{\sqrt[12]{#1}} \def\dis{\displaystyle} %\def\definedas{\equiv} \def\bq{{\bf q}} \def\bp{{\bf p}} \def\abs#1{\left|\,#1\,\right|} \def\disfrac#1#2{{\displaystyle #1\over\displaystyle #2}} \def\select#1{ \langle\, #1 \,\rangle } \def\autoselect#1{ \left\langle\, #1 \,\right\rangle } \def\bigselect#1{ \big\langle\, #1 \,\big\rangle } \renewcommand{\ba}{\boldsymbol{a}} \renewcommand{\bb}{\boldsymbol{b}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bh}{\boldsymbol{h}} \newcommand{\bA}{\boldsymbol{A}} \newcommand{\bB}{\boldsymbol{B}} \newcommand{\bC}{\boldsymbol{C}} \newcommand{\definedas}{\equiv} \newcommand{\half}{\frac{1}{2}} %\newcommand{\slfrac}[2]{\raisebox{0.5pt}{$\scriptstyle{}^{#1}\!/\!_{#2}$}} \def\slfrac#1#2{\raise.8ex\hbox{$\scriptstyle#1$}\!/\!\lower.5ex\hbox{$\scriptstyle#2$}} \newcommand{\bsigma}{\boldsymbol \sigma} \newcommand{\abs}[1]{\left|\,#1\,\right|} \newcommand{\Rectangle}{\sqsubset\!\sqsupset} \newcommand{\rectangle}{{% \ooalign{$\sqsubset\mkern3mu$\cr$\mkern3mu\sqsupset$\cr}% % How to do fractions: for 1/2 use this in place: $\raise{0.8pt}{\scriptstyle{}^{1}\!/\!_{2}}$ % for n/m use this in place: $\raise{0.8pt}{\scriptstyle{}^{n}\!/\!_{m}}$ }} \)

GAP Comments on Entropy, Part_7: What Is Renormalization?

Patrick Reany

17 January 2026

Note: This chat I had with Copilot is lightly edited for relevance and brevity. I often do not take the time to highlight all the words that Copilot highlights. Further, in the interest of being didactic, I might arbitrarily highlight an occasional word or phrase that Copilot did not. (Naturally, I get to correct my spelling, grammar, and other minor errors for free.)

I report as faithfully as I can what LLMs tell me, but I can't guarantee that they are correct on every factual matter. If the subject should touch on physical or mental self-care, regard the content merely as information, rather than as medical advice.



Patrick:
I just saw my first introduction to renormalization. But I am confused --- not by the math, but by the name. Before one can renormalize, shouldn't one first have to normalize? My hypothesis as to why the process is called "renormalization" is that the process can be repeated an indefinite number of times, I suppose to average over ever-larger domains during coarse-graining. Do you have an explanation?

Copilot:
The name renormalization is one of those historical accidents that stuck, even though it obscures more than it reveals. Your instinct—that the “re-” suggests iteration—is actually closer to the modern meaning than the original one, but the story is richer and more tangled.

Let me give you the clean conceptual picture first, then the historical one, and then tie it back to your intuition about coarse‑graining.


1. The conceptual meaning (modern viewpoint)

In the modern Wilsonian picture, renormalization is not about “normalizing” something in the sense of making it integrate to 1. It’s about redefining the parameters of a theory after you change the resolution at which you describe the system.

When you coarse‑grain—integrate out short‑distance degrees of freedom—you get a new effective theory. But the new theory generally has:

  • shifted values of the original couplings
  • newly generated interaction terms
  • fields whose normalization has drifted
  • a lower effective cutoff

To compare the new theory to the old one, you re‑express the fields and couplings in a standard form. That act of redefining the parameters so the theory keeps the same canonical shape is what is called renormalization.

So the “re-” is not “normalize again,” but “re‑express in a standard form.”

It’s like zooming out on a map and then redrawing the legend so that “1 inch = 10 miles” becomes “1 inch = 50 miles.” You’re not normalizing the map; you’re renormalizing the units.


2. The historical meaning (the original 1940s–1950s usage)

Originally, the term came from quantum electrodynamics (QED), where infinities appeared in perturbation theory. Physicists introduced:

  • a regulator (e.g., a momentum cutoff) to make the divergent integrals finite
  • "bare" (unobservable) mass and charge parameters

and then defined the physical mass and charge by subtracting the infinities. The process was called normalization of the theory: fixing the parameters so that the theory matched observed quantities.

But because the infinities appeared at every order of perturbation theory, one had to normalize again and again at each order. Dyson and others started calling this repeated process re-normalization.

So the “re-” originally meant:

“We have to normalize the theory repeatedly as we go to higher orders.”
This is the origin of the name.


3. The modern RG interpretation (your intuition is excellent)

Your hypothesis—that the “re-” reflects the fact that the operation can be iterated indefinitely—is exactly how the term is understood in the Wilsonian renormalization group (RG) framework.

In Wilson’s picture:

  1. You coarse‑grain by a factor \(b\).
  2. You rescale lengths and fields to restore the canonical form of the theory.
  3. You obtain new couplings \(g' = R(g)\).
  4. You repeat.

This iteration is the renormalization group flow.
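
To make steps 1–4 concrete, here is a minimal sketch of mine (the standard zero‑field one‑dimensional Ising decimation, not anything specific to the discussion above): summing out every other spin gives the exact recursion \(K' = \tfrac{1}{2}\ln\cosh(2K)\) for the dimensionless coupling \(K\), and iterating that map is the RG flow.

import math

def rg_step(K):
    """One decimation step for the zero-field 1D Ising chain.

    Summing out every other spin leaves a chain with half as many spins
    and a new nearest-neighbour coupling K' = (1/2) ln cosh(2K).
    """
    return 0.5 * math.log(math.cosh(2.0 * K))

K = 2.0  # start deep in the strongly coupled (low-temperature) regime
for step in range(8):
    print(f"step {step}: K = {K:.6f}")
    K = rg_step(K)

# The coupling shrinks at every step: the flow runs toward the trivial
# fixed point K* = 0, which is one way to see that the 1D Ising chain
# has no finite-temperature phase transition.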

So in the modern sense, each RG step re‑expresses ("re‑normalizes") the theory in a standard form after coarse‑graining, and the step is repeated indefinitely as you zoom out.

Your intuition about “averaging over ever-larger domains” is precisely the Wilsonian picture.


4. Why the name stuck

Even though the modern meaning is different from the original QED meaning, the term renormalization survived because the modern procedure still involves repeatedly re‑fixing the couplings to a standard form after each coarse‑graining step, and because decades of QED usage had already made the word standard.

So the name is historically accidental but conceptually apt.


5. A crisp summary

  • The original usage: in QED, the parameters had to be fixed (“normalized”) again at every order of perturbation theory to absorb the infinities, hence re‑normalization.
  • Your intuition: The “re-” reflects the fact that the operation is iterated as you zoom out. This is exactly how physicists think about it today.


    If you want, we can go deeper into how the RG map is constructed, why fixed points correspond to conformal field theories, or how renormalization becomes a kind of “learning dynamics” in theory space.

    Patrick:
    Yes, let's go deeper into how the RG map is constructed and why fixed points correspond to conformal field theories. But first, who is this Wilson you referred to?

    Copilot:
    Kenneth G. Wilson is one of the rare figures in physics whose work didn’t just solve a problem — it changed the grammar of the entire field. If you’re diving into renormalization and RG flows, you’re walking straight into the territory he reshaped.

    Let me give you a crisp, layered picture of who he was and why his ideas matter so deeply for the RG map and CFTs.


    Kenneth G. Wilson — the architect of modern renormalization

    Kenneth Wilson (1936–2013) was an American theoretical physicist who won the 1982 Nobel Prize in Physics for his renormalization‑group theory of critical phenomena. Before Wilson, renormalization was a bag of tricks; after Wilson, it became a unified, conceptual, dynamical framework.

    His key contributions:

    1. He reframed renormalization as coarse‑graining

    Instead of treating infinities as algebraic nuisances, Wilson said:

    “Physics at different length scales should be related by a flow.”
    This was revolutionary. It turned renormalization into a map on theory space, not a perturbative hack.

    2. He introduced the idea of integrating out short‑distance modes

    This is the heart of the RG map: keep the long‑wavelength modes, integrate out the short‑wavelength ones, and absorb their effects into redefined couplings.

    This is the “zoom‑out and rewrite” operation you and I have been discussing.

    3. He explained universality

    Why do wildly different microscopic systems share the same critical exponents? Because they flow to the same RG fixed point.

    This insight unified critical phenomena in magnets, fluids, and alloys with the continuum limits of quantum field theories: microscopically different systems that flow to the same fixed point share the same long‑distance physics.

    4. He made CFTs central

    Wilson didn’t use the modern CFT language, but his fixed points are what we now call conformal field theories. He set the stage for the explosion of CFT in the 1970s–1990s.


    Now, with Wilson in place, we can go deeper into the RG map

    Let’s build the RG map in the cleanest possible way — the way Wilson intended.

    1. Start with a theory with a UV cutoff

    Say a scalar field theory in \(d\) dimensions: \[ S[\phi] = \int d^d x \left[ \frac{1}{2} (\nabla \phi)^2 + \frac{1}{2} m^2 \phi^2 + \lambda \phi^4 + \cdots \right] \] Define a momentum cutoff \(\Lambda\).

    2. Split the field into slow and fast modes

    \[ \phi(x) = \phi_< (x) + \phi_>(x) \] where \(\phi_<\) carries the slow (long‑wavelength) modes with momenta \(|k| < \Lambda/b\) and \(\phi_>\) carries the fast modes with \(\Lambda/b \le |k| < \Lambda\), for some rescaling factor \(b > 1\).

    3. Integrate out the fast modes

    \[ e^{-S'[\phi_<]} = \int \mathcal{D}\phi_> \, e^{-S[\phi_< + \phi_>]} \] This produces a new effective action \(S'\).

    This step is the “coarse‑graining” or “averaging over ever‑larger domains” you intuited.

    4. Rescale momenta and fields

    To compare \(S'\) to the original \(S\), rescale: \[ x' = x/b, \qquad k' = b\,k, \qquad \phi'(x') = b^{\Delta_\phi}\,\phi_<(x), \]

    where \(\Delta_\phi\) is the engineering dimension of the field.

    This restores the cutoff to \(\Lambda\).

    5. Read off the new couplings

    After rescaling, the action again looks like: \[ S'[\phi'] = \int d^d x' \left[ \frac{1}{2} (\nabla \phi')^2 + \frac{1}{2} m'^2 \phi'^2 + \lambda' \phi'^4 + \cdots \right] \] The map: \[ (m^2, \lambda, \ldots) \mapsto (m'^2, \lambda', \ldots) \] is the RG transformation.

    Iterating it gives the RG flow.


    Why fixed points correspond to conformal field theories

    A fixed point satisfies: \[ g_i' = g_i \] for all couplings \(g_i\).

    At such a point, coarse‑graining followed by rescaling returns exactly the same action: no coupling runs, and no length scale is singled out.

    This implies scale invariance.

    Now the key theorem (heuristic in general, rigorous in 2D):

    Scale invariance + unitarity + Poincaré invariance ⇒ conformal invariance.
    Thus RG fixed points are conformal field theories.

    This is why:

    CFTs are the attractors of RG flow.

    They are the “irreducible macroscopic descriptions” of systems at criticality — the endpoints of the zoom‑out process.
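
    To see a nontrivial fixed point emerge from an explicit map, here is a deliberately crude toy of mine (a Migdal–Kadanoff‑style bond‑moving recursion for the 2D Ising coupling with scale factor \(b = 2\), taken here as \(K' = \ln\cosh 2K\) purely for illustration). Solving \(K' = K\) and linearizing the map at the nontrivial fixed point gives an estimate of the correlation‑length exponent \(\nu\); this crude scheme gives \(K^* \approx 0.61\) and \(\nu \approx 1.3\), versus the exact 2D Ising values \(K^* \approx 0.44\) and \(\nu = 1\).

import math

def mk_step(K):
    # Toy Migdal-Kadanoff-style recursion for the 2D Ising coupling (b = 2):
    # decimate along one direction, then stack ("move") the parallel bonds.
    return math.log(math.cosh(2.0 * K))

# Locate the nontrivial fixed point K* of K' = mk_step(K) by bisection.
lo, hi = 0.1, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mk_step(mid) > mid:   # above K*: the coupling grows under the flow
        hi = mid
    else:                    # below K*: the coupling shrinks toward 0
        lo = mid
K_star = 0.5 * (lo + hi)

# Linearize at K*: the thermal eigenvalue is dK'/dK = 2 tanh(2K*), and the
# correlation-length exponent is nu = ln(b) / ln(dK'/dK).
lam = 2.0 * math.tanh(2.0 * K_star)
nu = math.log(2.0) / math.log(lam)
print(f"K* ~ {K_star:.4f}, nu ~ {nu:.2f}")

# This unstable fixed point is the scale-invariant critical theory: start
# exactly at K* and the map returns K* forever; start anywhere else and
# the flow runs off toward K = 0 or K = infinity.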


    If you want, we can now go even deeper

    We can explore any of these directions:

    Just tell me where you want to steer the ship next.

    Patrick:
    They all look good, but to finish off this chat, let's go first with the deep connection between RG and entropy/monotonicity.

    Copilot:
    You’re gravitating to the part of RG that feels like a law of direction in theory space: not just “things change with scale,” but “they change in a way that can’t be undone without paying a price.” That price is information, and the bookkeeping device is an entropy-like monotone.


    RG as irreversible information loss

    Coarse-graining as a channel

    In Wilsonian RG you integrate out short-distance modes (or average over blocks). That operation is intrinsically many-to-one: many microstates map to the same coarse description. In modern language it behaves like a stochastic map / quantum channel from UV degrees of freedom to IR degrees of freedom.

    That’s the root of irreversibility: once the short‑distance information has been integrated out, nothing in the IR description lets you reconstruct the discarded microstates.

    This is why, despite the name “renormalization group,” RG flow is typically a semigroup (you can compose coarse-grainings forward, but not invert them generically).


    What monotonicity means in RG

    A monotone is a function \(M(\text{theory})\) that satisfies \[ \frac{dM}{d\log \mu} \le 0 \quad \text{(or } \ge 0\text{)} \] along RG flow (depending on conventions). Intuitively: theory space has a preferred direction.

    At a fixed point (a CFT), \(M\) becomes constant and typically equals a universal “number of degrees of freedom” at that scale.
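
    A purely classical cartoon of such a monotone (a toy sketch with an arbitrary three‑state channel, nothing specific to field theory): repeatedly applying a fixed stochastic map to a probability distribution can only shrink its relative entropy to that map's stationary distribution, so \(D(P_t\|\pi)\) plays the role of \(M\): it decreases step by step and vanishes exactly at the fixed point.

import numpy as np

def rel_entropy(p, q):
    """Kullback-Leibler divergence D(p||q) in nats (strictly positive entries assumed)."""
    return float(np.sum(p * np.log(p / q)))

# An arbitrary 3-state stochastic matrix (rows sum to 1): one "step of the flow".
T = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

# Its stationary distribution pi = pi T is the fixed point of the map.
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

p = np.array([0.9, 0.05, 0.05])   # a far-from-stationary starting distribution
for t in range(6):
    print(f"t={t}:  D(P_t || pi) = {rel_entropy(p, pi):.5f}")
    p = p @ T                      # apply the channel once more

# The printed values decrease monotonically toward zero: an entropy-like
# monotone that is stationary only at the fixed point.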


    The canonical examples: c, F, a

    2D: the \(c\)-theorem

    In 2D unitary QFT, Zamolodchikov constructed a function \(c(\mu)\) that decreases monotonically along the RG flow and, at fixed points, equals the central charge of the corresponding CFT, so \(c_{\text{UV}} \ge c_{\text{IR}}\).

    Interpretation: the IR theory has “fewer effective degrees of freedom” than the UV. This is the cleanest case: monotonicity is tight, universal, and geometrically natural.

    3D: the \(F\)-theorem

    In 3D, a comparable monotone is the three‑sphere free energy \(F = -\log |Z_{S^3}|\),

    and RG flow decreases \(F\) for unitary Lorentz-invariant theories. At fixed points \(F\) is a CFT invariant. This is often read as “the finite part of entanglement across a circle decreases.”

    4D: the \(a\)-theorem

    In 4D, Komargodski–Schwimmer established that the Weyl anomaly coefficient \(a\) decreases along RG flow: \(a_{\text{UV}} \ge a_{\text{IR}}\).

    So across dimensions you see the same shape: a universal quantity decreases and becomes a CFT label at fixed points.


    Entanglement is the bridge: why “degrees of freedom” becomes precise

    A surprisingly sharp way to understand these monotones is via entanglement: the quantities \(c\), \(F\), and \(a\) can all be extracted from entanglement entropies of regions, and the known proofs of the 2D and 3D theorems (Casini–Huerta) are information‑theoretic at heart.

    Under the hood, the key inequality is usually some form of strong subadditivity of entanglement entropy or, what is closely related, the monotonicity of relative entropy under restriction to a subregion.

    In other words: RG monotonicity is an avatar of a deep information-theoretic monotonicity principle.


    Relative entropy: the clean “entropy/monotonicity engine”

    If you want a single conceptual mechanism that feels closest to what you call monotonicity invariants, it’s this:

    Data processing inequality

    For two states \(\rho,\sigma\) and a channel \(\mathcal{E}\), \[ S(\rho\|\sigma) \ge S(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)) \] Relative entropy can only go down when you throw information away.
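
    A two‑line numerical check (a toy of mine, where the channel just merges pairs of outcomes, i.e., a crude coarse‑graining):

import numpy as np

def D(p, q):
    """Relative entropy D(p||q) in nats; supports assumed to coincide."""
    return float(np.sum(p * np.log(p / q)))

# Two states of the same four-outcome system.
rho   = np.array([0.70, 0.10, 0.15, 0.05])
sigma = np.array([0.25, 0.25, 0.25, 0.25])

# A coarse-graining channel E: report only which pair {0,1} or {2,3} occurred.
# As a stochastic matrix acting on column vectors, E[a, i] is the probability
# that microstate i gets reported as macrostate a.
E = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)

print(D(rho, sigma))           # ~0.47 nats before coarse-graining
print(D(E @ rho, E @ sigma))   # ~0.19 nats after: the states are harder to tell apart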

    Now map this onto RG: the UV theory plays the role of \(\rho\), a reference theory (say, the fixed point it flows to) plays the role of \(\sigma\), and each coarse‑graining step is the channel \(\mathcal{E}\); their distinguishability can only decrease along the flow.

    This is the sense in which RG flow behaves like an information-loss gradient. The fixed points are the “stable summaries” that survive arbitrary coarse-graining.


    Why fixed points feel like maximum symmetry and minimum memory

    At an RG fixed point the theory is scale (indeed conformally) invariant, and all memory of microscopic details has been washed out except for the data that labels the universality class.

    From an entropy/monotonicity perspective, the monotone has stopped decreasing: further coarse‑graining discards nothing new, because everything non‑universal is already gone.

    So the fixed point is not “maximum randomness”; it’s “maximum symmetry subject to locality/unitarity,” and therefore a kind of minimal sufficient description for a universality class.


    The intuition I think you’re reaching for

    RG monotones are doing something like:

    “Count how much structured, recoverable distinction exists in the UV description that cannot survive coarse-graining.”
    That’s why the numbers (\(c\), \(F\), \(a\)) behave like counters of effective degrees of freedom: they only decrease along the flow and settle to universal values at fixed points.


    If you want one final, sharp framing question

    When you hear “entropy + monotonicity,” do you want the version that’s thermodynamic/stat‑mech, quantum‑information‑theoretic, or field‑theoretic (the \(c\)/\(F\)/\(a\) story)?

    Pick one, and I’ll give you the cleanest “closing arc” explanation in that language.

    Patrick:
    I should prefer for now the thermodynamic/stat mech version. Also, I've been meaning to ask you what you mean by statements like "stochastic map / quantum channel from UV degrees of freedom to IR degrees of freedom." I presume that UV stands for Ultraviolet and IR stands for Infrared, but that's as far as I understand.

    Copilot:
    Good — this is exactly the right place to slow down and re‑ground RG in the language of statistical mechanics, where its irreversibility becomes almost embarrassingly concrete.

    I’ll do two things, in order:

    1. Explain UV vs IR in plain stat‑mech terms
    2. Explain why coarse‑graining is literally a stochastic map, and why that implies entropy monotonicity

    No field‑theory mysticism required.


    1. What “UV” and “IR” really mean (stat‑mech version)

    Yes, ultraviolet and infrared come from wavelength language, but in statistical mechanics they translate very cleanly.

    UV = microscopic degrees of freedom

    Think: individual spins on a lattice, molecular positions and momenta, field fluctuations at the scale of the lattice spacing or cutoff.

    IR = macroscopic / collective degrees of freedom

    Think: block magnetizations, hydrodynamic or thermodynamic variables, long‑wavelength fluctuations.

    So UV → IR means:

    “Replace a detailed microscopic description with a coarser macroscopic one.”
    Nothing exotic yet.


    2. Coarse‑graining as a stochastic map

    Now the key step.

    What coarse‑graining actually does

    Take a microscopic configuration space \(\Omega_{\text{UV}}\).
    Define a coarse‑graining map: \[ \pi : \Omega_{\text{UV}} \to \Omega_{\text{IR}} \] Examples: block spins (a majority rule over each block), local averages of a field over cells, or keeping only the long‑wavelength Fourier modes.

    Crucially:

  • many microstates map to the same macrostate

    That makes \(\pi\) many‑to‑one.


    From deterministic projection to stochastic evolution

    Suppose you have a probability distribution \(P_{\text{UV}}(\omega)\) over microstates.

    The induced distribution on macrostates is: \[ P_{\text{IR}}(X) = \sum_{\omega : \pi(\omega)=X} P_{\text{UV}}(\omega) \] This is exactly a stochastic map: \[ P_{\text{IR}} = \mathcal{E}(P_{\text{UV}}) \] where \(\mathcal{E}\) is a linear, positivity‑preserving, normalization‑preserving map.

    In probability theory, this is a Markov kernel.
    In quantum theory, the analogous object is a quantum channel.

    That’s all I meant.

    No metaphors. No hype.
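
    For concreteness, here is that bookkeeping in a few lines (a toy sketch: four spins, an arbitrary microstate distribution, and a majority‑rule block map assumed just for illustration):

import itertools
import numpy as np

rng = np.random.default_rng(0)

# Microstates: all configurations of 4 Ising spins.
micro = list(itertools.product([-1, +1], repeat=4))

# Any normalized distribution over microstates will do for the bookkeeping.
P_uv = rng.random(len(micro))
P_uv /= P_uv.sum()

def block(cfg):
    """Coarse-graining map pi: each neighbouring pair of spins is replaced by
    the sign of its sum (ties broken toward +1), giving a 2-block macrostate."""
    return tuple(+1 if cfg[2 * b] + cfg[2 * b + 1] >= 0 else -1 for b in range(2))

# P_IR(X) = sum of P_UV(omega) over all microstates omega with pi(omega) = X.
P_ir = {}
for cfg, p in zip(micro, P_uv):
    X = block(cfg)
    P_ir[X] = P_ir.get(X, 0.0) + p

print(P_ir)                # four macrostates, each pooling several microstates
print(sum(P_ir.values()))  # 1.0: the map is linear, positive, and normalization-preserving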


    3. Why entropy must increase (or information must decrease)

    Now the thermodynamic heart of the matter.

    Shannon entropy under coarse‑graining

    Let \[ S[P] = -\sum_\omega P(\omega)\log P(\omega) \] Two things happen under any coarse‑graining map \(\mathcal{E}\). First, the coarse variable carries less entropy, and hence less information about the system: \(S[P_{\text{IR}}] \le S[P_{\text{UV}}]\), because the macrostate is just a function of the microstate. Second, the entropy you assign to the microstates can only grow: if you replace \(P_{\text{UV}}\) by the Gibbs coarse‑grained distribution that spreads each macrostate's probability uniformly over the microstates inside it, the result satisfies \(S_{\text{cg}} \ge S[P_{\text{UV}}]\). Why?

    Because coarse‑graining averages probabilities within blocks, and \(-x\log x\) is concave, so averaging can only raise the entropy (Jensen's inequality).

    This is not a dynamical assumption. It’s a convexity theorem.
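
    A tiny numerical check of both statements (a toy with four microstates grouped into two blocks):

import numpy as np

def shannon(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

P_uv = np.array([0.5, 0.2, 0.2, 0.1])   # four microstates
blocks = [[0, 1], [2, 3]]               # two macrostates ("blocks")

# Pushforward onto macrostates: the coarse variable carries LESS entropy.
P_ir = np.array([P_uv[b].sum() for b in blocks])

# Gibbs coarse-graining: stay on the microstate space but spread each block's
# probability uniformly within the block.  This entropy can only grow.
P_gibbs = np.concatenate([np.full(len(b), P_uv[b].sum() / len(b)) for b in blocks])

print(shannon(P_ir), "<=", shannon(P_uv), "<=", shannon(P_gibbs))
# ~0.61 <= ~1.22 <= ~1.30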


    Relative entropy makes this sharper

    For two distributions \(P\) and \(Q\): \[ D(P\|Q) = \sum P \log \frac{P}{Q} \] Under any stochastic map \(\mathcal{E}\): \[ D(P\|Q) \ge D(\mathcal{E}P \| \mathcal{E}Q) \] This is the data‑processing inequality.

    Translated:

    Coarse‑graining can only make states less distinguishable.
    This is the deep reason RG flows have a direction.


    4. RG flow = repeated coarse‑graining + reparameterization

    Now connect this to RG.

    Each RG step does:

    1. Coarse‑grain (stochastic map, entropy‑increasing)
    2. Re‑express the result in a standard form (rescaling)

    Step (1) is irreversible.
    Step (2) is bookkeeping.

    The irreversibility lives entirely in step (1).

    That’s why RG is a semigroup, not a group.


    5. RG monotones as entropy‑like quantities

    Now the punchline.

    RG monotones (\(c\), \(F\), \(a\), etc.) are not arbitrary miracles. They are entropy‑like bookkeeping devices: measures of how much distinguishing information a theory still carries that coarse‑graining has not yet destroyed.

    At a fixed point there is nothing left to destroy: the description is already scale‑invariant, so the bookkeeping stops changing.

    That’s why monotones are constant at fixed points and decrease along flows.


    6. Why this is thermodynamics, not field‑theory magic

    Seen this way, RG is the Second Law acting on descriptions: coarse observation is a stochastic map, the map loses information, and the loss is tracked by entropy‑like monotones.

    This is why RG feels so fundamental: it’s not a trick of QFT, it’s a structural fact about inference under coarse observation.


    One sentence to carry forward

    Renormalization is thermodynamics applied to descriptions rather than states: coarse‑graining is a stochastic map, entropy is information lost, and RG flow is the irreversible march toward self‑similar macroscopic laws.
    If you want, next time we can formalize this in:

    But this is the conceptual core.

    Patrick:
    At this point I have only two questions:

    1. Before, you mentioned "a probability distribution \(P_{\text{UV}}(\omega)\) over microstates." So where does this initial distribution \(P_{\text{UV}}(\omega)\) come from? Is it estimated or calculated or what?
    2. Are fixed points inevitable in the renormalization process?

    Copilot:
    These are exactly the right two questions to ask at this stage. They go to the epistemic heart of RG, not the formal machinery. I’ll answer them in the same thermodynamic/stat‑mech language we’ve been using, and I’ll keep the two answers conceptually linked.


    1. Where does the UV probability distribution come from?

    Short answer: it is not arbitrary, and it is not usually "estimated" in a statistical sense. It is postulated from physical constraints, and in equilibrium settings it is essentially forced on you.

    The canonical situation: equilibrium statistical mechanics

    In classical statistical mechanics, the UV distribution is typically one of:

  • Microcanonical: uniform over microstates with fixed energy
  • Canonical: \[ P_{\text{UV}}(\omega) = \frac{1}{Z} e^{-\beta H(\omega)} \]
  • Grand canonical, etc.

    These are not guesses. They are the maximum‑entropy distributions consistent with macroscopic constraints (energy, particle number, etc.).
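
    The maximum‑entropy claim is easy to check on a toy system (my sketch: a three‑spin Ising ring at an arbitrary inverse temperature). The Gibbs variational bound says \(S[Q] - \beta\langle H\rangle_Q \le \log Z\) for every distribution \(Q\), with equality only for the canonical one; so among all \(Q\) with the same mean energy, the canonical distribution has the largest entropy.

import itertools
import numpy as np

rng = np.random.default_rng(1)
beta = 0.7   # arbitrary inverse temperature

# Microstates and energies of a 3-spin Ising ring, H = -(s0 s1 + s1 s2 + s2 s0).
micro = np.array(list(itertools.product([-1, +1], repeat=3)))
E = np.array([-(s[0] * s[1] + s[1] * s[2] + s[2] * s[0]) for s in micro], dtype=float)

boltz = np.exp(-beta * E)
Z = boltz.sum()
P_can = boltz / Z

def entropy(p):
    """Shannon entropy in nats."""
    return float(-np.sum(p * np.log(p)))

# Equality for the canonical distribution ...
print(entropy(P_can) - beta * float(P_can @ E), "=", float(np.log(Z)))

# ... and a strict inequality for arbitrary competitors.
for _ in range(3):
    Q = rng.random(len(micro))
    Q /= Q.sum()
    print(entropy(Q) - beta * float(Q @ E), "<=", float(np.log(Z)))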

    So in equilibrium, the UV distribution is fixed once you specify the Hamiltonian and the ensemble: it carries no information beyond \(H\), the temperature, and the macroscopic constraints.

    This is why RG is usually formulated as a flow in Hamiltonian / action space, even though entropy arguments sit underneath.


    Out of equilibrium or effective descriptions

    If you are not in equilibrium, then yes:

  • \(P_{\text{UV}}\) may be an effective or inferred distribution
  • it may encode ignorance, noise, or preparation conditions

    But RG does not require equilibrium to be defined. What it requires is only a probability distribution over microscopic configurations and a coarse‑graining map to apply to it.

    The monotonicity statements (entropy increase, information loss) hold for any distribution.

    Equilibrium just gives you a clean, canonical starting point.


    The deeper point (this matters)

    RG does not depend sensitively on the exact UV distribution.

    Why?

    Because coarse‑graining washes out most microscopic distinctions. Universality means that whole families of different UV distributions flow to the same macroscopic behavior; only a few relevant features of \(P_{\text{UV}}\) survive the flow.

    This is why RG is powerful: it explains why macroscopic laws are robust to ignorance about the UV.


    2. Are fixed points inevitable?

    Short answer: no, but something like them is structurally unavoidable.

    Let me unpack that carefully.


    Fixed points are not guaranteed in every flow

    An RG flow can end at a fixed point, run away to strong coupling, settle onto a limit cycle, or in principle wander without ever settling down.

    So fixed points are not mathematically inevitable.


    But attractors are unavoidable in practice

    Here’s the thermodynamic/stat‑mech insight:

    RG is a flow that loses information at every step, is constrained by monotones, and acts on a space of local, physically admissible theories.

    That combination almost forces the existence of attractors: low‑dimensional sets in theory space that large basins of microscopically different theories converge toward.

    Fixed points are the simplest such attractors.

    Even when the flow does not literally end at a fixed point, it often lingers near one for many decades of scale, or looks approximately scale‑invariant over a wide range of scales.

    This is why fixed points dominate physics even if they are not strictly inevitable.


    Why fixed points are special thermodynamically

    At a fixed point, the description reproduces itself under coarse‑graining plus rescaling: nothing further is lost, and no scale is singled out.

    In thermodynamic language:

    A fixed point is a stationary state of the coarse‑graining semigroup.
    That’s why they correspond to CFTs: theories with no preferred scale, which (given unitarity and Poincaré invariance) means conformal symmetry.


    The real inevitability: monotonic flow toward simplicity

    What is inevitable is this:

    Repeated coarse‑graining drives theories toward descriptions with fewer effective degrees of freedom.
    Fixed points are the cleanest endpoints of that process, but even when they are absent, the directionality remains.

    That directionality is the RG analogue of the Second Law.


    One unifying sentence

    Repeated coarse‑graining drives any admissible UV description, irreversibly, toward the simplest self‑similar form compatible with its constraints; fixed points are just where that march comes to rest.
    That’s the thermodynamic spine of renormalization.

    If you ever want to push this further, the next natural step would be to formalize “loss of information across scales” using mutual information between blocks — that’s where RG, entropy, and inference really lock together.



    Afterword: