Why Your Images Are Killing Your Bandwidth

Let us start with an uncomfortable truth. The modern web is fat. Not pleasantly curvy, not "a few extra kilobytes around the holidays" fat. We are talking morbidly, catastrophically, "how did we let this happen" bloated. And the single biggest contributor to that bloat? Images.

According to the HTTP Archive, which tracks millions of the most visited websites on the planet, images account for nearly half of all data transferred on an average web page. Think about that for a second. Not JavaScript. Not video. Not your fancy web fonts. Images. Those silent rectangles of color sitting in your hero banners, product grids, and blog posts are quietly devouring bandwidth like it is an all you can eat buffet.

The average web page in 2025 weighs in at well over 2 megabytes. A decade ago, that number was closer to 1 megabyte. We have literally doubled the size of the web, and images have been the primary driver of that inflation. For users on fast fiber connections in major cities, this might feel like a non issue. For the billions of people accessing the web on spotty mobile connections in the rest of the world, it is the difference between a page that loads and a page that gets abandoned.

The Core Web Vitals Problem

Google, for all its faults, actually tried to do something about this. In 2020, the company introduced Core Web Vitals, a set of metrics that measure real world user experience on the web. The one that keeps web developers up at night is Largest Contentful Paint, or LCP. LCP measures how long it takes for the biggest visible element on a page to fully render. And guess what the largest visible element on most pages is? You guessed it. An image.

If your LCP clocks in over 2.5 seconds, Google considers your page to have a "poor" user experience. That does not just hurt your users. It hurts your search rankings. Google has been increasingly explicit about the fact that page speed is a ranking factor. So those massive, unoptimized JPEG hero images you have been slapping on every landing page? They are not just slow. They are actively burying you in search results.

The JPEG and PNG Legacy Bottleneck

Here is where the story gets interesting, because the tools most of the web still relies on were never designed for this era.

JPEG was created in 1992. Let that sink in. The image format that still powers the vast majority of photographs on the internet was designed in a world where a 14.4 kbps modem was considered fast and a 640x480 monitor was high resolution. JPEG was revolutionary for its time. It introduced lossy compression to the masses, allowing photographs to be compressed to a fraction of their raw size by throwing away visual information the human eye would (theoretically) never miss. But JPEG was designed for 8 bit color, standard dynamic range, and the display technology of the early 1990s.

PNG came along in 1996 as a lossless alternative with transparency support, born partly out of a licensing dispute over the GIF format (yes, image format drama has been going on for decades). PNG is great for graphics, logos, and anything with sharp edges or text. But it produces enormous files for photographs because it refuses to throw away any data at all.

Here is the fundamental problem. Display technology has leaped forward by generations. We now have screens capable of billions of colors, HDR content, wide color gamuts, and resolutions that would have seemed like science fiction in the 1990s. Meanwhile, the dominant image formats on the web are still fundamentally limited to the capabilities of that era. JPEG cannot do transparency. It cannot do animation. It is locked at 8 bits of color depth. And its compression algorithm, while groundbreaking thirty years ago, leaves enormous amounts of efficiency on the table compared to what modern codecs can achieve.

The web needed something new. Something that could compress images dramatically smaller, support modern display features like HDR and transparency, handle animation without the absurd file sizes of GIF, and ideally not come saddled with a mountain of patent licensing fees that would make it prohibitively expensive to use.

That something is AVIF. And to understand where it came from, we need to talk about a very expensive war that has been raging behind the scenes of the tech industry for over a decade.

The Origin Story: AOMedia and the "Royalty-Free" Revolution

The Patent Wars No One Talks About

To understand why AVIF exists, you first need to understand the ugly, expensive, behind the scenes battle that has been quietly shaping the media technology industry for decades. It is a story about patents, power, and the question of who gets to control how the world encodes and decodes visual information.

Every time you watch a video on YouTube, stream a movie on Netflix, or snap a photo on your iPhone, a codec is doing the heavy lifting. A codec is just a set of mathematical rules for compressing and decompressing media. The problem is that developing a good codec takes years of research, and the companies and institutions that do that research love to patent every technique they invent. That means if you want to use the best compression technology available, you often have to pay someone for the privilege.

For a long time, the video world was dominated by the H.264 codec (also known as AVC). H.264 became the backbone of internet video, powering everything from YouTube to Blu-ray discs. Its licensing was managed by a patent pool called MPEG LA, and while the fees were not exactly cheap, they were at least predictable. Most companies grumbled, paid the bill, and moved on.

Then came the successor: H.265, also known as HEVC (High Efficiency Video Coding). HEVC was technically brilliant. It could compress video about 50% better than H.264. But its patent licensing situation was an absolute catastrophe. Instead of one patent pool, HEVC ended up with multiple competing patent pools and a swarm of individual patent holders all demanding separate royalties. Nobody could even figure out how much it would actually cost to license HEVC for a product at scale. For companies like Google, Netflix, and Amazon that serve billions of video streams every single day, the potential bill was terrifying.

And here is where images enter the picture. Apple, which had adopted HEVC for video on its devices, also introduced HEIC (High Efficiency Image Container) as the default photo format on iPhones starting with iOS 11 in 2017. HEIC is essentially a still image encoded with the HEVC codec and wrapped in a HEIF container. It produced beautiful, compact photos. But because it was built on HEVC, it carried all the same patent baggage. You could not just build a free HEIC viewer or editor without potentially getting hit with licensing demands. The web, which had always thrived on open and freely implementable standards, wanted nothing to do with it.

The Birth of the Alliance for Open Media

By the mid 2010s, several major tech companies had independently reached the same conclusion: the codec patent situation was unsustainable, and the industry needed a royalty free alternative built from the ground up.

Google had already been working on its own open video codecs through the VP8 and VP9 projects (VP9 is what YouTube uses for a huge chunk of its video today). Mozilla had been championing an experimental codec called Daala. Cisco had contributed an open codec called Thor. Each of these efforts was promising but incomplete on its own.

So in 2015, something remarkable happened. Google, Mozilla, Cisco, Amazon, Netflix, Microsoft, and Intel came together to form the Alliance for Open Media, commonly known as AOMedia. The mission was simple but audacious: pool their collective research, engineering talent, and patent portfolios to create a single, next generation video codec that would be completely open source and permanently royalty free. No licensing fees. No patent pools. No lawyers sending invoices. Ever.

Apple, which had initially sat on the sidelines, eventually joined AOMedia in 2018, bringing the alliance to a level of industry support that was essentially unprecedented. When the companies that collectively control the vast majority of the world's browsers, operating systems, streaming platforms, and mobile devices all agree to back a single open standard, that standard is going to win. It is not a question of if, but when.

The result of that collaboration was AV1, which was finalized as a video codec specification in 2018. AV1 delivered on the promise. It matched or exceeded HEVC in compression efficiency while being completely free to use for anyone on the planet.

From Video Codec to Image Format

But the AOMedia team realized something important almost immediately. If AV1 was a world class compression engine for video, and a video is just a sequence of images, then a single frame of AV1 video was by definition a world class compressed image. Why not formalize that into a proper image format?

And so AVIF was born. The specification was straightforward in concept: take one frame of AV1 compressed data, wrap it in the same HEIF container that Apple uses for HEIC, and publish the whole thing as an open standard. Netflix, ever the compression obsessive, was the first major company to publicly champion the format, releasing sample AVIF images in December 2018 to demonstrate that the format could deliver stunning quality at remarkably small file sizes. The official AVIF 1.0.0 specification was finalized in February 2019.

In a sense, AVIF is the direct answer to HEIC. It uses the same container. It serves the same purpose. But where HEIC is built on top of the patent minefield of HEVC, AVIF is built on top of the completely open AV1 codec. Same packaging, radically different politics. And in the world of web standards, politics matter just as much as pixels.

The Science of "AV1": How It Actually Works

Now we get to the nerdy stuff. If you want to understand why AVIF images are so remarkably small without looking like garbage, you need to understand what the AV1 codec is actually doing to your pixels under the hood. Fair warning: this section gets technical. But I promise it is worth it, because the engineering here is genuinely clever.

At its core, all image compression is trying to solve the same problem: raw pixel data is absurdly wasteful. A single uncompressed 4K photograph at 8 bit color depth takes up roughly 24 megabytes of storage. Nobody wants to download 24 megabytes every time they look at a photo on the web. So the codec's job is to find patterns, redundancies, and perceptual shortcuts that let it describe the same image using a fraction of the data.
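That 24 megabyte figure is nothing more exotic than width times height times channels. A quick sanity check of the arithmetic in Python (the helper name is mine, purely illustrative):

```python
# Size of uncompressed pixel data: width x height x channels x bytes per sample.
def raw_size_bytes(width, height, channels=3, bit_depth=8):
    return width * height * channels * (bit_depth // 8)

# A 4K (3840x2160) photo with 8-bit RGB samples:
size = raw_size_bytes(3840, 2160)
print(size)  # -> 24883200 bytes, roughly 24 megabytes before any compression
```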

AV1 does this better than almost anything else available today. Here is how.

Intra Frame Prediction: The Secret Sauce

When AV1 compresses a video, it has a powerful trick for most frames: it can look at the frame before and the frame after and only encode the differences between them. That is called inter frame prediction, and it is why video files are not just millions of individual photographs stitched together.

But for still images, there is no "previous frame" to reference. You only have one frame, and you have to compress it entirely on its own. This is called intra frame coding, and it is where AV1 truly shines for the AVIF use case.

The basic idea behind intra prediction is deceptively simple. Instead of encoding every single pixel in a block from scratch, the encoder looks at the pixels that have already been decoded in neighboring blocks (above, to the left, and diagonally adjacent) and makes a prediction about what the current block probably looks like based on those neighbors. Then, instead of storing the actual pixel values, it only stores the difference between its prediction and reality. If the prediction is good, that difference is tiny, which means it compresses down to almost nothing.

Here is where AV1 pulls ahead of older codecs like JPEG. JPEG has a relatively primitive prediction model. AV1, on the other hand, supports a staggering number of directional intra prediction modes. We are talking over 50 angular directions. Imagine a pixel block that contains a diagonal edge, say the roofline of a building against a blue sky. An older codec might struggle to predict that angle efficiently and end up wasting bits encoding the mismatch. AV1 can pick from dozens of precisely angled prediction directions to find one that closely matches that exact diagonal, resulting in a much smaller residual (the leftover error that actually needs to be encoded).

On top of directional prediction, AV1 also includes several non directional modes. There is a smooth mode that generates gradual gradients, perfect for areas like out of focus backgrounds or clear skies. There is a "paeth" predictor that picks the neighboring pixel whose value is closest to a mathematically computed estimate. And there is a chroma from luma mode (often abbreviated CfL) that exploits the correlation between brightness and color information. If the encoder already knows the brightness (luma) of a region, it can use that to make a smart guess about the color (chroma) values, often getting remarkably close to the real thing without encoding the color channel from scratch.

The cumulative effect of all these prediction tools is that AV1 starts with a much better "first guess" of what each block of pixels looks like. And a better first guess means less residual error, which means fewer bits needed to describe the image, which means a smaller file.
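Here is a toy sketch of that predict-then-encode-the-residual idea. The three modes and the 4x4 block are drastic simplifications of AV1's real mode set (which has dozens of angular modes plus the smooth, Paeth, and chroma from luma modes described above), and `predict` and `residual_energy` are illustrative helpers, not any real encoder API:

```python
# Toy intra prediction: build a 4x4 block from its already-decoded
# neighbors, then keep whichever mode leaves the smallest residual.

def predict(mode, top, left, n=4):
    """Return an n x n predicted block from neighbor samples."""
    if mode == "vertical":    # copy the row above straight down
        return [list(top) for _ in range(n)]
    if mode == "horizontal":  # copy the left column straight across
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":          # flat block at the mean of all neighbors
        dc = sum(top + left) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(mode)

def residual_energy(block, pred):
    """Sum of squared differences between the real block and the prediction."""
    return sum((b - p) ** 2
               for rb, rp in zip(block, pred)
               for b, p in zip(rb, rp))

# A block containing vertical stripes...
block = [[10, 10, 200, 200]] * 4
top   = [10, 10, 200, 200]   # pixels directly above the block
left  = [10, 10, 10, 10]     # pixels to the left of the block

# ...is predicted perfectly by the "vertical" mode: zero residual to encode.
best = min(["vertical", "horizontal", "dc"],
           key=lambda m: residual_energy(block, predict(m, top, left)))
print(best)  # -> vertical
```

A real encoder does exactly this kind of competition, just across far more modes and with bit-cost accounting, and then entropy codes only the winning mode's residual.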

Block Partitioning: Thinking in Flexible Shapes

Older codecs, especially JPEG, divide an image into rigid, fixed size blocks. JPEG uses 8x8 pixel blocks, no exceptions. Every part of the image, whether it is a complex area full of intricate texture or a vast expanse of flat blue sky, gets the same 8x8 treatment. This is wildly inefficient. You are wasting bits encoding tiny blocks in smooth areas where a single large block would do, and you are not giving yourself enough resolution in detailed areas where smaller blocks would capture the complexity better.

AV1 takes a fundamentally different approach. It uses a recursive block partitioning system based on a structure called a superblock. Each superblock starts at either 128x128 or 64x64 pixels, and the encoder can then subdivide it into smaller blocks using a variety of split patterns. It can split a block in half horizontally, vertically, or into four equal quadrants. It can even do asymmetric splits, carving a block into one narrow piece and one wide piece, or one short piece and one tall piece.

The result is that AV1 can adapt its block structure to match the actual content of the image. A large area of smooth sky? The encoder keeps it as one big block and describes it with just a few values. A highly detailed area with fine textures, edges, and color transitions? The encoder recursively subdivides it into smaller and smaller blocks, giving itself the resolution it needs to capture that complexity.

This content adaptive block partitioning is one of the major reasons AV1 (and by extension AVIF) handles photographic images so well. It puts the bits where they are needed and spends almost nothing on the parts of the image where nothing interesting is happening.
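A minimal sketch of content adaptive partitioning, under heavy simplification: real AV1 chooses among symmetric and asymmetric splits using full rate distortion optimization, while this toy just quarters any block whose pixel variance exceeds a threshold:

```python
# Toy quadtree partitioning: flat regions stay as one big block,
# busy regions get recursively subdivided.

def variance(img, x, y, size):
    vals = [img[y + r][x + c] for r in range(size) for c in range(size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def partition(img, x=0, y=0, size=None, threshold=100.0, min_size=2):
    """Return a list of (x, y, size) leaf blocks covering the image."""
    if size is None:
        size = len(img)
    if size <= min_size or variance(img, x, y, size) <= threshold:
        return [(x, y, size)]           # flat enough: keep one big block
    half = size // 2                    # busy: split into four quadrants
    return (partition(img, x,        y,        half, threshold, min_size) +
            partition(img, x + half, y,        half, threshold, min_size) +
            partition(img, x,        y + half, half, threshold, min_size) +
            partition(img, x + half, y + half, half, threshold, min_size))

# 8x8 image: flat "sky" on the left half, hard vertical edges on the right.
img = [[0] * 4 + [0, 255, 0, 255] for _ in range(8)]
blocks = partition(img)
# The left half survives as two large 4x4 blocks; the detailed right half
# gets carved into eight small 2x2 blocks: 10 leaves in total.
print(len(blocks))  # -> 10
```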

Transform Coding: From Pixels to Frequencies

Once AV1 has its prediction residuals (the differences between what it predicted and what the actual pixels look like), it needs to compress those residuals further. It does this using transform coding, which converts the spatial pixel data into frequency data.

If that sounds abstract, think of it this way. Imagine a block of pixels that is mostly one color with a subtle gradient across it. In the spatial domain (raw pixel values), you might need 64 separate numbers to describe a block of 64 pixels. But in the frequency domain, that same block can be described as "one dominant color plus a tiny amount of gradual change," which might only take two or three numbers. Transform coding exploits the fact that most image data, when viewed as frequencies, is heavily concentrated in just a few values.

AV1 supports multiple transform types, including the traditional DCT (Discrete Cosine Transform) that JPEG also uses, as well as the ADST (Asymmetric Discrete Sine Transform) and identity transforms. Crucially, it can pick different transform types for different blocks and can apply transforms in variable sizes from 4x4 all the way up to 64x64. This flexibility means AV1 can tailor its mathematical approach to the specific characteristics of each region of the image, squeezing out additional efficiency that a one size fits all approach would miss.

After the transform step, the resulting frequency coefficients are quantized (the small, perceptually insignificant values are rounded down to zero or near zero, which is where the "lossy" part of lossy compression happens) and then entropy coded using a sophisticated arithmetic coder that assigns shorter binary codes to more common values and longer codes to rare ones. The entire pipeline, from prediction to partitioning to transformation to quantization to entropy coding, is designed to strip the image down to only the information that actually matters to the human eye.
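The transform-and-quantize steps can be shown end to end on a single row of pixels. The sketch below uses an unnormalized 1-D DCT-II and a crude uniform quantizer; AV1's real transforms are 2-D, normalized, and chosen per block, but the energy compaction effect is the same:

```python
import math

# Toy transform coding: a smooth gradient collapses to a couple of
# significant frequency coefficients, and quantization zeroes the rest.

def dct(xs):
    """Unnormalized 1-D DCT-II of a list of samples."""
    n = len(xs)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(xs)) for k in range(n)]

def quantize(coeffs, step=10.0):
    """Uniform scalar quantizer: this is where lossy compression loses data."""
    return [round(c / step) for c in coeffs]

# A gentle gradient of 8 pixel values.
pixels = [100, 102, 104, 106, 108, 110, 112, 114]
q = quantize(dct(pixels))
print(q)  # -> [86, -3, 0, 0, 0, 0, 0, 0]
# Six of the eight quantized coefficients are zero, so the entropy coder
# has almost nothing left to store.
print(sum(1 for c in q if c == 0))  # -> 6
```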

Film Grain Synthesis: The Magic Trick

This is perhaps the most fascinating feature AV1 brings to the image compression world, and it is one that most people have never heard of.

Consider a photograph taken in low light or with high ISO settings. It will have a layer of visual noise or "grain" across the entire image. This grain is essentially random or semi random speckle that varies from pixel to pixel. From a compression standpoint, grain is a nightmare. It is high frequency, high entropy data that resists prediction, resists transformation, and eats up an enormous number of bits to encode faithfully.

Traditional codecs have two options when faced with grain. Option one: spend a huge number of bits trying to reproduce the grain accurately, resulting in a massive file. Option two: smooth the grain away during compression, resulting in a small file but an image that looks artificially plastic and over processed.

AV1 introduces a third option: film grain synthesis. During encoding, the AV1 encoder analyzes the grain pattern in the image, strips it out, compresses the clean, smooth underlying image (which compresses beautifully), and then embeds a compact mathematical model of the grain's statistical properties as metadata alongside the compressed image. This model describes things like the average grain size, intensity, and correlation patterns. It is tiny, often just a few hundred bytes.

When the image is decoded for display, the decoder reads that mathematical model and procedurally regenerates a grain pattern that is statistically identical to the original. It paints the synthetic grain back on top of the smooth decoded image in real time. The viewer sees an image that looks authentically grainy and textured, exactly as the photographer intended. But the file is dramatically smaller because the encoder never had to waste bits encoding the actual grain pixel by pixel.
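The encode/strip/regenerate flow can be sketched in a few lines. To be clear about the simplifications: AV1's actual grain model is an autoregressive one with per intensity scaling, and real denoising is far more careful than subtracting a mean; this toy only demonstrates the idea of storing statistics instead of pixels:

```python
import random
import statistics

# Toy film grain synthesis: separate a signal into a smooth base plus
# grain, keep only the grain's statistics, regenerate similar grain later.

def encode(pixels):
    base = statistics.mean(pixels)               # crude stand-in for denoising
    grain = [p - base for p in pixels]
    model = {"sigma": statistics.pstdev(grain)}  # a few bytes of metadata
    return base, model                           # the grain itself is discarded

def decode(base, model, n, seed=0):
    rng = random.Random(seed)
    return [base + rng.gauss(0.0, model["sigma"]) for _ in range(n)]

rng = random.Random(42)
noisy = [128 + rng.gauss(0.0, 5.0) for _ in range(10_000)]

base, model = encode(noisy)
shown = decode(base, model, len(noisy))

# The regenerated grain matches the original statistically, not pixel by pixel.
print(round(statistics.pstdev(noisy), 1), round(statistics.pstdev(shown), 1))
```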

This technique was originally developed for cinema and streaming video (Netflix was a major driver of its development, since film grain in movies was consuming enormous amounts of bandwidth). But it works just as well for still photography, and it is one of the reasons AVIF can produce strikingly natural looking images at file sizes that seem almost impossibly small.

The Container: ISOBMFF and HEIF Compatibility

We have spent a lot of time talking about AV1, the compression engine that makes AVIF images so small. But a codec on its own is not a file format. A codec is just an algorithm. It takes pixels in and spits compressed data out. That compressed data needs to live somewhere. It needs a structure, an organized wrapper that tells software where the image data begins and ends, what color space it uses, whether there is transparency information, how large the image is, and a hundred other pieces of metadata that make the file actually usable.

That wrapper is called a container, and understanding the container AVIF uses is key to understanding why the format fits so neatly into the modern ecosystem.

The Difference Between a Wrapper and a Codec

This distinction trips up a lot of people, so let us be crystal clear about it.

A codec is the engine. It handles the actual mathematical work of compressing and decompressing pixel data. AV1 is a codec. HEVC is a codec. The old JPEG compression algorithm is technically a codec, even if people do not usually use that word for it.

A container is the box the engine sits inside. It is a standardized file structure that organizes the compressed data along with all the associated metadata, thumbnails, color profiles, and anything else the file needs to carry. The container does not do any compression itself. It just provides an orderly system for storing and retrieving the compressed payload.

Think of it like shipping a product. The codec is the product. The container is the cardboard box, the packing slip, the shipping label, and the barcode that tells the warehouse system what is inside. You need both. A codec without a container is just a raw blob of data that no software knows how to interpret. A container without a codec is an empty box with a nice label on it.

In the video world, this separation is well understood. An MP4 file, for instance, is a container. Inside that container, the actual video might be compressed with H.264, H.265, AV1, or any number of other codecs. The .mp4 extension tells you about the box, not the engine inside it. The same container format can hold completely different codecs.

AVIF works on this exact same principle. The "AV1" part is the codec. The container it sits inside is based on two interrelated standards: ISOBMFF and HEIF.

ISOBMFF: The Universal Shipping Box

ISOBMFF stands for ISO Base Media File Format. If you have never heard of it, you have definitely used it. ISOBMFF is the foundational container format that underpins MP4 video files, 3GP mobile video files, and a huge number of other media formats. It was originally standardized by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) as ISO/IEC 14496-12, and it has been refined and extended over decades.

At its core, ISOBMFF organizes a file into a series of nested "boxes" (also called "atoms" in older documentation). Each box has a type identifier and a size, and boxes can contain other boxes inside them. There are boxes for metadata, boxes for media data, boxes for timing information, boxes for color profiles, and so on. The format is deliberately extensible, meaning new types of boxes can be defined without breaking the overall structure.

The beauty of ISOBMFF is its universality. Because so many media formats already use it, there is a massive ecosystem of software, libraries, and hardware that knows how to parse and navigate its box structure. When AVIF adopted ISOBMFF as its foundation, it inherited all of that existing infrastructure essentially for free.
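The box structure is simple enough to sketch. The parser below handles only the basic header (a 32-bit big-endian size followed by a 4-character type; real ISOBMFF also allows 64-bit sizes and `uuid` boxes) and walks a handcrafted byte string rather than an actual AVIF file:

```python
import struct

# Toy ISOBMFF walker: every box starts with a 4-byte big-endian size and
# a 4-byte ASCII type, and the size covers the header plus the payload.

def boxes(data, offset=0):
    """Yield (type, payload) for each top-level box in data."""
    while offset < len(data):
        size, = struct.unpack_from(">I", data, offset)
        btype = data[offset + 4:offset + 8].decode("ascii")
        yield btype, data[offset + 8:offset + size]
        offset += size

def box(btype, payload):
    """Serialize one box: size, type, payload."""
    return struct.pack(">I", 8 + len(payload)) + btype.encode("ascii") + payload

# An 'ftyp' box whose payload starts with the major brand, as in an AVIF
# file, followed by a minimal 'mdat' box holding fake codec data.
data = (box("ftyp", b"avif" + b"\x00\x00\x00\x00" + b"avifmif1") +
        box("mdat", b"\x00" * 4))

parsed = list(boxes(data))
print([t for t, _ in parsed])  # -> ['ftyp', 'mdat']
print(parsed[0][1][:4])        # -> b'avif'  (the major brand)
```

Swap the brand and the payload codec and the same walker would step through a HEIC file, which is exactly the shared-infrastructure point made above.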

HEIF: The Image Specific Layer

ISOBMFF was originally designed for video and time based media. Using it to store still images required an additional layer of specification that defined how to represent image specific concepts within the ISOBMFF box structure. That layer is called HEIF, the High Efficiency Image File Format.

HEIF was developed by the Moving Picture Experts Group (MPEG) and standardized as ISO/IEC 23008-12. It is not a codec and it is not a standalone file format. It is a specification that describes how to store still images, image collections, image sequences (animations), thumbnails, depth maps, alpha channels, and other image specific data inside an ISOBMFF container.

HEIF is deliberately codec agnostic. It defines the structure and the rules, but it does not care what compression algorithm is used for the actual pixel data. This is exactly the point. HEIF was designed so that different codecs could be plugged into the same container framework.

When Apple introduced HEIC as the default photo format on iPhones, what they actually did was take the HEIF container specification and plug the HEVC (H.265) codec into it. HEIC is not really a format in its own right. It is HEIF plus HEVC. The .heic file extension is essentially a branding choice.

AVIF: The Open Source Twin

And this is where the elegance of AVIF's design becomes clear. AVIF takes the exact same HEIF container specification, follows the exact same structural rules, uses the exact same box types and metadata organization, but swaps out the proprietary HEVC codec for the open, royalty free AV1 codec.

AVIF and HEIC are, architecturally, siblings. They share the same container DNA. They organize their data in the same way. They support the same types of features at the container level: multiple images in a single file, alpha channels stored as separate image items, thumbnail images for quick previews, grid derivations for tiling large images, and EXIF/XMP metadata embedded in standardized boxes.

The AVIF specification itself is quite explicit about this relationship. It states that an AVIF file is designed to be a "conformant HEIF file" and follows the recommendations given in the HEIF specification's annex on defining new image formats. It even reuses the syntax and semantics defined in the AV1 ISOBMFF mapping specification, which describes how AV1 compressed data should be stored inside ISOBMFF containers.

This is not just an academic detail. It has real, practical consequences.

Why the Container Matters in Practice

First, it means that any software or hardware that already understands the HEIF container structure can be extended to support AVIF with relatively modest effort. The engineers do not have to build a new file parser from scratch. They just need to add an AV1 decoder to handle the payload inside the familiar HEIF box structure. This is one of the reasons AVIF adoption has been comparatively rapid across operating systems, browsers, and image editors. The container was already understood. Only the codec was new.

Second, it means AVIF inherits all the sophisticated features that HEIF provides at the container level. Want to store a primary image alongside an alpha transparency plane as a separate, independently compressed image item? HEIF has a mechanism for that. Want to assemble a massive image from a grid of smaller, independently decodable tiles? HEIF has a mechanism for that too, called grid derivation, which is how AVIF can represent images larger than the resolution limits of its baseline profile without requiring a decoder that supports enormous single frame sizes.

Third, it creates a clean separation of concerns that makes the format future proof. If a better codec than AV1 comes along in ten years, the HEIF container can accommodate it without reinventing the entire file format. The box structure, the metadata organization, the way alpha channels and image sequences are represented: all of that stays the same. Only the codec payload changes. This modularity is a significant architectural advantage over monolithic formats like JPEG, where the compression algorithm and the file structure are deeply intertwined and essentially impossible to upgrade independently.

The HEIC Connection: Same Box, Different Engine

For Apple users, this relationship has a very tangible implication. Your iPhone has been shooting photos in HEIC (HEIF plus HEVC) for years. When Apple added AVIF support in iOS 16 and macOS Ventura, it was not adopting some alien technology. It was adding support for a format that uses the same container its own camera format already uses, just with a different (and open) codec inside.

This shared lineage also explains why Apple's transition to supporting AVIF was smoother than you might expect for a company that had so heavily invested in HEVC. The infrastructure for reading and writing HEIF containers was already deeply embedded in Apple's operating systems. Supporting AVIF was largely a matter of integrating an AV1 decoder, not rebuilding the entire image pipeline.

In a sense, AVIF's choice of container was as strategically important as its choice of codec. By building on HEIF and ISOBMFF rather than inventing something new, the Alliance for Open Media ensured that AVIF would slot into the existing media ecosystem with minimal friction. The format did not ask the industry to learn a new container language. It just asked it to accept a new, open codec speaking a language it already understood.

Technical Deep Dive: The Capability Set

So far we have covered why AVIF exists, how its compression engine works, and how its container holds everything together. Now it is time to talk about what AVIF can actually do. Because file size reduction, as impressive as it is, only tells part of the story. AVIF was designed from the ground up to support a range of capabilities that older formats simply cannot match. These capabilities are not theoretical. They are the features that make AVIF genuinely relevant for the next decade of visual media on the web and beyond.

Color Spaces: Seeing More of the World

To understand why AVIF's color space support matters, you need to understand a fundamental limitation that has constrained digital images for decades.

Every digital image exists within a color space, which is essentially a defined boundary that describes the range of colors the image is capable of representing. For nearly the entire history of the consumer web, that boundary has been sRGB. The sRGB color space was standardized in 1996 by HP and Microsoft, and it was designed to match the capabilities of the CRT monitors that were common at the time. It covers a relatively narrow slice of the colors the human eye can perceive, roughly 35% of the visible spectrum.

For a long time, that was fine. Monitors could not display anything beyond sRGB anyway, so there was no point encoding colors that no screen could reproduce. But display technology has advanced dramatically. Modern smartphones, tablets, laptops, and desktop monitors increasingly support Wide Color Gamut (WCG) displays that can reproduce colors well outside the sRGB boundary. Apple's P3 displays, which have shipped on every iPhone, iPad, and Mac for years now, cover roughly 25% more color space than sRGB. Professional reference monitors used in film and photography cover even more.

JPEG is locked to sRGB in practice. Technically you can embed an ICC profile in a JPEG to indicate a wider color space, but the format's 8 bit depth means you do not have enough tonal precision to take advantage of wider gamuts without introducing visible banding artifacts. It is like having a highway with ten lanes but only enough cars to fill three of them.

AVIF, by contrast, was built for wide color. It natively supports the BT.709 color primaries that define sRGB, but also the BT.2020 color primaries that define the wide color gamut used in modern HDR content. BT.2020 covers roughly 75% of the visible spectrum, more than double what sRGB can represent. This means an AVIF image can encode rich, saturated reds, greens, and blues that a JPEG literally cannot describe.

HDR: High Dynamic Range for Still Images

Color gamut describes the range of colors. Dynamic range describes the range of brightness, from the deepest shadows to the brightest highlights. And this is where AVIF gets really interesting for anyone who cares about image quality.

Standard Dynamic Range (SDR), which is what JPEG and the traditional web are limited to, tops out at a peak brightness of roughly 100 nits (a unit of luminance). That was perfectly adequate for the LCD panels of the 2000s and 2010s. But modern HDR displays can hit peak brightness levels of 1,000, 1,600, or even 4,000 nits on high end OLED panels. The sun glinting off a car's chrome bumper, the dazzling interior of a neon lit nightclub, the incandescent glow of molten metal: these are the kinds of visual moments that HDR can reproduce with startling realism, and that SDR content simply clips to flat, featureless white.

AVIF supports HDR natively through the transfer functions defined in the BT.2100 specification. It can use PQ (Perceptual Quantizer), which was developed by Dolby and maps brightness levels to an absolute luminance scale up to 10,000 nits. It can also use HLG (Hybrid Log Gamma), which was developed by the BBC and NHK for broadcast and is designed to be backwards compatible with SDR displays.
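To make the PQ curve concrete, here is a minimal Python sketch of the ST 2084 EOTF, the function a display uses to turn a PQ encoded code value back into absolute luminance. The constants are the published ST 2084 values; the helper name is my own invention for this example.

```python
# Sketch of the PQ (SMPTE ST 2084) EOTF: maps a normalized 0..1 code value
# to absolute luminance in nits. Constants are defined in the ST 2084 spec.
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875

def pq_to_nits(signal: float) -> float:
    """Decode a PQ encoded signal (0.0 to 1.0) to luminance in nits."""
    p = signal ** (1 / M2)
    return 10000 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

print(pq_to_nits(1.0))           # → 10000.0 (the top of the PQ scale)
print(round(pq_to_nits(0.508)))  # a mid scale code value lands near 100 nits
```

Notice how the curve spends most of its code values on the darker end of the range, which is exactly where human vision is most sensitive: the halfway point of the signal maps to only about 100 nits, leaving the upper half of the scale to cover everything from there to 10,000.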

This is not an incremental improvement. It is a generational leap. For the first time, a mainstream, royalty free web image format can deliver the same HDR experience that streaming services like Netflix and Apple TV+ deliver for video content. A photographer shooting an HDR image can encode it as AVIF and know that viewers with HDR capable screens will see the full dynamic range the camera captured, while viewers on SDR screens will see a perfectly acceptable standard range version.

AVIF even supports the HDR gain map approach, which embeds both an SDR base image and metadata describing how to "boost" the image for HDR displays. This provides seamless backwards compatibility: SDR devices see a normal image, while HDR devices see the enhanced version, all from a single file. As of this writing, encoder support for gain maps is still emerging, but the specification supports it, and it represents a genuinely elegant solution to the SDR/HDR compatibility problem.

Color Space Signaling: Speaking the Right Language

Having wide color and HDR capability is useless if the software displaying the image does not know how to interpret the color data correctly. An image encoded in BT.2020 with PQ transfer function will look completely wrong if the viewer's software assumes it is sRGB. The colors will be off, the brightness will be wrong, and the entire image will appear washed out or oversaturated.

AVIF handles this through two complementary signaling mechanisms.

The first is CICP (Coding Independent Code Points), defined in ITU-T H.273 and ISO/IEC 23091-2. CICP is a compact, standardized system that uses a small set of numerical codes to describe three properties of the color data: the color primaries (what the "red," "green," and "blue" anchors are), the transfer characteristics (how brightness is encoded, whether linear, sRGB gamma, PQ, or HLG), and the matrix coefficients (how luma and chroma are derived from the primary colors). CICP is lightweight, unambiguous, and well supported by modern decoders.
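As an illustration, here is a small Python sketch of a few well known H.273 code points. This is an illustrative subset, not the full tables, and the dictionary and function names are invented for this example.

```python
# A few of the H.273 (CICP) code points most relevant to AVIF delivery.
# Illustrative subset only; the complete tables live in ITU-T H.273.
COLOR_PRIMARIES = {
    1:  "BT.709 (the sRGB primaries)",
    9:  "BT.2020 (wide color gamut)",
    12: "Display P3",
}
TRANSFER_CHARACTERISTICS = {
    13: "sRGB",
    16: "PQ (SMPTE ST 2084)",
    18: "HLG (ARIB STD-B67)",
}
MATRIX_COEFFICIENTS = {
    0: "Identity (RGB, no luma/chroma transform)",
    1: "BT.709",
    9: "BT.2020 non-constant luminance",
}

def describe_cicp(primaries: int, transfer: int, matrix: int) -> str:
    """Turn a CICP triple into a human readable description."""
    return " / ".join([
        COLOR_PRIMARIES.get(primaries, "unknown"),
        TRANSFER_CHARACTERISTICS.get(transfer, "unknown"),
        MATRIX_COEFFICIENTS.get(matrix, "unknown"),
    ])

# A typical HDR AVIF signals the triple (9, 16, 9):
print(describe_cicp(9, 16, 9))
```

Three small integers are enough to tell a decoder everything it needs to interpret the pixel data correctly, which is why CICP is so much lighter than shipping a full ICC profile with every image.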

The second mechanism is the traditional ICC profile, which is a more detailed and flexible (but also much larger) description of the color space. ICC profiles have been the standard for color management in professional photography and print for decades. AVIF supports embedding ICC profiles for situations where CICP does not provide sufficient granularity, such as custom or specialized color spaces used in professional workflows.

Between CICP and ICC profiles, AVIF has a robust and future proof system for ensuring that colors are reproduced correctly across every device and application in the chain, from the camera sensor to the editor's monitor to the end user's phone screen.

Bit Depth: Why 8 Bits Is No Longer Enough

This brings us to bit depth, which is one of the most practically important advantages AVIF holds over JPEG and WebP.

Bit depth describes how many distinct tonal values each color channel can represent. An 8 bit image has 256 levels per channel. A 10 bit image has 1,024 levels per channel. A 12 bit image has 4,096 levels per channel. That might sound like an abstract numbers game, but the visual difference is real and significant.

Consider a photograph of a sunset sky. The gradient from deep orange near the horizon to dark blue at the zenith involves thousands of subtle tonal transitions. In an 8 bit image, there are only 256 steps to represent that entire gradient per color channel. When the transitions are smooth and gradual, 256 steps is often not enough, and you get "banding," visible staircase like steps between tones where the gradient should be seamless. Every photographer who has ever tried to heavily edit or grade a JPEG has seen banding. It is one of the most common and most frustrating artifacts in 8 bit imagery.

AVIF supports 8 bit, 10 bit, and 12 bit color depths. At 10 bits, you have four times as many tonal steps per channel (1,024 versus 256). At 12 bits, you have sixteen times as many (4,096 versus 256). This additional precision virtually eliminates banding in gradients, provides dramatically more headroom for post processing and color grading, and is essential for HDR content where the wider brightness range demands finer tonal resolution to avoid visible stepping.

For professional photographers, this is a big deal. Shooting in raw and editing in 16 bit color spaces has been standard practice for years, but the moment you exported to JPEG for delivery or web publishing, you were forced to crush all that tonal richness down to 8 bits. AVIF lets you export at 10 or 12 bits, preserving far more of the tonal quality from your original edit. The difference is especially visible in images with large areas of smooth tone: skies, skin, studio backdrops, architectural surfaces, and anything shot with shallow depth of field where the bokeh (the out of focus background blur) contains gentle gradients.

WebP, AVIF's most direct predecessor from Google, is also limited to 8 bit color depth. This alone makes AVIF a meaningful upgrade for any use case where color precision matters.

Chroma Subsampling: Balancing Color and Efficiency

AVIF supports three chroma subsampling modes: 4:2:0, 4:2:2, and 4:4:4. This is a technical detail that has significant practical implications, so it is worth explaining briefly.

The human visual system is far more sensitive to changes in brightness (luminance) than to changes in color (chrominance). Chroma subsampling exploits this biological quirk by storing color information at a lower resolution than brightness information. In 4:2:0 subsampling, the color channels are stored at half the horizontal and half the vertical resolution of the brightness channel. This cuts the color data to one quarter of its original size with minimal perceptible quality loss in most photographic content.

4:2:0 is the standard for web delivery and casual viewing. It is what JPEG uses, what most streaming video uses, and what most AVIF images on the web will use. It provides the best balance between quality and file size.

4:2:2 stores color at half horizontal resolution but full vertical resolution. It is a middle ground often used in broadcast video production.

4:4:4 stores color at full resolution with no subsampling at all. Every pixel gets its own complete color value. This is essential for content with fine color detail, such as graphics with thin colored lines, screenshots of user interfaces, text overlays, or any situation where color accuracy at the pixel level matters. It is also important for professional photography workflows where any color information loss is unacceptable.

The JPEG standard technically permits other subsampling modes, but in real world usage nearly every encoder and pipeline defaults to 4:2:0, and support for anything else is inconsistent. WebP's lossy mode supports only 4:2:0, full stop. AVIF's first class support for all three subsampling modes gives it flexibility that neither of its predecessors can match.
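The storage math behind these modes is easy to sketch. For a given image, 4:2:0 stores each chroma plane at a quarter of the luma resolution, so the total raw sample count drops from 3 samples per pixel to 1.5. The helper below is illustrative only.

```python
# Raw sample counts per chroma subsampling mode for a width x height image.
def sample_counts(width: int, height: int, mode: str) -> dict:
    luma = width * height
    chroma_scale = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[mode]
    chroma = int(luma * chroma_scale) * 2  # two chroma planes: Cb and Cr
    return {"luma": luma, "chroma": chroma, "total": luma + chroma}

for mode in ("4:4:4", "4:2:2", "4:2:0"):
    print(mode, sample_counts(1920, 1080, mode))
```

For a 1080p image, 4:4:4 carries 6,220,800 raw samples while 4:2:0 carries 3,110,400, exactly half, before any actual compression is even applied. That is why 4:2:0 is the default whenever pixel level color accuracy is not the priority.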

Alpha Transparency: The End of the PNG Tax

One of the most requested features in web image formats has always been transparency, the ability to have pixels that are partially or fully see through so that the background of the page or the layer beneath can show through. JPEG cannot do transparency. Period. If you need a transparent background, you have been forced to use PNG, which supports a full alpha channel but produces enormous file sizes for photographic content because it is lossless. Or you could use WebP, which added transparency support but is still constrained by its older compression technology and 8 bit color depth.

AVIF supports a full alpha channel with the same flexibility and quality as its color channels. The alpha plane can be compressed independently from the color data, using its own compression settings. It can be lossy or lossless, and it can be stored at a different bit depth or subsampling level than the color image if desired.

In practical terms, this means you can have a photographic image with complex transparency (think a product shot of a pair of sunglasses with semi transparent lenses, or a portrait with a blurred, partially transparent background edge) at a fraction of the file size that PNG would require. The combination of AV1's efficient compression and native alpha support makes AVIF a compelling replacement for PNG in virtually every scenario where transparency is required alongside photographic or complex visual content.
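At render time, the browser applies standard source over compositing using the decoded alpha plane. Here is a minimal per channel sketch with straight (non premultiplied) alpha and values normalized to the 0 to 1 range; the function name is my own.

```python
# Source over compositing for one color channel of one pixel, using a
# straight (non premultiplied) alpha value in the 0.0 to 1.0 range.
def composite_over(fg: float, alpha: float, bg: float) -> float:
    """Blend a foreground channel over a background channel.
    alpha = 1.0 is fully opaque, alpha = 0.0 is fully transparent."""
    return alpha * fg + (1.0 - alpha) * bg

# A 40% opaque dark sunglasses lens pixel over a white page background:
print(round(composite_over(0.1, 0.4, 1.0), 2))  # → 0.64
```

Because AVIF stores alpha as a full resolution plane rather than GIF's one bit on/off flag, that blend can take any value in between, which is what makes semi transparent lenses, soft shadows, and feathered edges possible.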

Image Sequences: The GIF Killer

Finally, let us talk about animation.

GIF, the format that powers the internet's endless supply of reaction animations and looping memes, was created in 1987. It supports a maximum of 256 colors per frame. Its LZW compression was fine for its era but is laughably inefficient by modern standards. And yet it persists, in large part because for a long time there was no universally supported web format that could do short, looping animations with transparency.

AVIF can store image sequences, which are essentially short animations consisting of multiple frames, each compressed with AV1. The quality and compression advantages are staggering. A GIF animation that weighs 5 megabytes might compress to a few hundred kilobytes as an AVIF sequence, with vastly better color reproduction (millions of colors instead of 256), smooth alpha transparency instead of GIF's harsh binary transparency (a pixel is either fully transparent or fully opaque in GIF), and none of the dithering artifacts that plague GIF images when they try to represent photographic content with their pathetically limited color palette.

AVIF image sequences support variable frame rates, independent frame durations, and all the same color and HDR features available to still images. They are, in every technical dimension, a superior replacement for animated GIFs.

The main barrier to AVIF sequences fully replacing GIF has been ecosystem support. While all major browsers support still AVIF images, support for animated AVIF sequences has lagged slightly behind. But it is catching up steadily, and platforms like Telegram and others have already adopted animated AVIF. The writing is on the wall for GIF. It just might take a few more years for the rest of the ecosystem to finish reading it.

Keep Reading