Watermarking technology creates and detects invisible “markings”, which can be
used to trace the origin, authenticity and usage of digital data.
Traditional paper watermarks were invented in Italy more than seven hundred years ago. Papermakers initially used simple watermarks as a form of product branding. Eventually these marks became more ornate. Wealthy organizations and individuals could obtain paper with their own exclusive and unique watermarks. A document written on such paper would carry significantly greater authenticity to any recipient familiar with the mark. Finally, watermarking took on its most important role in the mid-seventeenth century, when it was used to add security to paper currency.
Both digital and traditional paper-based watermarks are hard to notice, difficult to reproduce, and impossible to remove without destroying the medium they protect. But digital watermarks differ from traditional watermarks in a few critical ways:
- Creating a custom paper watermark is an elaborate and expensive process, but creating a digital watermark is fast, cheap and easy.
- Paper watermarks are usually visible, although special lighting may be required. Digital watermarks are hidden in the “noise” or background of visual and aural data and are impossible to detect without computer processing.
- Traditional watermarks are lost when a medium is copied, whereas digital watermarks are preserved. Some digital watermarks can even survive extreme data manipulation. It’s possible to watermark an image, print it, write all over the printout, scan it back into the computer and still detect the watermark in the scanned image!
What can you do with a digital watermark? Like traditional watermarks, they can guarantee the authenticity of information by identifying the creator or publisher. But digital watermarks are far more flexible than paper-based watermarks. Almost any type of information can be used to watermark digital media. The resilience of digital watermarks means that derivative works (samples of songs, collages) will still show the original watermark.
Digital watermarking, in theory, has many potential uses. But there’s a catch. It’s not possible or practical to watermark just any type of digital data. Images, audio and video are easy to watermark; but text, numeric data and application data are incredibly difficult. To understand why, we need to look at how digital watermarking works.
How It Works
Digital watermarking software looks for noise in digital media and replaces it with useful information. A digital media file is nothing more than a large list of 0’s and 1’s. The watermarking software determines which of these 0’s and 1’s correspond to redundant or irrelevant details. For example, the software might identify details in an image that are too fine for the human eye to see and flag the corresponding 0’s and 1’s as irrelevant noise. Later the flagged 0’s and 1’s can be replaced by a digital watermark.
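The replacement step described above can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: it treats the lowest bit of every 8-bit sample (a pixel or audio value) as the "noise", a textbook technique usually called least-significant-bit substitution. Commercial watermarking software chooses its noise far more carefully, and the function names here are hypothetical.

```python
def text_to_bits(text):
    """Turn a short label into a list of 0s and 1s, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode("utf-8")
            for shift in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits; expects a multiple of 8 bits."""
    data = bytes(int("".join(str(b) for b in bits[i:i + 8]), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def embed_bits(samples, bits):
    """Overwrite the lowest bit of the first len(bits) samples with the watermark."""
    if len(bits) > len(samples):
        raise ValueError("watermark is larger than the available noise")
    marked = bytearray(samples)
    for i, bit in enumerate(bits):
        # Assumption: the lowest bit of each sample is imperceptible noise.
        marked[i] = (marked[i] & 0xFE) | bit
    return bytes(marked)


def extract_bits(samples, count):
    """Read the lowest bit back out of the first `count` samples."""
    return [sample & 1 for sample in samples[:count]]
```

Embedding changes each sample by at most one step out of 256, which is why the mark stays invisible. It also shows why this naive version is fragile: anything that rewrites the low bits, such as lossy compression, erases the mark.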
What exactly is noise?
Picture a typical two-hour meeting you may have during a workday. In our experience, many two-hour business meetings can be summarized in just a few well-crafted sentences. Those few sentences are the “signal”; the rest of the meeting was the “noise”. To test this concept at your next meeting, cover your ears and say, “blah blah blah…. I am not hearing any signal” for its duration. You may find you leave the meeting with the same knowledge as everyone else (but not your job).
The signal is the raw information you want. Noise is any other information that adds no additional value to the signal. In digital audio, noise is the hiss in the background. In still images and video it’s the graininess of the picture. Even text can have noise. Compare $1 to $1.00 to $0001.00. All three numbers represent the same thing, but the third representation is much “noisier” than the first.
The total amount of watermark data that can be hidden inside a media file depends entirely on how much noise is present in the original data. A very noisy image can either hold a large watermark, or many copies of a small watermark. Hiding a small watermark repeatedly gives much greater reliability. If the media gets chopped up or damaged, there is still a chance the watermark will be retained in some of the image “pieces”. However, if one large watermark is used instead of many small ones, splitting the image may break the watermark. A broken watermark cannot be successfully recovered or detected.
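To see why many small copies are more reliable than one large watermark, consider the following sketch. It is an assumption-laden toy rather than a real product's method: the small mark is tiled across the lowest bit of every sample, and the detector takes a majority vote over all the copies it can still see. The function names are hypothetical, and the cropped piece is assumed to start on a copy boundary (a real detector would need some way to find that alignment).

```python
def embed_repeated(samples, mark):
    """Tile a short bit pattern across the lowest bit of every sample."""
    marked = bytearray(samples)
    for i in range(len(marked)):
        marked[i] = (marked[i] & 0xFE) | mark[i % len(mark)]
    return bytes(marked)


def recover(samples, mark_len):
    """Majority vote over every surviving copy of the pattern."""
    ones = [0] * mark_len
    totals = [0] * mark_len
    for i, sample in enumerate(samples):
        ones[i % mark_len] += sample & 1
        totals[i % mark_len] += 1
    return [1 if 2 * ones[k] > totals[k] else 0 for k in range(mark_len)]


# Example: 400 samples hold 50 copies of an 8-bit mark.
image = bytes((i * 37) % 256 for i in range(400))
mark = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_repeated(image, mark)

half = marked[200:]                       # the image is cut in two
damaged = bytes(100) + marked[100:]       # a quarter of it is blanked out
```

Calling `recover(half, 8)` or `recover(damaged, 8)` still returns the original mark, because most copies survive. A single 400-bit watermark spread over the same samples would have lost part of its payload in both cases.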
Images work quite well with digital watermarking because they tend to contain a lot of fine detail in which a watermark can be hidden. Audio and video media are also excellent candidates for digital watermarking. Ample background noise is always present in audio files, and is generally inaudible to the human ear. Still images frequently contain more detail than the eye can process. Video is really just a series of still images; any one of the video frames has ample noise for a watermark.
Text files, on the other hand, are notoriously low on digital noise. While one could argue that authors such as James Joyce use far more words than necessary, this noise is actually valued by readers and therefore is really considered to be a signal. While there are methods that can be used to generate or manipulate noise in a text document, they come with some severe drawbacks. The sidebar goes into a bit more detail.
Text Watermarking Techniques
How do you create noise in text? One simple solution is to introduce misspellings or spurious punctuation. This, unfortunately, is very easy to detect. More importantly, running the document through a spell checker will probably break any watermarks created in such a manner.
More effective systems work by changing the wording of text in a way that preserves meaning. Three simple syntax changes to “We saw a movie yesterday” lead to “A day ago we watched a movie” and six other equivalent variations. Over a million unique identifiers can be created with only twenty syntax changes, because each change doubles the number of possible variations. Twenty such changes are not hard to find in any document, considering that our five-word example sentence can support at least three syntax changes.
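The arithmetic behind these numbers is easy to check. In the sketch below, each syntax change is treated as one bit of the identifier: 0 keeps the original wording, 1 applies the change. The three specific changes (verb, time phrase, word order) are our own guess at what the example sentence allows, and the function names are hypothetical.

```python
import itertools


def render(bits):
    """Build one variation of the example sentence from three yes/no choices."""
    verb = ("saw", "watched")[bits[0]]          # change 1: swap the verb
    when = ("yesterday", "a day ago")[bits[1]]  # change 2: swap the time phrase
    if bits[2]:                                 # change 3: move the time phrase
        sentence = f"{when} we {verb} a movie"
    else:
        sentence = f"we {verb} a movie {when}"
    return sentence[0].upper() + sentence[1:]


def all_variants():
    """Every sentence that three binary choices can produce."""
    return [render(bits) for bits in itertools.product((0, 1), repeat=3)]


def decode(sentence):
    """Recover the three hidden bits by matching against every variation."""
    for bits in itertools.product((0, 1), repeat=3):
        if render(bits) == sentence:
            return bits
    return None


variants = all_variants()        # 2**3 = 8 distinct sentences
identifiers = 2 ** 20            # 1,048,576 identifiers from twenty changes
```

Here `render((0, 0, 0))` gives the original sentence and `render((1, 1, 1))` gives “A day ago we watched a movie”; the other six combinations are the remaining variations.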
Changing the wording of a sentence is not a practical reality in many professions. You could imagine what it might do to a legal document. It is safe to assume that many scientists would not approve either. Furthermore, text watermarks are easy to detect if you can obtain two copies of the document with different identifiers. A quick comparison will show the differences between the two documents. The slightest alteration of these differences can destroy the watermark.
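The comparison attack mentioned above takes very little effort. As a rough sketch, Python's standard `difflib` module can line up two copies word by word and report exactly where they differ; the helper name `differing_spans` is hypothetical.

```python
import difflib


def differing_spans(copy_a, copy_b):
    """Return (words_in_a, words_in_b) for every place the two copies disagree."""
    words_a = copy_a.split()
    words_b = copy_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    spans = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])))
    return spans


suspects = differing_spans("We saw a movie yesterday",
                           "We watched a movie yesterday")
```

Every span in `suspects` marks a spot where an identifier bit is probably hidden. Rewording just those spots is enough to scramble the identifier.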
Software is another possible candidate for watermarking. It’s relatively simple to generate some noise in the binary program code by randomly inserting instructions that serve no function. Watermarks can also be placed inside the images that are part of a program’s user interface. Software companies can use these watermarks to link a pirated version of a program to the original purchaser.
Watermarking software makes it relatively easy to add digital watermarks to media files. The ease of adding a watermark belies the difficulty of successfully using watermarks. Here are some problems that you might encounter when deploying a watermark:
Ease of Destruction: Although watermarks are designed to survive manipulation of the source media, it is nonetheless possible to perform manipulations that irrecoverably break the watermark. Furthermore, the small number of watermark software vendors results in easily detectable watermark signature patterns. There are numerous effective techniques for identifying and disabling commercial watermarks in media.
Efficient Detection of Watermarks: Imagine you’re working for a stock photography company. Browsing the web one day, you come across an image that looks very familiar. Suspicious, you scan the image with your company’s watermarking software. Sure enough, it’s one of your images, and the site never purchased the right to use it! The watermark gives your legal team the ammunition it needs to force payment from the freeloaders.
This scenario makes watermarking sound incredibly useful. Unfortunately, the method of detection (accidental) is not very reproducible or reliable. Automated watermark search engines exist, but they have some significant limitations. For starters, the amount of digital media on the Internet is staggering. It could take hundreds of millions of dollars in equipment to effectively scan a significant amount of Internet data for watermarks. Then there’s traditional media -- scanning newspapers, magazines, TV broadcasts and films for watermarks requires a lot of manual work and therefore is rarely cost-effective.
Stock photography, clip art, and other variants of digital artwork are ideal candidates for watermarking. Without watermarks, a visual artist can’t display their commercial images online without worrying that someone will just download and use their imagery without paying. With a watermark search engine, our example scenario becomes a business-saving strategy for these companies.
Oscar gets Wet
When events like the Academy Awards approach, film companies send out copies of their latest films for critical review. Often, one of these copies falls into the wrong hands and the film (which may not yet be in theaters) ends up available on the internet and for sale on the black market. Similar leaks occur when films are sent out to video/DVD duplication facilities. Initially the problem was ignorable: few computer users knew how to get the movies, and the quality was often terrible. But users became more sophisticated and the quality of the downloads/street bootlegs soon became indistinguishable from that of a DVD. Hollywood decided it was time to act.
Film companies are now digitally watermarking preview copies of films with identifiers that help them pinpoint the exact source of the "leak". There have already been a couple of cases where leaked versions of films have been traced back to celebrities and critics. This doesn't prove anything -- after all, someone could have intercepted the mail or "borrowed" the video without the celeb/critic's knowledge, but it does let the film company know roughly from where the leaks are coming. They may choose not to send preview copies to that critic or celebrity in the future, or arrange for a viewing at a safe location.
Limited Viable Media: The one type of content that traditional paper-based watermarks can protect is text. Ironically, text is the most difficult content to protect with digital watermarks. Current text watermarking techniques are rife with problems: they take too long to implement, they are too destructive to the original text, and they are too easy to remove.
Even if these problems weren’t present, there’s little demand for text watermarking. An application that would make it more viable is the E-book. In the short term, however, people do not want to read books on a computer monitor. This may change in the future as display technologies become greatly enhanced and simulate paper. When that happens, text watermarking may receive more attention.
Tracking Logistics: In general, mass-market audio and video publishers aren’t worried about proving the authenticity of content. Nobody is going to copy a contemporary pop song and claim it belongs to him or her. Even when a song is sampled, the source of the sample is usually easy to recognize, thus derivative use is also not usually a concern.
On the other hand, audio and video publishers would benefit greatly if they could track the usage of their digital media through watermarks. But that means they need some way to uniquely mark each copy of their media. They also need to link the identifying mark with the purchaser’s identity. For content that is distributed on physical media, such as CDs, VHS tapes or DVDs, this leads to complex production logistics that are currently impractical in volume. Furthermore, the point of sale needs to capture the identifier and the personal details of the purchasers.
Making The Connection
Steganography: Digital watermarking is essentially a specialized version of steganography.
Outsourcing: There is no way to perform searches for your own digital watermarks. Because of the server hardware and software requirements, this is a process that must inevitably be outsourced.
The flexibility of digital watermarking technology may help publishers determine who made an unauthorized copy of a digital media file. One technique is to individually identify every copy of the digital media with a watermark. When the content is sold, the unique identifier is associated with the purchaser’s identity. Later, when the content is found circulating the net, the publisher can extract the unique identifier from the illegal copy. This, cross-referenced with the purchase history database, should tell the publisher the name of the person who originally purchased the copy that was pirated.
This technique could be logistically complicated, as described in the security considerations above. The logistics disappear, however, if the content is sold online. Just before the purchaser downloads the content, a unique identifier can be watermarked into their copy of the digital media file. The server, acting as the point of sale, can link the identifier with the personal information obtained when the payment was made. Voilà! If the purchaser attempts to duplicate and illegally distribute the media, the watermark will identify them as the culprit.
Note that this doesn’t automatically prove that this was the person responsible for the piracy. What if the person resold, lost or returned the media? It’s easy to come up with scenarios where the original consumer is not at fault.
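The online point-of-sale flow described above can be sketched as follows. Everything here is a simplified assumption made for illustration: the identifier is a random 32-bit number, it is hidden in the lowest bit of the first 32 samples of the file, and the "purchase history database" is a plain dictionary. The class and function names are hypothetical.

```python
import secrets

ID_BITS = 32  # assumed identifier size


def embed_id(media, copy_id):
    """Hide a numeric identifier in the lowest bit of the first ID_BITS samples."""
    if len(media) < ID_BITS:
        raise ValueError("media file is too small to hold the identifier")
    marked = bytearray(media)
    for i in range(ID_BITS):
        bit = (copy_id >> (ID_BITS - 1 - i)) & 1
        marked[i] = (marked[i] & 0xFE) | bit
    return bytes(marked)


def extract_id(media):
    """Read the identifier back out of a (possibly leaked) copy."""
    value = 0
    for sample in media[:ID_BITS]:
        value = (value << 1) | (sample & 1)
    return value


class PointOfSale:
    """Marks each download and remembers who bought which copy."""

    def __init__(self, master):
        self.master = master
        self.purchases = {}  # copy identifier -> purchaser

    def sell(self, purchaser):
        copy_id = secrets.randbits(ID_BITS)
        while copy_id in self.purchases:      # keep identifiers unique
            copy_id = secrets.randbits(ID_BITS)
        self.purchases[copy_id] = purchaser
        return embed_id(self.master, copy_id)

    def trace(self, leaked):
        """Return the recorded purchaser of this copy, or None if unknown."""
        return self.purchases.get(extract_id(leaked))
```

A shop would call `sell("purchaser-001")` at download time and `trace(leaked_copy)` when a copy turns up online. As the note above cautions, the result names the original purchaser of that copy, not necessarily the person who leaked it.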
Digital watermarking has a lot of potential. It’s a great solution. There are just too few applicable problems. Digital watermarking has yet to find a mainstream use beyond protecting stock media.
The above information is the start of a chapter in "Network Security Illustrated," published by McGraw-Hill and available from amazon.com, as well as your local bookstore. The book goes into much greater depth on this topic.