Digital Images for the Web: Introduction and Tips
Rev (D) 20 June 2009
© 2002 - 2009 by Joe Roberts
The most common image file formats in use on the web today include JPEG (Joint Photographic Experts Group), BMP (Bitmap), TIFF (Tagged Image File Format) and GIF (Graphics Interchange Format). There are numerous other image file formats; however, many of the less common formats are specific to a particular industry or application. Each file format has advantages and disadvantages; the choice of file format should take into consideration what the final image will be used for. This article will concentrate mostly on the JPEG format, as it is the most commonly used format for web images (and perhaps the least understood). A brief description of each file format appears below:
JPEG: Excellent choice for photographs that are to be e-mailed or posted on the web for downloading by others. Advantages: JPEG format results in relatively small file sizes, and almost any browser and computer application can deal with JPEG images readily. Disadvantages: JPEG is a "lossy compression" format, which basically means that you give up image quality as the compression factor increases (smaller file size = less image quality). There are several subtleties about the JPEG format that the user should be aware of (more on this later). You can identify a JPEG file by looking at the filename: the last three letters will (usually) be "jpg" (example filename: rocket.jpg); you may also see "jpeg" as the file extension.
BMP: Bitmap is a "brute force" format, meaning that full image fidelity is maintained; however, file sizes will be large. BMP images support 24 bit color (more on what this means later). Many browsers and most applications can handle BMP files. BMP is not the best choice of format for files to be e-mailed or posted on the web (because of the large file sizes that typically exist even for small images). BMP is not a bad choice of file format for archival purposes because it preserves full image quality. You can identify a BMP file by looking at the filename: the last three letters will be "bmp" (example filename: rocket.bmp).
TIFF: TIFF is similar to BMP in that it maintains full image fidelity (however, TIFF allows for many other options). TIFF can support images with more than 8 bits per channel. TIFF is an excellent choice for critical images that are to be edited over and over. Like BMP files, TIFF files can be quite large, and therefore TIFF is not a good choice for files that are to be e-mailed or posted on the web. Most browsers and many applications can deal with TIFF files. You can identify a TIFF file by looking at the filename: the last three letters will (usually) be "tif" (example filename: rocket.tif).
GIF: GIF files should be used for images that are more properly termed "graphics". For example, GIF is used for things like charts, web page "buttons", etc. GIF only allows 256 colors, so GIF is NOT a good choice for photographic images. Most any browser and most applications can deal with GIF file formats. Photographs that are saved to the GIF format (from another format such as JPEG, BMP or TIFF) will look considerably degraded as compared to the non-GIF file. You can identify a GIF file by looking at the filename: the last three letters will be "gif" (example filename: rocket.gif).
Another file format that is quite popular is the PSD format (example filename: rocket.psd). The PSD format is used in Adobe PhotoShop. This format is not a good choice for web images, as the files are large. PSD files do not use lossy compression; they maintain full image fidelity and a host of specialized options. And, if your image has what are known as multiple layers, the file size can become huge!
Image File Basics
Like any other computer file, an image file is simply a group of digital words (or "bytes"). Each image file type has its own specific internal format or structure. The image file is identified by the computer as having a specific type by use of the file extension (the three or four characters at the end of the filename after the dot). When the computer is instructed to open a particular file, it looks at the filename extension to identify the file type, and then proceeds to open the file based on the "rules" for that file type. For example, a JPEG (often abbreviated "JPG") image has a specific internal structure that is common to all JPEG files. Part of this internal structure is known as the file "header". The header basically contains information about the image file: its size, dimensions, etc. All image files have a header within the file that contains the information necessary for the computer to decode and display the image. The header can (and often does) contain a lot of other information such as the date of the image, user comments, etc. When an operator "double clicks" on a JPEG file (from Windows Explorer), the computer (among other things) says, "the extension of this filename is JPG so it must be a JPEG image file… I'll open and display this file in accordance with the rules that apply to the JPEG file format". Do note that just because a file has a particular extension does not mean that it "is" that type of file. Using Explorer or DOS, you can easily rename a text file and give it the extension "JPG". However, when the computer tries to open that file as a JPG, it quickly recognizes that the internal structure of the file does not make sense for a JPG file, and an error message will be displayed.
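The difference between a file's extension and its actual internal structure can be illustrated with a short Python sketch. It is only an illustration, not how any particular operating system or viewer decides what a file is: the helper name sniff_image_type is hypothetical, and the check looks only at the first few bytes (the well-known leading signatures of each format), which is far less than a real decoder validates.

```python
# Leading signature bytes for the formats discussed in this article.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"8BPS", "PSD"),
    (b"BM", "BMP"),        # very short signature, so false matches are possible
]


def sniff_image_type(data: bytes) -> str:
    """Guess the image type from the first bytes of a file (hypothetical helper)."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


# A text file renamed to "rocket.jpg" still begins with text, not a JPEG signature.
print(sniff_image_type(b"Hello, this is plain text"))        # unknown
print(sniff_image_type(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # JPEG
```

To try it on a real file, read the first dozen bytes with open(path, "rb").read(12) and pass them to the function.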
Image Basics: Image Size on the Computer Screen
Computer images are made up of individual pixels; a typical image contains thousands of such pixels. The dimensions of an image are given in pixels by pixels (W x H). If an image has dimensions of 100 pixels by 200 pixels, the image contains 100 x 200 or 20,000 pixels. Note that an image of 100 x 200 pixels is relatively small as seen on a typical computer monitor (most people today run monitors at the 1024 x 768 setting or higher). HDTV (1080) resolution uses an image size of 1920 x 1080 (this comes out to be 2,073,600 pixels). People have larger and larger computer monitors these days and many are full HDTV capable. However, a typical 19" monitor has a resolution of 1024 x 768 pixels (similar to 720 HDTV resolution). For most web pages, people want the images to fit on the screen (they typically don't want to have to scroll around to see all of an image). Therefore, it is desirable to keep the images for your page at a size that will look good on a typical computer monitor. An image of 640 x 480 would be a decent size; for larger images you could go as large as 900 x 600 (this would fill most of the screen of a typical 19" monitor). Most of today's digital cameras produce images quite a bit larger than this, so if you are posting photos from your camera on your web site you will likely have to resize them in a program such as PhotoShop.
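As a rough sketch of the arithmetic involved, the following Python snippet computes pixel counts and works out the dimensions an image would need to fit inside a 640 x 480 box without changing its proportions. The function name fit_within and the 3000 x 2000 camera image are assumptions made for illustration; an image editor does this calculation for you when you resize with the aspect ratio locked.

```python
def fit_within(width, height, max_width=640, max_height=480):
    """Return (w, h) scaled down to fit the box while keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # never enlarge
    return max(1, round(width * scale)), max(1, round(height * scale))


print(100 * 200)    # 20000 pixels
print(1920 * 1080)  # 2073600 pixels

# Hypothetical 3000 x 2000 pixel camera image reduced for a web page.
print(fit_within(3000, 2000))  # (640, 427)
```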
Image Basics: Image File Size
Previously it was mentioned that images of 640 x 480 pixels were a good size for many applications and will be "medium" size as seen on the typical 19" monitor. But how big (in terms of bytes) is this image? For 24 bit color images (this applies to many image types), the file size (in memory, more on this in a minute) is equal to 3 times the number of pixels. This is because each pixel requires 3 bytes: one for the red component of the image, one for the blue component of the image, and one for the green component of the image. Thus, for a 640 x 480 image, the file size (in memory) is 640 x 480 x 3 = 921,600 bytes! This is approaching one megabyte, a non-trivial file size (when dealing with dial up modems, that is). This is the (minimum) amount of space required in memory by the image when the image is open. Note that when the image is stored on disk, it can be considerably smaller (depending on the file type used). For non-compression image file types, the file (as stored on disk) will be somewhat larger than three times the number of pixels. This is because the image file contains not only the pixel information; it also contains additional information (the header as described earlier, among other things). When images are in JPEG format, the file size is smaller than the numbers shown above (due to the way the JPEG compression process works).
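The calculation above is easy to reproduce. The sketch below assumes 3 bytes per pixel as described in the text; the 56,000 bits per second modem rate is an assumed nominal figure used only to give a feel for the download time, and real dial-up throughput is usually lower.

```python
def uncompressed_bytes(width, height, bytes_per_pixel=3):
    """In-memory size of an image: one byte each for red, green and blue."""
    return width * height * bytes_per_pixel


size = uncompressed_bytes(640, 480)
print(size)         # 921600 bytes
print(size / 1024)  # 900.0 kilobytes

# Assumed nominal dial-up rate of 56,000 bits per second (illustrative only).
MODEM_BITS_PER_SECOND = 56_000
print(round(size * 8 / MODEM_BITS_PER_SECOND))  # 132 seconds
```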
Image Basics: 24 Bit Color Images
While we are on the topic of image basics, a word on what a "24 bit color" image is. There are many other image formats; however, 24 bit color images are among the most commonly used. As mentioned in the previous section, each pixel of a (24 bit) color image consists of 3 bytes (one byte for each of the three primary colors, red, green and blue). Each byte is 8 bits, allowing 256 unique values to be represented. Within each "color" byte, this means that 256 "shades" of the color are possible. Since there are 3 colors, the total number of possible colors is 256 x 256 x 256 = 16,777,216. Thus, when you hear someone talk about a (computer or monitor) system with "16 million colors", this is what they are referring to. Be aware, however, that the human eye cannot distinguish this many colors (some of the shades are extremely close to others).
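The numbers in this section can be checked with a few lines of Python. The pack_rgb and unpack_rgb helpers are hypothetical names, and placing red in the highest byte is just one common convention, assumed here for illustration.

```python
BITS_PER_CHANNEL = 8
shades = 2 ** BITS_PER_CHANNEL  # values one byte can hold
print(shades)                   # 256
print(shades ** 3)              # 16777216 possible colors


def pack_rgb(r, g, b):
    """Combine three 8 bit values into one 24 bit number (red in the high byte)."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Split a 24 bit number back into its red, green and blue bytes."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(pack_rgb(255, 255, 255))          # 16777215, the largest 24 bit value
print(unpack_rgb(pack_rgb(200, 100, 50)))  # (200, 100, 50)
```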
JPEG: The most common image file format for photos on the Web
The vast majority of photographs (remember: photographs, not graphics!) on the web are in JPEG format, and for good reason. JPEG is a format that allows one to post a file that would otherwise be too large for downloading in a reasonable time. As mentioned before, however, the relatively small file sizes that result when JPEG format is used come at a price: image quality. When an image is to be saved as a JPEG file, the user is often prompted to set the "compression factor" or "quality factor" (not all image processing programs allow the user to set this value; sometimes only "generic" settings such as low, medium and high are available). The possible values for compression factor vary from program to program; however, the range typically runs from 1 (where 1 usually means high compression (low quality)) to 12 (usually meaning low compression (high quality)). Keep in mind that some image processing programs use reverse numbering for compression (large numbers being high compression and small numbers being low compression). For this article, we assume a compression factor of 1 means high compression (low quality). When saving an image to the JPEG format, the user must decide on a compression (quality) factor; this basically comes down to deciding on a compromise between file size and image quality. For many photographs, values of around 8 can be tolerated with only a slight degradation of the image. If your application allows you to set the compression factor, experiment with an image to see the results, but MAKE SURE YOU START WITH A TIFF or BMP IMAGE each time!!! Starting with a TIFF or BMP image, save the image as JPEG (name it "test1.jpg", for example) with a compression factor of 1, and then go back to the original TIFF or BMP image and save it again (under a new name) with a compression factor of 10. Then open the two newly created JPEG images and compare them (side by side) to the original TIFF or BMP image. The image with compression factor of 10 will be virtually indistinguishable from the original, while the image with compression factor of 1 will be significantly degraded.
JPEG: More advanced topics:
JPEG is an image file format that compresses the file size when the image is stored on a disk (or other media). JPEG works by "discarding" some of the image information (when the image is saved to that format) so that the file size can be much smaller; JPEG does this by discarding the information that will cause the least amount of degradation as seen by the human eye. Depending on the compression factor selected by the operator, the amount of information discarded can be large or small. When an image is to be saved in JPEG format, the JPEG algorithm processes the image information (image information consists of the 3 bytes that make up each pixel of the image). The JPEG algorithm performs a mathematical process known as a "Discrete Cosine Transform" (DCT) on the image. In high level terms it does the following: the image is first broken up into small blocks (8 x 8 pixels), and the DCT of each block is then calculated. The DCT of each block generates a series of numbers that represent the detail within that portion of the image. Depending upon the compression factor chosen by the user, some portion of the DCT result is then discarded (the DCT result is truncated). This is where the reduction in file size comes from. The information remaining after the DCT truncation is then further processed and arranged into the proper format (in accordance with the internal file structure of the JPEG file format) for subsequent saving to disk or other media. When the file is re-opened from disk, a similar (but reverse) process is performed; however, because some of the information was previously discarded, the resulting image will NOT be as good as the original! This is the "bad" part about JPEG; once you discard the information, it is gone for good and cannot be restored. Do note that once the image is opened again, the amount of memory required by the image is the same as it was before the save to JPEG (despite the image having less detail)! This will always be true if the dimensions of the image (pixels by pixels) remain the same. So, by saving an image in JPEG format, the image file size on disk is reduced; however, the amount of memory required by the image (when it is open) is the same as it was originally, even though the image now has less detail!
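The transform, discard and inverse steps described above can be sketched for a single 8 x 8 block in plain Python. This is a simplified illustration under stated assumptions, not a JPEG encoder: it follows the article's simplified description by zeroing the high-frequency DCT results, whereas real encoders reduce the precision of the coefficients using quantization tables and then apply further lossless coding. The function names and the test block are hypothetical.

```python
import math

N = 8  # JPEG works on 8 x 8 pixel blocks


def _scale(k):
    # Normalization that makes the transform exactly invertible.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis value for sample position i and frequency index k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT of one 8 x 8 block of pixel values."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                          for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse 2-D DCT: rebuild pixel values from the coefficients."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def truncate(coeffs, keep):
    """Zero every coefficient whose frequency index sum u + v is >= keep."""
    return [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


def max_error(a, b):
    """Largest per-pixel difference between two blocks."""
    return max(abs(a[x][y] - b[x][y]) for x in range(N) for y in range(N))


# Hypothetical single-channel test block (values between 0 and 217).
block = [[10 * x + 3 * y * y for y in range(N)] for x in range(N)]

coeffs = dct_2d(block)
untouched = idct_2d(coeffs)            # nothing discarded: original comes back
lossy = idct_2d(truncate(coeffs, 4))   # high-frequency detail discarded

print(max_error(block, untouched))     # essentially zero (rounding error only)
print(max_error(block, lossy))         # noticeably larger: detail is gone
print(len(lossy), len(lossy[0]))       # 8 8: same pixel count, same memory
```

Only 10 of the 64 numbers survive truncate(coeffs, 4), which is where the saving comes from, yet the rebuilt block still has 8 x 8 pixels, matching the point that the memory needed for the opened image does not change.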
JPEG: Common mistakes to avoid
JPEG is a very handy file format, and allows one to save images that are nearly as good as the original but with perhaps 1/10 the original file size. To get the most out of the format, there are some otherwise subtle things to be aware of:
Digital Camera Basics:
This section is provided to help those to understand how digital cameras work at a high level. This will hopefully allow the user to take better digital pictures.
A digital camera has a lens that is similar to that of any conventional film camera; however, instead of film a digital camera uses a device known as a Charge Coupled Device (CCD) to capture the image. A CCD can be thought of as an array of "buckets" which collect light when the shutter of the camera is opened. The size of the CCD chip determines the quality of the image; most reasonably decent consumer digital cameras today have CCD chips with a total of at least five million pixels. Some of the newer cameras available today (2009) have approximately 15 million pixels; these will allow superb quality enlargements (prints) of up to about 20x30 inches. CCDs have a number of advantages as compared to film; however, CCDs are not without their own quirks and problems. For example, the "depth" of the "buckets" on a CCD is limited; this places a limit on the range of brightness (or dynamic range) that can be recorded. Another problem is noise. All CCD chips are subject to "thermal" noise; this basically means that the buckets are filling up with some amount of signal even when the camera shutter is closed (the CCD is in total darkness). This noise has the same visual effect as "grain" in conventional film. Thermal noise in CCDs is greatly reduced by cooling the CCD chip (this is mandatory for telescope CCD cameras, where light levels are very low). Additional noise is added to the image when the information in the pixels (buckets) of the CCD is processed through the Analog to Digital (A/D) converter. CCDs can also be overexposed; this will not damage the CCD, but the image quality will often suffer (sometimes called "burn out"). Despite all of these problems, CCD cameras available today produce remarkable images at a small fraction of the cost of conventional film camera prints (especially so if you only view them on the computer screen).
Most digital cameras output images in JPEG format (most have the option to also output TIFF images; however, most people do not use TIFF format too often because far fewer images can fit into the memory card of the digital camera). Note that images from the CCD do not start out as JPEG; images from the CCD are raw "full fidelity" images. If your camera is set to JPEG mode (almost always the default setting), the picture you take most likely starts out as a 24 bit (possibly more bits depending on your camera) color image and is then processed and saved as a JPEG image. Most cameras allow the user to select the image size (in terms of pixels) and also the quality factor (for JPEG compression). Do note that most cameras allow the user to select "medium" or "high" quality JPEG images; however, I have never seen one that allows the user to specify a numerical compression factor.
For photographs that are to be displayed on the web, a final size of 640 x 480 is adequate for many applications. Most cameras today do not have a setting that saves images in such a small format, so you will likely have to resize images from your camera before posting them to a web page. If you are taking a shot of a distant object and plan to later crop the image to make the object appear larger, USE HIGHER RESOLUTION (the highest your camera allows). Also use the "high" quality JPEG compression option if your camera has various settings (for the best quality, use the TIFF setting, but do not plan on taking too many images before a card download is required).
For fast moving objects, use a higher shutter speed (if your camera allows it). This will help to freeze the motion, but it will also result in a "noisier" image (this is because less light has a chance to be collected on the CCD in relation to the thermal noise that is ever present). Some cameras also have an "ISO" setting. Higher ISO settings are useful for fast moving objects, however the noise problem (just mentioned) may be more noticeable with higher ISO settings (and/or fast shutter speeds). In theory, a digital camera should perform better (in terms of image noise) on cold winter days. This is because the thermal noise generated within the CCD will be considerably less due to the lower temperature.
Scanner Basics:
This section is intended to provide a brief overview of how scanners work and how to best use them to obtain digital images from printed media.
Like a digital camera, a scanner uses a CCD to obtain the image. Unlike the CCD in a digital camera, the CCD in a scanner is moved along the length of the item to be scanned by an arm controlled by a stepper motor. The CCD "reads" information from the device as it is moved along the item (the arm is actually moving in small, discrete steps, despite a seemingly continuous motion). The scanner contains circuitry that subsequently assembles the raw data into an image file representative of the item placed on the scanner bed. The image is (often) then exported to an application such as PhotoShop for additional editing or processing.
Common Scanner Mistakes:
Probably the most often misunderstood parameter associated with scanning an item is the resolution at which the item should be scanned. Scanning should always be done at a resolution suitable for the final application of the scanned item. All scanner software packages allow the user to select the resolution to be used for a particular scan; resolution is given in units of "dpi" (dots per inch). Most scanners support a range from 75 dpi to as much as 6000 dpi (more on these extremely high numbers later).
For posting images (scanned from typical 4" x 6" 35mm camera prints) on the web, a scan resolution of 100 dpi is usually more than adequate. If you plan to crop a small area from a 4x6 print, 300 to 600 dpi can be used. Avoid the temptation to scan at resolutions of thousands of dpi (for example 4000 dpi)! This will only create a gigantic image (in terms of pixels and file size)! Scanning 35mm film prints at anything above about 600 dpi will generally have little benefit because most 35mm prints do not have detail above this range. If you scan a 35mm print at 2000 dpi, you will be getting scans of the film grain and paper imperfections. And, if you plan to post a 2000 dpi image on the web (from a 4x6 print), you will just have to reduce the image (with image processing software) later on. Otherwise, your image will be about 12000 x 8000 pixels in size, far larger than will fit on any (common) computer screen!
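The relationship between scan resolution and image size is simple multiplication, as the following sketch shows for a 6" x 4" print. The function name scan_pixels is hypothetical, and the byte figure assumes an uncompressed 24 bit color image (3 bytes per pixel).

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions produced by scanning a print of the given size."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (100, 300, 600, 2000):
    w, h = scan_pixels(6, 4, dpi)
    # Width, height, and uncompressed size at 3 bytes per pixel.
    print(dpi, w, h, w * h * 3)

# 100 dpi  ->   600 x  400 pixels,     720000 bytes
# 2000 dpi -> 12000 x 8000 pixels,  288000000 bytes
```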
If you plan to print scanned images, you will want a higher dpi setting. 300 dpi (or higher) will fit the purpose in this case.
When you scan an image, it will generally come out of the scanner (and into your application) as a "raw" image (depending on your default settings). It is then up to the user to decide on what format to save the image in. TIFF is a good choice for critical work; however, high quality JPEG may be fine for most applications. It's probably best
NOTE: If you are scanning old family photos (or any other important photo) for archival purposes, scan them at a fairly high resolution (400 to 600 dpi) and make sure you save the file in TIFF format! If you save them as JPEG (in order to get more images on a disk) you are defeating the purpose of a high resolution scan! Scanning at a high resolution (for archival purposes) and then saving the files as JPEG will cause you to lose some of the detailed information you are trying to preserve!
12000 dpi?!? Many consumer scanners today tout resolutions of 12000 dpi (or even higher). However, scanner manufacturers like to play games with how things are rated (to make them "sound" like a better buy). In most inexpensive consumer grade scanners, the maximum "real" resolution you can get is typically around 4000 dpi. This is referred to as optical resolution (basically it is the limit of the CCD used in the scanner). So how can such a scanner get to 12000 dpi? By a process known as interpolation. Interpolation is a mathematical process which "estimates" what a missing value should be based on the behavior of the "real" data points in the neighborhood of the information we wish to have. As such, the data is not "real"; it is a kind of "guess". Most consumers do not have a need to perform scans at resolutions above around 2000 dpi (one exception is 35mm slides and 35mm film negatives; 4000 dpi is not out of the question for a quality slide or negative).
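To make the idea of interpolation concrete, here is a minimal sketch using simple linear interpolation along one row of samples. It is an assumption for illustration only; scanner software may well use more elaborate methods, and the function name interpolate_row is hypothetical. A factor of 3 corresponds to turning 4000 "real" samples per inch into 12000 reported ones.

```python
def interpolate_row(samples, factor):
    """Insert factor - 1 estimated values between each pair of real samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for i in range(factor):
            # i == 0 is the real sample; the others are straight-line estimates.
            out.append(a + (b - a) * i / factor)
    out.append(samples[-1])
    return out


real = [0, 30]                   # two values actually measured by the CCD
print(interpolate_row(real, 3))  # [0.0, 10.0, 20.0, 30]
```

The values 10.0 and 20.0 were never measured; they are estimates based on the neighboring real samples, which is why interpolated resolution adds pixels without adding detail.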
Film/Slide scanners: Another type of scanner, known as a Slide or Film scanner, is the best device to use if you plan to scan film (negatives) or slides. Many flatbed scanners have an accessory that allows you to scan film or slides. These may do a satisfactory job; however, if you want the best performance, a dedicated slide scanner is preferred. Optical resolution of 4000 dpi is not uncommon for a good consumer film scanner. Such high resolution is needed because 35mm film frames (or slides) are not that big to start with. You need to get some serious dpi on these small items in order to get a decent image. Film and slides have a greater brightness range than prints, so additional bit depth can be a benefit. For example, a slide scanner (or even a flatbed scanner) might be advertised as 42 bits (what this really means is that the A/D converter used within the unit is a 14 bit device). This benefit will only be realized if your image processing application can work with such images (many cannot). If you plan to archive a large collection of slides (or old film negatives) and you want the best archive quality, use a dedicated slide scanner instead of a flatbed scanner.
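The "42 bit" figure is just the per-channel bit depth multiplied by the three color channels. A short sketch of the arithmetic (the function name is hypothetical):

```python
def levels_per_channel(total_bits, channels=3):
    """Bits and distinct brightness levels available to each color channel."""
    bits = total_bits // channels
    return bits, 2 ** bits


print(levels_per_channel(24))  # (8, 256)    ordinary 24 bit color
print(levels_per_channel(42))  # (14, 16384) a scanner advertised as 42 bits
```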
Concerned about people using your (copyrighted) photos? You may want to post an image on the web, but at the same time you do not want your original work to be available for "free" should someone decide to "claim" it as their property and use it for profit. This is another reason why it makes sense to post images scanned at lower resolution (100 dpi). Images at this resolution look fine on the computer screen, but try and print one (of any appreciable size) and see how it looks! (Not good). On the other hand, if you post a giant image (one that was scanned at 600 dpi), anyone could download and make a rather nice print of this image (and sell it if there were a market for it). So, the first line of defense against this form of "piracy" is to post images that look fine on the screen but look terrible when printed. 100 dpi scan resolution fits this bill!
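A quick calculation shows why a low resolution scan prints poorly. The sketch below assumes a 6" x 4" original and a print resolution of 300 dpi (the figure suggested earlier for printing); the function name is hypothetical.

```python
def print_size_inches(width_px, height_px, print_dpi=300):
    """Size at which an image can be printed at the given print resolution."""
    return width_px / print_dpi, height_px / print_dpi


for scan_dpi in (100, 600):
    w_px, h_px = 6 * scan_dpi, 4 * scan_dpi
    w_in, h_in = print_size_inches(w_px, h_px)
    print(f"{scan_dpi} dpi scan: {w_px} x {h_px} pixels, "
          f"prints at about {w_in:.1f} x {h_in:.1f} inches")
```

Under these assumptions the 100 dpi scan yields a quality print of only about 2 x 1.3 inches, while the 600 dpi scan is good for roughly 12 x 8 inches.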
SUMMARY: For posting images on the web: