While time-lapse video recorders (TLR) using videocassettes remain in use in many smaller video surveillance systems, digital video recorders (DVR) and network video recorders (NVR) continue to be the preferred choice for larger and more complex systems. The video cameras that provide the images to these recording systems may be either analog or IP (internet protocol). For TLRs, analog cameras are almost invariably required, though it is technically possible to use IP cameras in a TLR system. For DVRs and NVRs, either analog or IP cameras, or a mixture of the two types, may be used. For the purpose of video forensics, knowing the type of camera that originally captured the video is critical to an understanding of several important aspects of the video material to be examined.
In North America, analog video cameras are almost certain to be compliant with the NTSC video system. (In other parts of the world, cameras may comply with PAL, SECAM, or other video system standards, which differ from NTSC in many crucial aspects. For the purpose of this discussion, we shall limit ourselves to the NTSC system.) The NTSC (National Television System Committee) standards for video systems were developed primarily to ensure the compatibility of broadcast television signals with consumer television sets. The first standard was published in 1941, with subsequent revisions to accommodate advances such as color TV, and all of the standards are readily available from many sources for reference purposes. The NTSC system standard is perhaps most important because it describes the way in which a video image is created on the “old-fashioned” CRT (cathode ray tube) television sets we used for well over 50 years. It should come as no surprise that analog video surveillance cameras of that period were designed and manufactured to provide a video picture that would display in an identical manner on video monitors using CRTs. Therefore, we can safely assume that an analog NTSC camera produces a signal that complies with the relevant sections of the NTSC standards.
Why is it important for a video forensics analyst to know if video material originated from an NTSC camera? Regardless of the method used to transmit and record the video images, the use of an analog NTSC camera places certain limitations and restrictions on the original video source and, consequently, on the recorded video images. We are frequently presented with digital video files that are known to have originated from an NTSC camera and, in many cases, we can point to attributes of the video images that are inconsistent with an NTSC source. In some cases, there are anomalies that cannot readily be explained in any other way. In the following paragraphs, we will discuss a few of the most relevant features of the NTSC video system and the analog video cameras that employ it.
First, the aspect ratio of an NTSC video image is 1.33, or 4 units (wide) by 3 units (high). This aspect ratio is specified by the NTSC standards, but may vary slightly from system to system because of minor differences in CRT scanning or other equipment. However, a DVR that produces a video file that is 720 pixels (wide) by 480 pixels (high) from an analog NTSC camera is either substantially distorting the image or cutting off portions of the image when recording, since the aspect ratio of the digital video is 1.5 and definitely not 1.33. This is a common problem and one that we see in many cases.
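The mismatch described above can be checked with simple arithmetic. The following Python sketch compares the ratio of the stored pixel grid with the 4:3 picture and computes the pixel shape that would be needed for the grid to fill a 4:3 display exactly. The function names are illustrative only, and the calculation assumes that the whole 720 by 480 grid represents the whole picture, which may not be true for a particular recorder.

```python
from fractions import Fraction

NTSC_PICTURE_ASPECT = Fraction(4, 3)  # 4 units wide by 3 units high


def storage_aspect_ratio(width_px, height_px):
    """Width-to-height ratio of the stored pixel grid."""
    return Fraction(width_px, height_px)


def implied_pixel_aspect_ratio(width_px, height_px,
                               picture_aspect=NTSC_PICTURE_ASPECT):
    """Pixel width/height ratio needed for the grid to fill the picture.

    Assumption: the entire grid maps onto the entire picture.
    """
    return picture_aspect / storage_aspect_ratio(width_px, height_px)


stored = storage_aspect_ratio(720, 480)
pixel_shape = implied_pixel_aspect_ratio(720, 480)

print(float(stored))   # 1.5
print(pixel_shape)     # 8/9
```

A result other than 1 for the implied pixel shape means the file cannot be displayed pixel-for-pixel on a square-pixel monitor without changing the geometry of the original 4:3 picture.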
Second, the standard frame rate for NTSC video is 29.97 frames per second. A new frame (complete image) is presented from the camera to the recorder every 33.4 milliseconds on a continuous basis. This fixed, predictable frame rate can make a dramatic difference if the purpose of the analysis is to ascertain the exact time interval between any two frames in the digital video material. Since accurate and reliable time intervals are critical to establishing such basic data as the velocity of vehicles or other moving objects shown in the video, we are often asked to render an opinion on this specific aspect of the material. We will discuss this topic in more detail in a subsequent post. Ironically, “old-fashioned” videocassette recorders are often much better at providing accurate and reliable time interval measurements, as they were originally designed to record video and play it back at precisely the same fixed rate (29.97 frames per second).
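When every frame from the camera is known to be present in the recording, the fixed NTSC rate turns a frame count directly into a time interval. The Python sketch below illustrates that arithmetic and a simple average-speed estimate. It uses 30000/1001 as the exact value behind the rounded figure of 29.97, and the function names and the 10-meter distance are hypothetical. The result is only meaningful if the recorder has not dropped, duplicated, or re-timed frames.

```python
from fractions import Fraction

# Exact NTSC color frame rate; approximately 29.97 frames per second.
NTSC_FRAME_RATE = Fraction(30000, 1001)


def elapsed_seconds(first_frame, last_frame, frame_rate=NTSC_FRAME_RATE):
    """Time between two frame numbers, assuming no frames are missing."""
    return float((last_frame - first_frame) / frame_rate)


def average_speed(distance_m, first_frame, last_frame):
    """Average speed in meters per second over a known distance."""
    return distance_m / elapsed_seconds(first_frame, last_frame)


# Duration of a single frame, in milliseconds.
print(round(elapsed_seconds(0, 1) * 1000, 2))   # 33.37

# Hypothetical example: an object covers 10 m between frame 0 and frame 30.
print(round(average_speed(10.0, 0, 30), 2))     # 9.99
```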
Third, NTSC video images are interlaced and each frame actually consists of two separate fields. A CRT monitor creates a visible image by scanning an electron beam horizontally across the inside face of the tube. The electron beam, guided by a strong magnetic field, starts at one side of the tube and scans to the other, then returns to the starting side and scans another line below the first. This continues until the entire face of the tube has been scanned from top to bottom, creating a visible image. During the development of consumer television, it was discovered that creating an entire image every 33 milliseconds was not fast enough to prevent a noticeable and objectionable lag when objects in the image are moving. To compensate, the NTSC standards require that the electron beam scan the odd-numbered lines of an image and then return to scan the even-numbered lines, thus requiring two complete scans of the screen to create what is a single interlaced frame. (Scanning just the odd or even-numbered lines is called a “field.” It takes two fields to create a frame. A single field takes approximately 16.7 milliseconds to create.) When a DVR or NVR records an analog camera, it must employ some technical method to convert the interlaced video signal to a digital video format, most of which are not interlaced. (A video image that is not interlaced is called “progressive.”) Some digital systems simply ignore one of the fields, recording just the odd or even-numbered field as if it were a complete frame. Other systems may combine both fields into a single progressive image. Each method creates slight anomalies that may have an impact on video analysis.
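The two conversion approaches just described can be illustrated with a toy image. In the Python sketch below, a frame is represented as a list of rows, and the function names are illustrative only. It is not a description of how any particular DVR or NVR works; it simply shows what "keep one field" and "combine both fields" mean in terms of image rows.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of rows) into its two fields.

    Rows at indexes 0, 2, 4, ... are lines 1, 3, 5, ... when counting from one.
    """
    odd_line_field = frame[0::2]
    even_line_field = frame[1::2]
    return odd_line_field, even_line_field


def weave(odd_line_field, even_line_field):
    """Combine two fields into one progressive image by interleaving rows."""
    combined = []
    for odd_row, even_row in zip(odd_line_field, even_line_field):
        combined.append(list(odd_row))
        combined.append(list(even_row))
    return combined


# Toy 4-line frame; each number stands for the content of one scan line.
frame = [[1, 1], [2, 2], [3, 3], [4, 4]]

odd_field, even_field = split_fields(frame)
print(odd_field)                     # [[1, 1], [3, 3]]
print(weave(odd_field, even_field))  # [[1, 1], [2, 2], [3, 3], [4, 4]]
```

Keeping a single field halves the number of rows, while interleaving joins two fields that were captured roughly 16.7 milliseconds apart, so a moving object can appear with a comb-like edge in the combined image.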
Fourth, NTSC video images are composed of discrete horizontal lines, but the horizontal lines themselves are continuously variable from side to side. A complete video image requires 525 horizontal lines to create (262.5 per field). Of these, only 483 lines are actually visible. The remainder are used for timing and control purposes and do not normally appear on the visible portion of the CRT screen. (Early closed captioning for broadcast television embedded the caption information in the non-visible lines.) Therefore, the maximum number of discrete picture elements in the vertical portion of an NTSC video image is limited to 483. Any other number of vertical elements is a result of interpolation by the recording device, or by omitting one of the fields (see paragraph above). The horizontal scan lines themselves do not have discrete elements. The intensity of the electron beam that scans the inside of the CRT varies continuously over a fixed range as it moves from one side to the other. (Other techniques are employed to render color.) Since the signal varies continuously, there is no standard number of picture elements specified by NTSC for the horizontal dimension. The ability of a specific camera or monitor to resolve in the horizontal dimension is normally measured by the number of vertical lines it can successfully display on the screen. Both video cameras and CRT monitors vary tremendously in the number of vertical lines they can produce or display. It is not at all unusual for a system to have cameras which are only capable of producing a video image of fewer than 360 vertical lines connected to high-quality CRT monitors that can display more than 525 vertical lines. Again, understanding and interpreting the implications of the way in which NTSC video images are created plays an important part when reviewing digital video material.
So far, we have discussed analog NTSC cameras exclusively. We will now turn our attention to IP video cameras.
Many consumers confuse digital video cameras with IP video cameras. Some analog NTSC video cameras use digital technology to capture and process video images, and these cameras can certainly be considered to be digital. However, the video is then converted to NTSC standards to be transmitted on coaxial cable, twisted pairs, or some other transmission media. The conversion to an NTSC signal necessarily means that the video is then subject to the NTSC requirements discussed in previous paragraphs. IP video cameras do not comply with NTSC standards, though some units may simultaneously provide both an IP and an NTSC output.
IP video cameras transmit video images to the recording device using the internet protocol. At the most basic level, this requires the image to be digitized (or “encoded”) and then converted into data packets that can be transmitted over a data network. The variety of methods for digitizing and transmitting video from a camera are far too numerous to describe in this paper, so we will limit ourselves to describing some of the key differences between IP cameras and NTSC cameras.
Unlike cameras that comply with NTSC standards, IP cameras are not required to provide video at a standard, uniform frame rate. (When dealing with digital video, many analysts prefer to use the terms “image rate” or “images per second,” rather than “frame rate” or “frames per second.” For the purposes of this paper and to make comparison easier, we will use the “frame” terminology for both types of camera.) There are two major reasons for this: First, most IP cameras can be programmed to provide individual frames either at specified intervals or upon request. This prevents overloading the data network by transmitting video data that are not needed or cannot be recorded by the system. Second, digitized video is often encoded using methods that permit variable frame rates. For example, many MPEG-4 encoding methods embed information on the presentation time of an individual image and the length of time that it should be shown on the display monitor. This is in sharp contrast to the NTSC system, where video frames are presented continuously and at fixed intervals. As a consequence, it can be extremely difficult to ascertain the actual time interval between two events (or frames) in a digitized video sequence unless we have extremely high confidence in both the camera and the recording system.
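The practical consequence is that an interval in variable-rate video should be taken from the timestamps of the individual frames, not from a frame count multiplied by a nominal rate. The Python sketch below shows the difference. The timestamp values are invented for illustration, the function names are hypothetical, and the step of extracting timestamps from a real file is not shown. Even extracted timestamps reflect the clock of the encoding device, so they still depend on confidence in the camera and recording system.

```python
def interval_ms(timestamps_ms, first_index, last_index):
    """Interval between two frames, taken from their own timestamps."""
    return timestamps_ms[last_index] - timestamps_ms[first_index]


def frame_gaps(timestamps_ms):
    """Time gap between each pair of consecutive frames."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def is_constant_rate(timestamps_ms, tolerance_ms=1):
    """True if all gaps between consecutive frames agree within the tolerance."""
    gaps = frame_gaps(timestamps_ms)
    return max(gaps) - min(gaps) <= tolerance_ms


# Invented presentation timestamps in milliseconds for five frames.
timestamps = [0, 100, 200, 500, 600]

print(frame_gaps(timestamps))          # [100, 100, 300, 100]
print(is_constant_rate(timestamps))    # False
print(interval_ms(timestamps, 0, 4))   # 600

# Counting frames at a nominal 10 frames per second would give 4 * 100 = 400 ms.
```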
Another major difference between NTSC systems and IP systems is that the aspect ratio of the images may vary significantly depending on the equipment used and the recording settings. It is not unusual for an IP camera to transmit video images with one aspect ratio (for example, 1.5, or 720 pixels by 480 pixels) that is subsequently altered either in recording or when it is played back on a monitor. This is further complicated for both NTSC and IP cameras by the fact that individual pixels on NTSC monitors are of a slightly different shape than those found on most computer monitors. Ascertaining exactly what aspect ratio the original image had can be very challenging, but it is critical for measuring the velocity or position of moving objects.
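To see why pixel shape matters for measurement, consider a displacement measured in pixels. If the pixels are not square, horizontal and vertical pixel counts are in different units and must be reconciled before a distance is computed. The Python sketch below applies a pixel aspect ratio (pixel width divided by pixel height) to the horizontal component. The value 8/9 and the function name are hypothetical, chosen only for illustration. The correct ratio has to be established for the actual recording.

```python
import math


def corrected_displacement(dx_px, dy_px, pixel_aspect_ratio):
    """Displacement in units of pixel height.

    The horizontal pixel count is scaled by the pixel aspect ratio
    (pixel width / pixel height) so both axes share the same unit.
    """
    return math.hypot(dx_px * pixel_aspect_ratio, dy_px)


HYPOTHETICAL_PAR = 8 / 9  # assumed value, for illustration only

# An object moves 90 pixels horizontally between two frames.
uncorrected = math.hypot(90, 0)
corrected = corrected_displacement(90, 0, HYPOTHETICAL_PAR)

print(uncorrected)           # 90.0
print(round(corrected, 3))   # 80.0
```

In this example, ignoring the pixel shape would overstate a purely horizontal displacement, and therefore any velocity derived from it, by 12.5 percent.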
Finally, the digitizing process that encodes the digital video at the camera can introduce some significant anomalies. Because of technical limitations and the desire to reduce bandwidth usage on the data network, many decisions have to be made about the acceptable frame rate, image size, and image quality for any individual IP camera. (This is also a major consideration when the video signal from an NTSC camera has been digitized for recording or transmission.) The encoding process that digitizes and compresses the video images necessarily introduces artifacts and anomalies into the images. Perhaps the best known and most easily recognized artifact is macroblocking, the appearance of block-like structures in some portions of the video image. But there are a number of other characteristics of the encoding process that can produce more subtle alterations in the image that are easily missed by the typical viewer.
This is not to say that IP video cameras are inferior to NTSC video cameras. One area in which IP video cameras excel is image resolution. It would not be possible, for example, to transmit video images with megapixel resolution using NTSC technology. As we have seen, there are hard limits on the number of horizontal lines in an NTSC signal and even economical IP video cameras far exceed these limits by producing images with two and three times this vertical resolution limit. There are excellent reasons for users to select video surveillance systems that use modern IP cameras.
We have attempted in this paper to identify some of the important characteristics that distinguish NTSC video cameras from IP video cameras and to describe the importance of identifying which type of camera was used to create a digital video file that is subject to analysis. In subsequent papers, we will discuss some of these topics in more detail and introduce new topics of interest.