Digital Video Forensics: “Is this video clip reliable?”

Posted by Kile Unterzuber on January 15, 2015

When we receive a request from an attorney or a forensic engineer to review digital video material, we are most often asked, “Is this video clip reliable?” Over the years, we’ve learned that this can mean many different things. The material in question is often a short piece of video in the form of a digital file that can be played using common media players, such as Windows Media Player. In some cases, the material is accompanied by proprietary viewer software that is required to view the video. On occasion, the video is actually in a DVD format, complete with title and menu.

But what our clients really want to know is one or more of the following:

“Is this video clip a true and complete copy of the original?”

“Has this video clip been altered or edited?”

“Can I rely on the time and date that appear in the video? To what degree of precision?”

“Are the proportions of the picture correct? Can I use it to measure distances?”

We generally deal with civil cases and not criminal investigations. In a criminal investigation, it is usually enough for investigators to obtain identification from the video. This may be facial identification of a perpetrator, the approximate height and weight of the individual, or simply a general description of the clothing the perpetrator was wearing. In some cases, investigators seek to identify a vehicle by make, model, or color. The investigating agency may extract important images for enhancement or for distribution to aid the investigation. But it is very rare that a criminal investigator concerns himself or herself with the precision of the date and time stamp, or whether a single frame may be missing from the video sequence.

In civil lawsuits, it is a different matter. Unsurprisingly (and according to the “CSI” shows), video recording systems are everywhere and frequently record video of incidents unintentionally. We have, for example, worked with numerous video files from security systems that recorded vehicle accidents in the background, a purpose for which they were not originally designed or installed. A civil lawsuit in connection with the accident might require the involvement of forensic engineers, who will normally perform a survey of the accident site to obtain accurate measurements of the positions of the vehicles before, during, and after the accident. Since vehicle speed may be a contributing factor in an accident, the engineers want to estimate the vehicle speed(s) at different locations. Speed is often estimated on the basis of skid marks or the amount of damage sustained by the vehicle(s), but the availability of recorded video gives the forensic engineer the opportunity to estimate vehicle speed by time interval. Knowing that speed=distance/elapsed time, and having accurately measured vehicle position during the investigation, all we need is a precise measure of the time interval between the frames of the video clip that show the vehicle at those measured positions. What could be simpler?

As it turns out, a lot. Most digital video recording systems were never intended to be used to measure time intervals to the precision required to differentiate between a vehicle going 45 MPH in a 45 MPH zone and a vehicle going 56 MPH in a 45 MPH zone. In fact, most date and time stamps inserted by video recording systems are displayed with a resolution no finer than one second, though some may display to a greater precision. If we do have a system with sufficient precision, there may then be the question of accuracy. Being precise to the millisecond is one thing. Being accurate is another. The task is further complicated if the video clip has been exported by the video recording system in a format different from that which was used in the original recording. It is very common for a video recording system that uses a variable frame rate (i.e., the intervals between successive frames are not uniform) to export video clips to a video file format that uses a constant frame rate. The exported files are quite useful for identification purposes, but may be useless for performing the calculations required to accurately estimate vehicle speed.
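To make the precision problem concrete, here is a minimal Python sketch of the speed calculation with a simple timing uncertainty attached. The function names, the 132-foot distance, and the half-second uncertainty are our own illustrative assumptions, not values from any case.

```python
FEET_PER_MILE = 5280.0
SECONDS_PER_HOUR = 3600.0


def speed_mph(distance_ft, elapsed_s):
    """Average speed in MPH over a measured distance and elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (distance_ft / elapsed_s) * SECONDS_PER_HOUR / FEET_PER_MILE


def speed_bounds_mph(distance_ft, elapsed_s, time_uncertainty_s):
    """Slowest and fastest speeds consistent with elapsed_s +/- time_uncertainty_s."""
    slowest = speed_mph(distance_ft, elapsed_s + time_uncertainty_s)
    shortest_time = elapsed_s - time_uncertainty_s
    # If the uncertainty swallows the whole interval, no upper bound exists.
    fastest = speed_mph(distance_ft, shortest_time) if shortest_time > 0 else float("inf")
    return slowest, fastest


# Hypothetical example: 132 feet covered in a nominal 2.0 seconds.
nominal = speed_mph(132.0, 2.0)                   # 45 MPH
low, high = speed_bounds_mph(132.0, 2.0, 0.5)     # 36 MPH to 60 MPH
print(f"nominal {nominal:.1f} MPH, range {low:.1f} to {high:.1f} MPH")
```

With these assumed numbers, a timing uncertainty of half a second gives a range of 36 to 60 MPH. That range contains both the 45 MPH and the 56 MPH vehicle, so a time stamp that resolves only whole seconds cannot separate the two.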

We frequently receive video clips in which the date and time stamp advances 15 minutes, or some other fixed time period, but the video clip actually plays in much less time using a software media player. We have several clips in which the audio portion of the file is shorter than the video itself, sometimes by more than 10%. The questions are then, “Which time intervals are correct? Can we state with confidence that the actual time interval between the vehicle at this location and that location is X.X seconds? What is our confidence interval for our estimate?”
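A first check for this kind of mismatch can be sketched in a few lines of Python. The frame count, nominal frame rate, and time-stamp span below are hypothetical, and the function names are ours. In practice, these values would have to be read from the file and from the on-screen overlay.

```python
def playback_duration_s(frame_count, nominal_fps):
    """Seconds a media player needs to show frame_count frames at nominal_fps."""
    if nominal_fps <= 0:
        raise ValueError("frame rate must be positive")
    return frame_count / nominal_fps


def mean_capture_interval_s(timestamp_span_s, frame_count):
    """Average real-world time between consecutive frames, per the overlay."""
    if frame_count < 2:
        raise ValueError("need at least two frames to form an interval")
    return timestamp_span_s / (frame_count - 1)


# Hypothetical clip: the overlay advances 15 minutes (900 s), but the file
# holds only 4501 frames and is tagged as 30 frames per second.
overlay_span_s = 900.0
frames = 4501
played = playback_duration_s(frames, 30.0)
average_gap = mean_capture_interval_s(overlay_span_s, frames)
print(f"plays in about {played:.0f} s; average gap between frames {average_gap:.2f} s")
```

In this assumed case the clip plays in about 150 seconds, while the overlay implies an average of 0.2 seconds between frames. The average says nothing about any individual interval, which is exactly why the confidence question above remains open.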

Strangely, old-fashioned videocassette recorders are often more reliable and useful for the purpose of estimating time intervals than are modern digital video recording systems. A standard VHS recorder was designed to record video at 29.97 frames per second, and we have the further advantage of knowing that the camera was providing video to the recorder at an identical rate. Newer IP video cameras and digital recording systems normally work with variable frame rates and may even add the time stamp to the images only after they have been received at the recording unit, adding the problem of network latency to the mix.

A gentleman who was very experienced with digital video and had worked for years in the industry once told me that he would “…never try to estimate time intervals in digital video with BOTH precision and accuracy.” While this might be an extreme view, it certainly reflects the challenges that face us.

We have attempted in this article to identify some of the important considerations in establishing the “reliability” of digital video used in forensic accident investigation. In subsequent articles, we will discuss some of these topics in more detail and introduce new topics of interest.

Tags: Expert Witness, Video Forensics

Posted in: Expert Witness, Security Technology, Video Forensics

Digital Video Forensics: Analog and IP Video Cameras

Posted by Kile Unterzuber on January 6, 2015

While time-lapse video recorders (TLR) using videocassettes remain in use in many smaller video surveillance systems, digital video recorders (DVR) and network video recorders (NVR) continue to be the preferred choice for larger and more complex systems. The video cameras that provide the images to these recording systems may be either analog or IP (internet protocol). For TLRs, analog cameras are almost invariably required, though it is technically possible to use IP cameras in a TLR system. For DVRs and NVRs, either analog or IP cameras, or a mixture of the two types, may be used. For the purpose of video forensics, knowing the type of camera that originally captured the video is critical to an understanding of several important aspects of the video material to be examined.

In North America, analog video cameras are almost certain to be compliant with the NTSC video system. (In other parts of the world, cameras may comply with PAL, SECAM, or other video system standards, which differ from NTSC in many crucial aspects. For the purpose of this discussion, we shall limit ourselves to the NTSC system.) The NTSC (National Television System Committee) standards for video systems were developed primarily to ensure the compatibility of broadcast television signals with consumer television sets. The first standard was published in 1941, with subsequent revisions to accommodate advances such as color TV, and all of the standards are readily available from many sources for reference purposes. The NTSC system standard is perhaps most important because it describes the way in which a video image is created on the “old-fashioned” CRT (cathode ray tube) television sets we used for well over 50 years. It should come as no surprise that analog video surveillance cameras of that period were designed and manufactured to provide a video picture that would display in an identical manner on video monitors using CRTs. Therefore, we can safely assume that an analog NTSC camera produces a signal that complies with the relevant sections of the NTSC standards.

Why is it important for a video forensics analyst to know if video material originated from an NTSC camera? Regardless of the method used to transmit and record the video images, the use of an analog NTSC camera places certain limitations and restrictions on the original video source and, consequently, on the recorded video images. We are frequently presented with digital video files that are known to have originated from an NTSC camera and, in many cases, can point to attributes of the video images that are inconsistent with an NTSC source. In some cases, there are anomalies that can be readily explained in no other way. In the following paragraphs, we will discuss a few of the most relevant features of the NTSC video system and the analog video cameras that employ it.

First, the aspect ratio of an NTSC video image is 1.33, or 4 units (wide) by 3 units (high). This aspect ratio is specified by the NTSC standards, but may vary slightly from system to system through minor variations in CRT scanning or other equipment variations. However, a DVR that produces a video file that is 720 pixels (wide) by 480 pixels (high) from an analog NTSC camera is either substantially distorting the image or cutting off portions of the image when recording, since the aspect ratio of the digital video is 1.5 and definitely not 1.33. This is a common problem and one that we see in many cases.
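The arithmetic behind this mismatch can be sketched as follows. This is a simplified Python illustration under one stated assumption: that the full stored width of the file corresponds to the full 4:3 picture, which may not hold for a particular recorder. The function names are ours.

```python
from fractions import Fraction

NTSC_DISPLAY_ASPECT = Fraction(4, 3)


def storage_aspect_ratio(width_px, height_px):
    """Width-to-height ratio of the stored pixel grid."""
    return Fraction(width_px, height_px)


def pixel_aspect_ratio(width_px, height_px, display_aspect=NTSC_DISPLAY_ASPECT):
    """Shape each stored pixel must have for the picture to display at display_aspect."""
    return display_aspect / storage_aspect_ratio(width_px, height_px)


def corrected_width(height_px, display_aspect=NTSC_DISPLAY_ASPECT):
    """Width, in square pixels, that restores display_aspect at the same height."""
    return round(height_px * display_aspect)


print(storage_aspect_ratio(720, 480))   # 3/2, i.e. 1.5
print(pixel_aspect_ratio(720, 480))     # 8/9, narrower than a square pixel
print(corrected_width(480))             # 640
```

Under that assumption, a 720 by 480 file viewed with square pixels stretches the picture horizontally by a factor of 9/8, so any horizontal distance measured directly from the pixels would be overstated by the same factor.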

Second, the standard frame rate for NTSC video is 29.97 frames per second. A new frame (complete image) is presented from the camera to the recorder every 33.4 milliseconds on a continuous basis. The consequences of this fixed, predictable frame rate can make a dramatic difference if the purpose of the analysis is to ascertain the exact time interval between any two frames in the digital video material. Since accurate and reliable time intervals are critical to establishing such basic data as the velocity of vehicles or other moving objects shown in the video, we are often asked to render an opinion on this specific aspect of the material. We will discuss this topic in more detail in a subsequent post. Ironically, “old-fashioned” videocassette recorders are often much better at providing accurate and reliable time interval measurements, as they were originally designed to record and play back video at precisely the same rate at which it was recorded (29.97 frames per second).
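Because the NTSC rate is fixed, the time between any two frames follows directly from their frame numbers. A short Python sketch, assuming that no frames were dropped or duplicated between the two indices and using 30000/1001 as the exact value behind the nominal 29.97 figure:

```python
from fractions import Fraction

# Exact rational form of the nominal 29.97 frames per second.
NTSC_FRAME_RATE = Fraction(30000, 1001)


def frame_interval_ms(frame_rate=NTSC_FRAME_RATE):
    """Milliseconds between consecutive frames at a constant frame rate."""
    return float(1000 / frame_rate)


def elapsed_seconds(first_frame, last_frame, frame_rate=NTSC_FRAME_RATE):
    """Elapsed time between two frame indices at a constant frame rate."""
    if last_frame < first_frame:
        raise ValueError("last_frame must not precede first_frame")
    return float((last_frame - first_frame) / frame_rate)


print(f"{frame_interval_ms():.4f} ms per frame")
print(f"{elapsed_seconds(0, 30):.3f} s across 30 frame intervals")
```

Thirty frame intervals take 1.001 seconds rather than exactly one second. The difference is small, but it is known and constant, and that is what makes an NTSC source useful for timing.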

Third, NTSC video images are interlaced and each frame actually consists of two separate fields. A CRT monitor creates a visible image by scanning an electron beam horizontally across the inside face of the tube. The electron beam, guided by a strong magnetic field, starts at one side of the tube and scans to the other, then returns to the starting side and scans another line below the first. This continues until the entire face of the tube has been scanned from top to bottom, creating a visible image. During the development of consumer television, it was discovered that creating an entire image every 33 milliseconds was not fast enough to prevent a noticeable and objectionable lag when objects in the image are moving. To compensate, the NTSC standards require that the electron beam scan the odd-numbered lines of an image and then return to scan the even-numbered lines, thus requiring two complete scans of the screen to create what is a single interlaced frame. (One pass over just the odd-numbered or just the even-numbered lines is called a “field.” It takes two fields to create a frame. A single field takes approximately 16.7 milliseconds to create.) When a DVR or NVR records an analog camera, it must employ some technical method to convert the interlaced video signal to a digital video format, most of which are not interlaced. (A video image that is not interlaced is called “progressive.”) Some digital systems simply ignore one of the fields, recording just the odd or even-numbered field as if it were a complete frame. Other systems may combine both fields into a single progressive image. Each method creates slight anomalies that may have an impact on video analysis.
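The two conversion approaches just described can be illustrated with a toy Python sketch in which a frame is simply a list of scan lines from top to bottom. The function names are ours, and real recorders may use more elaborate methods than line repetition or simple weaving.

```python
def split_fields(frame_rows):
    """Split an interlaced frame into its odd-line and even-line fields."""
    odd_field = frame_rows[0::2]    # lines 1, 3, 5, ... counting from 1
    even_field = frame_rows[1::2]   # lines 2, 4, 6, ...
    return odd_field, even_field


def line_double(field_rows):
    """Treat one field as a whole frame by repeating each of its lines."""
    doubled = []
    for row in field_rows:
        doubled.extend([row, row])
    return doubled


def weave(odd_field, even_field):
    """Combine two fields of equal length into one progressive frame."""
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.extend([odd_row, even_row])
    return frame


frame = ["line1", "line2", "line3", "line4", "line5", "line6"]
odd, even = split_fields(frame)
print(line_double(odd))   # half the vertical detail, each line shown twice
print(weave(odd, even))   # all lines, but the two fields were captured ~16.7 ms apart
```

Discarding a field halves the vertical detail, while weaving merges two moments in time into one image, so a fast-moving object can show a comb-like edge. Either result affects what can be measured from a single exported frame.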

Fourth, NTSC video images are composed of discrete horizontal lines, but the horizontal lines themselves are continuously variable from side to side. A complete video image requires 525 horizontal lines to create (262.5 per field). Of these, only 483 lines are actually visible. The remainder are used for timing and control purposes and do not normally appear on the visible portion of the CRT screen. (Early closed captioning for broadcast television embedded the caption information in the non-visible lines.) Therefore, the maximum number of discrete picture elements in the vertical portion of an NTSC video image is limited to 483. Any other number of vertical elements is a result of interpolation by the recording device, or by omitting one of the fields (see paragraph above). The horizontal scan lines themselves do not have discrete elements. The intensity of the electron beam that scans the inside of the CRT varies continuously over a fixed range as it moves from one side to the other. (Other techniques are employed to render color.) Since the signal varies continuously, there is no standard number of picture elements specified by NTSC for the horizontal dimension. The ability of a specific camera or monitor to resolve in the horizontal dimension is normally measured by the number of vertical lines it can successfully display on the screen. Both video cameras and CRT monitors vary tremendously in the number of vertical lines they can produce or display. It is not at all unusual for a system to have cameras which are only capable of producing a video image of fewer than 360 vertical lines connected to high-quality CRT monitors that can display more than 525 vertical lines. Again, understanding and interpreting the implications of the way in which NTSC video images are created plays an important part when reviewing digital video material.

So far, we have discussed analog NTSC cameras exclusively. We will now turn our attention to IP video cameras.

Many consumers confuse digital video cameras with IP video cameras. Some analog NTSC video cameras use digital technology to capture and process video images, and these cameras can certainly be considered to be digital. However, the video is then converted to NTSC standards to be transmitted on coaxial cable, twisted pairs, or some other transmission medium. The conversion to an NTSC signal necessarily means that the video is then subject to the NTSC requirements discussed in previous paragraphs. IP video cameras do not comply with NTSC standards, though some units may simultaneously provide both an IP and an NTSC output.

IP video cameras transmit video images to the recording device using the internet protocol. At the most basic level, this requires the image to be digitized (or “encoded”) and then converted into data packets that can be transmitted over a data network. The methods for digitizing and transmitting video from a camera are far too numerous to describe in this paper, so we will limit ourselves to describing some of the key differences between IP cameras and NTSC cameras.

Unlike cameras that comply with NTSC standards, IP cameras are not required to provide video at a standard, uniform frame rate. (When dealing with digital video, many analysts prefer to use the terms “image rate” or “images per second,” rather than “frame rate” or “frames per second.” For the purposes of this paper and to make comparison easier, we will use the “frame” terminology for both types of camera.) There are two major reasons for this: First, most IP cameras can be programmed to provide individual frames either at specified intervals or upon request. This prevents overloading the data network by transmitting video data that are not needed or cannot be recorded by the system. Second, digitized video is often encoded using methods that permit variable frame rates. For example, many MPEG-4 encoding methods embed information on the presentation time of an individual image and the length of time that it should be shown on the display monitor. This is in sharp contrast to the NTSC system, where video frames are presented continuously and at fixed intervals. As a consequence, it can be extremely difficult to ascertain the actual time interval between two events (or frames) in a digitized video sequence unless we have extremely high confidence in both the camera and the recording system.
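The practical effect of a variable frame rate can be shown with a small Python sketch. The presentation times below are invented for illustration and do not come from any real file. The function names are ours.

```python
def elapsed_from_timestamps(presentation_times_s, first_index, last_index):
    """Elapsed time using the presentation time stored with each frame."""
    return presentation_times_s[last_index] - presentation_times_s[first_index]


def elapsed_assuming_constant_rate(first_index, last_index, frames_per_second):
    """Elapsed time if every frame is assumed to be evenly spaced."""
    return (last_index - first_index) / frames_per_second


# Hypothetical presentation times (seconds) for six frames from a camera
# that sends frames at uneven intervals.
times = [0.0, 0.10, 0.20, 0.50, 0.60, 1.00]

actual = elapsed_from_timestamps(times, 0, 5)
assumed = elapsed_assuming_constant_rate(0, 5, 10.0)
print(f"from stored times: {actual:.2f} s, assuming a constant 10 fps: {assumed:.2f} s")
```

In this made-up sequence, the same six frames span 1.00 second by their stored times but only 0.50 seconds if a constant 10 frames per second is assumed, which would double any speed estimate. The stored times are themselves only as trustworthy as the camera and recorder that wrote them.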

Another major difference between NTSC systems and IP systems is that the aspect ratio of the images may vary significantly depending on the equipment used and the recording settings. It is not unusual for an IP camera to transmit video images with one aspect ratio (for example, 1.5, or 720 pixels by 480 pixels) that is subsequently altered either in recording or when it is played back on a monitor. This is further complicated for both NTSC and IP cameras by the fact that individual pixels on NTSC monitors are of a slightly different shape than those found on most computer monitors. Ascertaining exactly what aspect ratio the original image had can be very challenging, but it is critical for measuring the velocity or position of moving objects.

Finally, the digitizing process that encodes the digital video at the camera can introduce some significant anomalies. Because of technical limitations and the desire to reduce bandwidth usage on the data network, many decisions have to be made about the acceptable frame rate, image size, and image quality for any individual IP camera. (This is also a major consideration when the video signal from an NTSC camera has been digitized for recording or transmission.) The encoding process that digitizes and compresses the video images necessarily introduces artifacts and anomalies into the images. Perhaps the best known and most easily recognized artifact is macroblocking, the appearance of block-like structures in some portions of the video image. But there are a number of other characteristics of the encoding process that can produce more subtle alterations in the image that are easily missed by the typical viewer.

This is not to say that IP video cameras are inferior to NTSC video cameras. One area in which IP video cameras excel is image resolution. It would not be possible, for example, to transmit video images with megapixel resolution using NTSC technology. As we have seen, there are hard limits on the number of horizontal lines in an NTSC signal and even economical IP video cameras far exceed these limits by producing images with two and three times this vertical resolution limit. There are excellent reasons for users to select video surveillance systems that use modern IP cameras.

We have attempted in this paper to identify some of the important characteristics that distinguish NTSC video cameras from IP video cameras and to describe the importance of identifying which type of camera was used to create a digital video file that is subject to analysis. In subsequent papers, we will discuss some of these topics in more detail and introduce new topics of interest.

Tags: Security Technology, Video Forensics

Posted in: Video Forensics