This is the fourth post in a multi-part series on how to make successful measurements, i.e., how to get “good data”. In the first post we made the case that good data is about time, money, performance, results, and reputation. In the second and third posts we discussed how to make good measurements of fluctuating and static phenomena, respectively. Those posts focused almost entirely on the act of acquiring data, but it is also critically important that the acquired data is traceable. Hence, this post discusses data traceability.
What is Data Traceability?
Data traceability establishes confidence in an acquired set of data and gives the data integrity. It may include attributes such as:
• When the data was acquired
• Identification of what instrumentation was used to acquire the data (e.g., microphone model and serial numbers)
• Engineering units for the data as stored in the file
• Parameters associated with the stored data (e.g., number of power line cycles over which a DC voltage measurement was made and the associated power line frequency, the sample rate, the range setting, and the useable (alias-free) bandwidth of the data)
• Information regarding any overloads detected during the recording
• Calibration information: The sensitivity (value and units) that was used, or that needs to be used, to convert the stored data into engineering units (e.g., the value to use to convert as-stored voltages to pascals for a microphone or shear stress sensor)
Depending upon the type of data that you are acquiring and the requirements placed upon it, many additional attributes could be part of data traceability.
Attributes like those listed above should typically be stored along with the acquired data at the time it is acquired.
They are typically readily available, as they are integral to the process of acquiring data. This is certainly the case with parameters such as sample rate, range setting, and amount of data acquired. Other parameters, such as microphone model and serial number, may be stored in a separate file that pertains to many acquisition events, as long as there is a way to know which microphones were used for each individual acquisition event. These attributes that go along with the actual data are typically referred to as metadata, i.e., data that gives information about other data.
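As a rough illustration of the attributes listed above, here is a minimal Python sketch of a metadata record that is written to a companion (sidecar) file and later used to convert stored voltages into engineering units. The class name, field names, JSON layout, and all example values are our own placeholders for illustration, not a standard schema or any particular product's file format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AcquisitionMetadata:
    # Field names are illustrative assumptions, not a standard schema.
    acquired_at_utc: str            # when the data was acquired (ISO 8601)
    channel: int                    # data acquisition system channel number
    sensor_model: str               # instrumentation identification
    sensor_serial: str
    stored_units: str               # units of the samples as stored in the file
    sample_rate_hz: float
    range_volts: float
    alias_free_bandwidth_hz: float
    overload_detected: bool
    sensitivity: float              # calibration value ...
    sensitivity_units: str          # ... and its units, e.g. "V/Pa"


def to_engineering_units(stored_volts, meta):
    """Convert as-stored voltages to pascals using the stored sensitivity."""
    if meta.stored_units != "V" or meta.sensitivity_units != "V/Pa":
        raise ValueError("this sketch only handles volts with a V/Pa sensitivity")
    return [volts / meta.sensitivity for volts in stored_volts]


# Placeholder values, chosen only to make the example easy to follow.
meta = AcquisitionMetadata(
    acquired_at_utc="2024-01-01T12:00:00Z",
    channel=3,
    sensor_model="MIC-A",
    sensor_serial="SN0001",
    stored_units="V",
    sample_rate_hz=51200.0,
    range_volts=1.0,
    alias_free_bandwidth_hz=20000.0,
    overload_detected=False,
    sensitivity=0.05,
    sensitivity_units="V/Pa",
)

# Serialize to the text that would go into the companion file, then read it back.
sidecar_text = json.dumps(asdict(meta), indent=2)
restored = AcquisitionMetadata(**json.loads(sidecar_text))

pressures_pa = to_engineering_units([0.05, -0.1], restored)
```

Storing the sensitivity together with its units lets the conversion refuse to run when the units are not what it expects, rather than silently producing numbers with the wrong scale.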
Another aspect of data traceability is metadata that gives context to the acquired data. Contextual metadata doesn’t typically pertain to how or when the data was acquired, but rather under what conditions it was acquired.
If we don’t know the context of the data, then we don’t know how to analyze it (i.e., interpret it or explain it).
Contextual metadata may include such attributes as device under test (DUT) configuration, flow speed, temperature, humidity, model attitude, and ambient pressure. Contextual metadata may be stored along with the data, but in some cases it is needed by many different measurement systems, or some other measurement system is already capturing it.
Minimally, an acquired data set needs to be uniquely associated with the proper contextual metadata.
This may be accomplished by storing a unique identifier (text string or number) with each data set that associates the data set with the pertinent contextual metadata. The identifier is dependent upon the test conventions used by the test organization that is responsible for the test. Typically, the unique identifier may be a run number or a combination of a run number and a test point. Sometimes, instead of storing a unique value with the metadata, the name of the data acquisition file itself can be leveraged as the unique identifier that associates the data with its contextual metadata.
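To make the idea of a unique identifier concrete, here is a small Python sketch that builds an identifier from a run number and a test point and recovers the same pair from a data file name. The `run0042_pt007` naming pattern and the function names are hypothetical; your test organization's conventions will dictate the real format.

```python
import re
from pathlib import Path

# Hypothetical convention: run<run number>_pt<test point>, e.g. "run0042_pt007".
_ID_PATTERN = re.compile(r"run(\d+)_pt(\d+)")


def make_identifier(run_number, test_point):
    """Build the identifier string from a run number and a test point."""
    return f"run{run_number:04d}_pt{test_point:03d}"


def identifier_from_filename(filename):
    """Recover (run number, test point) from a data file name.

    Raises ValueError if the name does not follow the convention, so that a
    misnamed file is caught instead of being silently mis-associated.
    """
    stem = Path(filename).stem
    match = _ID_PATTERN.fullmatch(stem)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename}")
    return int(match.group(1)), int(match.group(2))


identifier = make_identifier(42, 7)
run_number, test_point = identifier_from_filename(identifier + ".dat")
```

Rejecting file names that do not match the convention is a deliberate choice in this sketch: an acquisition that cannot be tied to its context is better flagged immediately than discovered during analysis.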
Approaches to Associating Metadata with Acquired Data
As alluded to earlier, there are multiple options for associating metadata with acquired data.
The metadata that identifies how and when the data was acquired is typically stored with the acquired data, either in the same file or in a companion file that is generated at the same time that the file for the acquired data is generated.
Note that we are using the term “file” generically to represent whatever storage mechanism is being used for the data. It could be a flat file, a database, or whatever else may be pertinent. Some of this metadata, such as the instrumentation identification (e.g., model and serial number) that stays static or changes infrequently throughout a test campaign, may be stored in a separate file that associates this more comprehensive identifying information with something more generic like the channel number of the data acquisition system. Then, as long as the channel number of the data acquisition system is stored with the rest of the metadata that is stored with the acquired data, the instrumentation identification information may be retrieved from this separate file.
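The channel-number lookup described above might be sketched in Python as follows. The CSV layout, column names, sensor models, and serial numbers are placeholders that we made up for illustration; the point is only that the acquired data carries a channel number and the separate file maps that number to the full instrumentation identification.

```python
import csv
import io

# Stand-in for the separate file that changes rarely during a test campaign.
# Column names and contents are hypothetical.
CHANNEL_MAP_CSV = """channel,sensor_model,sensor_serial
0,MIC-A,SN0001
1,MIC-A,SN0002
"""


def load_channel_map(text):
    """Parse the channel map into {channel number: (model, serial number)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        int(row["channel"]): (row["sensor_model"], row["sensor_serial"])
        for row in reader
    }


def instrumentation_for(acquisition_meta, channel_map):
    """Look up which sensor produced a data set from its stored channel number."""
    channel = acquisition_meta["channel"]
    if channel not in channel_map:
        raise KeyError(f"channel {channel} is not in the channel map")
    return channel_map[channel]


channel_map = load_channel_map(CHANNEL_MAP_CSV)
model, serial = instrumentation_for({"channel": 1}, channel_map)
```

If the channel map changes partway through a campaign (for example, when a sensor is swapped), this simple scheme would also need some way to record which version of the map applies to which acquisition events.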
There are many methods for associating contextual metadata with acquired data. In the end, what is most important is that you have a bulletproof way of connecting your acquired data with the pertinent contextual metadata.
The two ends of the spectrum for accomplishing this are:
1) a “minimalist approach”, where none of the contextual metadata is stored with the acquired data; and
2) a “kitchen-sink approach”, where all of the contextual metadata is stored with the acquired data.
In the minimalist approach, the data acquisition operator does not need to be concerned about the contextual metadata other than providing some kind of identifying information, such as a file naming convention, that uniquely connects the acquired data to the metadata.
This approach has the benefit of enabling the data acquisition operator to focus on the immediate task of acquiring good data without being distracted from that task by the demands associated with contextual metadata acquisition/entry.
It also simplifies the requirements associated with building the data acquisition application. This approach requires that someone else is keeping track of the contextual metadata and that the metadata will continue to be accessible for as long as the acquired data is intended to be kept. An example of where this approach might be desirable is a specialized data acquisition capability that is used in many different test facilities. In this case, each of the facilities likely has its own method for acquiring and storing contextual metadata. A wind tunnel, for example, likely has a system for controlling the tunnel and acquiring tunnel and model operational data such as Mach number, temperature, and model angle of attack.
To get all of the pertinent contextual metadata stored with the acquired data in the kitchen-sink approach, provisions must be made in both the data acquisition application and the data acquisition operator processes to ensure all necessary information gets captured.
The obvious advantage of this approach is that the metadata is, by default, associated with the acquired data because it is stored with it. The data set stands alone.
An example of where this approach might be desirable is for a data acquisition system where the acquired data is always part of the operation of a test facility. Consider a wind tunnel equipped with a force balance that is used to directly measure aerodynamic forces (lift, drag, and side) and moments (roll, pitch, and yaw) on the attached model. In this example, the operation of the wind tunnel is always coupled with operation of the model position and measurement of the forces and moments. Other measurements such as model attitude, thrust, wind speed, air temperature, and static pressure may also be primary test facility measurements. A data acquisition system that acquires all of these parameters along with all other information pertinent to the test, such as model configuration, has captured all of the information necessary for the analysis of the measured parameters. That is, it has the acquired data along with the metadata that identifies how and when the acquired data was measured, and it has all of the contextual metadata that is needed for analyzing the acquired data. For example, forces and moments can be associated with model configuration, model position, and flow speed.
An acoustic array measurement system that is temporarily installed in this same wind tunnel to locate and measure noise sources emanating from the model typically won’t have direct access to the data collected by the wind tunnel data acquisition system, but the wind tunnel data is necessary metadata for processing and analyzing the acoustic array data. For example, the flow speed and static temperature are required to process the acoustic array data to generate noise source location maps, and the model configuration is needed to associate the noise source location maps with the physical situation to which they pertain. All the acoustic array measurement system needs is a way to associate each data acquisition event with the wind tunnel data that is pertinent to that event. This provides an example of a minimalist approach for associating acquired data with metadata (acoustic array measurement system) alongside a kitchen-sink approach for doing the same (wind tunnel measurement system).
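The association between the two systems in this example could be sketched in Python as a simple join on a shared identifier. The record layout, key names, and values below are hypothetical placeholders; in practice the wind tunnel records would come from the facility's own data system in whatever form it provides.

```python
def attach_context(acquisitions, tunnel_records):
    """Pair each acquisition event with the tunnel record sharing its identifier.

    Raises ValueError on duplicate tunnel records and LookupError when an
    acquisition has no matching record, so gaps are found early.
    """
    by_id = {}
    for record in tunnel_records:
        key = record["identifier"]
        if key in by_id:
            raise ValueError(f"duplicate tunnel record for {key}")
        by_id[key] = record

    joined = []
    for acquisition in acquisitions:
        key = acquisition["identifier"]
        if key not in by_id:
            raise LookupError(f"no contextual metadata for {key}")
        joined.append({**acquisition, "context": by_id[key]})
    return joined


# Placeholder records standing in for the wind tunnel data system's output.
tunnel_records = [
    {"identifier": "run0042_pt007", "flow_speed_m_s": 50.0,
     "static_temperature_k": 293.0, "model_configuration": "baseline"},
]

# Placeholder record standing in for one acoustic array acquisition event.
acquisitions = [
    {"identifier": "run0042_pt007", "data_file": "run0042_pt007.dat"},
]

joined = attach_context(acquisitions, tunnel_records)
```

The acoustic array system stores only the identifier; everything under `context` remains owned by the wind tunnel system, which is what keeps the approach minimalist.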
So, How Do You Know You Are Getting Good Data?
If you have read the previous three posts in this series, you should have been anticipating this question at the end of this post. However, each of the previous posts focused on only one aspect of getting good data, such as acquiring high-quality fluctuating data or acquiring high-quality static data. In this post we have discussed the final piece of the overall task of getting good data, so we can now answer the question comprehensively. You know you are getting good data because:
1) you are following all of the principles from the second post in this series for acquiring high-quality fluctuating data from every sensor to which they apply (see final paragraph from the second post);
2) you are following all of the principles from the third post in this series for acquiring high-quality static data from every sensor to which they apply (see final paragraph from the third post);
3) you are associating all of the metadata with the acquired data that is required to document how and when the data was acquired and how to convert the data to engineering units; and
4) you are associating all of the contextual metadata that is required to document the conditions under which the data was acquired (its context) so that the data can be analyzed.
Caveat: Recall that in the second post of this series we scoped the discussion so that we could focus on how you make sure that you are collecting good data once everything is in place. We did not focus on topics such as the quality of the sensor, the capabilities of the data system, the uncertainty associated with the calibration data for the sensors, what sample rate to use, or how much data to collect. All of these topics, and more, must be considered in advance so that once you have collected the data, it can be analyzed and give you the results that you need. For example, if you don’t sample at a high enough rate, don’t collect data for a long enough period of time for each acquisition event, or don’t have adequate confidence limits in your calibration data, you won’t be able to produce the results that you need when analyzing the data.
For the next post in this series, we will explore a case study, the IC2 Complete Shear Stress Measurement System, to show how it supports all of the above-mentioned principles for getting good data.