Improving Usability for Data Entry
Data analysis results are worthless unless the data is correctly collected and entered without error.
This page describes a number of best practices to improve the accuracy of data collection and entry into a computerized system (spreadsheet, web data entry, database, or other computer system). These tips fall into three groups below: Verification of Data Entered, More Complex Strategies, and Avoiding Manual Data Entry. Most involve adding logic to the data entry system to detect probable errors. This can be as simple as additional formulas in a spreadsheet or as formal as custom programming for a database or web entry system.
To have a positive influence on data quality, data entry verification and validation have to be designed to support the person, not fight against them. In short, do not force the person to guess about missing data or semi-legible entries. ...
Best Expert Tip
It is easier, faster, and less error-prone if the layout of the data sheet corresponds to the layout of the data entry screen.
Verification of data entered
Errors are fastest to detect and correct when they are caught as close to the source as possible. The following strategies can help the person entering the data notice errors. These can be data entry errors (typographical errors and other mistakes) as well as apparent problems with the source data (field notes, data forms, etc.).
Required Data
Check that data that is truly required is not missing. For a higher level of data integrity, values entered for required fields can be compared against “unknown” or similar terms that may be entered in an attempt to circumvent the prohibition on missing or blank values. ...
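As a minimal sketch of this check, assuming each record arrives as a Python dictionary of field names to entered values, the function below treats a required field as missing if it is blank or holds a placeholder term. The function name and the `PLACEHOLDERS` set are hypothetical; a real project would tailor the list to the terms its own field crews actually use.

```python
# Hypothetical placeholder terms that people type to get past a "required" rule.
PLACEHOLDERS = {"unknown", "unk", "n/a", "na", "none", "?", "-", "tbd"}


def check_required(record: dict, required_fields: list) -> list:
    """Return an error message for each required field that is blank or a placeholder."""
    errors = []
    for field in required_fields:
        value = record.get(field)
        text = "" if value is None else str(value).strip()
        if text == "":
            errors.append(f"{field}: a value is required")
        elif text.lower() in PLACEHOLDERS:
            errors.append(f"{field}: '{text}' looks like a placeholder, not a real value")
    return errors


# Example: two problems are reported, so the record should not be saved yet.
problems = check_required(
    {"site_id": "Unknown", "observer": "", "date": "2023-06-01"},
    ["site_id", "observer", "date"],
)
for message in problems:
    print(message)
```

Comparing in lower case means "Unknown", "UNKNOWN", and "unknown" are all caught by a single entry in the set.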
Important Data
For less critical but still important information, the system should just remind the person to enter the data instead of preventing the record from being saved.
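The difference between required and important data can be sketched as errors versus reminders. In this hypothetical example, which assumes records are Python dictionaries and that the caller decides how to display messages, missing important fields produce reminders, but the record is saved anyway.

```python
def is_blank(value) -> bool:
    """Treat None and whitespace-only text as missing; 0 counts as a real value."""
    return value is None or str(value).strip() == ""


def save_with_reminders(record: dict, important_fields: list, store: list) -> list:
    """Save the record unconditionally and return reminders for empty important fields."""
    reminders = [
        f"Reminder: '{field}' is empty. Please enter it if it is on the data sheet."
        for field in important_fields
        if is_blank(record.get(field))
    ]
    store.append(record)  # a reminder never blocks saving
    return reminders


saved = []
notes = save_with_reminders({"site_id": "MILL_POND", "weather": "", "count": 0},
                            ["weather", "count"], saved)
print(notes)
```

The explicit `is_blank` helper avoids a common bug where a legitimate zero count would be treated as missing.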
Expected Format
Any data collection form and the corresponding data entry screen should clearly indicate the expected format. A simple example is the order of month, day, and year in a date. Is time expressed in 24-hour format, or are "AM" and "PM" used? A more elaborate example is geographic locations. ...
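When the form specifies a format, the entry system can parse each value against that exact format and reject anything else. The sketch below assumes, for illustration only, ISO-style dates (YYYY-MM-DD) and 24-hour times (HH:MM). It uses Python's standard `datetime.strptime`, which raises `ValueError` for text that does not match the format or that names an impossible date such as February 30.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed convention, e.g. 2023-06-01
TIME_FORMAT = "%H:%M"     # assumed 24-hour convention, e.g. 14:30 rather than 2:30 PM


def parse_date(text: str):
    """Return (date, None) on success or (None, error message) on failure."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date(), None
    except ValueError:
        return None, f"'{text}' is not a valid date in YYYY-MM-DD form"


def parse_time(text: str):
    """Return (time, None) on success or (None, error message) on failure."""
    try:
        return datetime.strptime(text.strip(), TIME_FORMAT).time(), None
    except ValueError:
        return None, f"'{text}' is not a valid 24-hour time in HH:MM form"


for entry in ["2023-06-01", "06/01/2023", "2023-02-30"]:
    print(entry, parse_date(entry))
print(parse_time("14:30"), parse_time("2:30 PM"))
```

Returning the error message alongside the value lets the entry screen show the person exactly which format was expected.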
List of Values
In many cases, data values will come from a pre-defined list of values. Examples include codes for location or site names (or identifiers), the name (or ID) of the person(s) making the observation, the genus and species of living things observed, or US state abbreviations. It is better not to make the user type the code, which risks typographical errors. ...
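A pick list (drop-down menu) is the usual way to spare people from typing codes. Where typing cannot be avoided, the entry system can at least check the value against the list and suggest close matches. This sketch uses Python's standard `difflib.get_close_matches`; the site codes are hypothetical stand-ins for a project's real code table.

```python
import difflib

# Hypothetical site codes; a real system would load these from its code table.
SITE_CODES = ["BEAR_CRK", "FOX_HOLLOW", "MILL_POND"]


def check_code(value: str, allowed: list):
    """Return (code, []) if the value is on the list, else (None, suggestions)."""
    code = value.strip().upper()
    if code in allowed:
        return code, []
    # Offer up to three similar codes instead of silently accepting a typo.
    return None, difflib.get_close_matches(code, allowed, n=3)


print(check_code(" mill_pond ", SITE_CODES))
print(check_code("MIL_POND", SITE_CODES))
```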
Range Checking - Strict
Range checks verify that values fall between minimum and maximum valid values. Strict limits can be enforced for values that are exactly defined, such as the 60 minutes in an hour. ...
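A strict range check can simply refuse values that are impossible by definition. The limits table below is an illustrative assumption (minutes 0 to 59, hours 0 to 23 on a 24-hour clock, months 1 to 12), and the function name is hypothetical.

```python
from typing import Optional

# Limits that are fixed by definition, so anything outside them must be an error.
STRICT_LIMITS = {"minute": (0, 59), "hour": (0, 23), "month": (1, 12)}


def check_strict(field: str, value: int) -> Optional[str]:
    """Return an error message if the value is impossible, else None."""
    low, high = STRICT_LIMITS[field]
    if not low <= value <= high:
        return f"{field}={value} is impossible; it must be from {low} to {high}"
    return None


print(check_strict("minute", 45))  # None: accepted
print(check_strict("minute", 75))  # error message: the record should not be saved
```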
Range Checking - Moderate
This is range checking for values that are not rigidly defined but do fall within reasonable ranges. For example, you can expect the temperature of liquid water to be between 0 and 100 degrees Celsius, since water is normally not liquid outside that range. However, once impurities and air pressure are taken into account, liquid water can exist outside the 0 to 100 degree range.
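The text does not spell out how a moderate check should respond, so the following is only one reasonable option, offered as an assumption: values outside the usual range are accepted only when the person confirms them and records a reason, such as a saline or pressurized sample. The 0 to 100 °C range comes from the example above; the function and parameter names are hypothetical.

```python
from typing import Optional

LIQUID_WATER_C = (0.0, 100.0)  # usual range for liquid water at ordinary pressure


def check_water_temp(value: float, confirmed: bool = False, reason: str = "") -> Optional[str]:
    """Accept in-range values; accept out-of-range values only with confirmation and a reason."""
    low, high = LIQUID_WATER_C
    if low <= value <= high:
        return None
    if confirmed and reason.strip():
        return None  # unusual but explained, so keep it (and store the reason with the record)
    return (f"{value} °C is outside the usual {low} to {high} °C for liquid water; "
            "confirm the value and give a reason to save it")


print(check_water_temp(18.5))
print(check_water_temp(-1.5))
print(check_water_temp(-1.5, confirmed=True, reason="seawater sample"))
```

Requiring a reason, rather than a bare "yes, save it", leaves a record that explains the unusual value to later users of the data.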
Range Checking - Relaxed
This is the most common category. You can set reasonable ranges and then provide a notification to the user during data entry when a value is outside that range. ...
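A relaxed check only warns; the value is saved either way. The ranges below are hypothetical examples, not recommendations, and would in practice be set from knowledge of the study system.

```python
# Hypothetical "reasonable" ranges; values outside them are allowed but flagged.
RELAXED_RANGES = {"air_temp_c": (-30.0, 45.0), "fish_length_mm": (20.0, 900.0)}


def relaxed_warnings(record: dict) -> list:
    """Return a notification for each value outside its usual range."""
    warnings = []
    for field, (low, high) in RELAXED_RANGES.items():
        value = record.get(field)
        if value is not None and not low <= value <= high:
            warnings.append(f"{field}={value} is outside the usual {low} to {high}; please double-check")
    return warnings


print(relaxed_warnings({"air_temp_c": 51.0, "fish_length_mm": 310.0}))
```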
Range Checking - Statistical
The range limits can be computed as some number of standard deviations from the mean value of prior observations. ...
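A sketch of a statistical range check using Python's standard `statistics` module: the limits are set at k sample standard deviations on either side of the mean of prior observations. The choice of k = 3 and the sample data are assumptions for illustration, and at least two prior values are needed to compute a standard deviation.

```python
import statistics


def statistical_limits(prior: list, k: float = 3.0) -> tuple:
    """Return (low, high) = mean -/+ k sample standard deviations of prior values."""
    mean = statistics.mean(prior)
    sd = statistics.stdev(prior)  # raises StatisticsError with fewer than two values
    return mean - k * sd, mean + k * sd


def is_unusual(value: float, prior: list, k: float = 3.0) -> bool:
    """Flag a new value that falls outside the statistical limits."""
    low, high = statistical_limits(prior, k)
    return not low <= value <= high


# Hypothetical earlier counts from the same site.
prior_counts = [10, 12, 11, 13, 9, 10, 12, 11]
print(statistical_limits(prior_counts))
print(is_unusual(12, prior_counts), is_unusual(25, prior_counts))
```

Because the limits are recomputed from the data already collected, they adapt to each site or measurement without anyone having to choose fixed bounds in advance.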
Consistency Checking
Consistency checking compares the values entered in more than one data field. Checking related values against each other can catch errors that checking either value alone would miss. ...
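The source's own examples are elided here, so the two rules below are hypothetical: an end time should not be earlier than the start time, and the counts of adults and juveniles should add up to the recorded total. Each rule involves two or more fields, which is what distinguishes a consistency check from a range check.

```python
from datetime import time


def consistency_errors(record: dict) -> list:
    """Compare related fields; return a message for each inconsistency found."""
    errors = []
    start, end = record.get("start_time"), record.get("end_time")
    if start is not None and end is not None and end < start:
        errors.append(f"end_time {end} is earlier than start_time {start}")
    adults, juveniles, total = (record.get(f) for f in ("adults", "juveniles", "total"))
    if None not in (adults, juveniles, total) and adults + juveniles != total:
        errors.append(f"adults + juveniles = {adults + juveniles}, but total = {total}")
    return errors


record = {"start_time": time(10, 0), "end_time": time(9, 30),
          "adults": 3, "juveniles": 2, "total": 6}
for message in consistency_errors(record):
    print(message)
```

Note that a simple time comparison like this assumes observations do not run past midnight.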
More Complex Strategies
Check Digits
"Check digits" can be used to reduce errors when entering numeric unique identifiers. Check digits are used in UPCs (Universal Product Codes) on consumer products, in credit card numbers, and in ISBNs for books. ...
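The section does not say which check-digit scheme to use. As one example, the sketch below implements the Luhn (mod 10) algorithm, which is the scheme used for credit card numbers. UPCs and ISBN-13s use a different weighted mod 10 scheme, so this code does not validate those. A project could append a Luhn digit to its own numeric sample IDs (a hypothetical use), so that a single mistyped digit, or most swaps of two adjacent digits, makes the ID fail validation.

```python
def luhn_checksum(number: str) -> int:
    """Luhn sum mod 10: double every second digit from the right, subtract 9 if over 9."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10


def add_check_digit(payload: str) -> str:
    """Append the Luhn check digit to a numeric ID."""
    check = (10 - luhn_checksum(payload + "0")) % 10
    return payload + str(check)


def is_valid(number: str) -> bool:
    """True if the ID is all digits and its Luhn checksum is zero."""
    return number.isdigit() and luhn_checksum(number) == 0


sample_id = add_check_digit("7992739871")
print(sample_id, is_valid(sample_id))  # prints 79927398713 True
print(is_valid("79927398731"))  # last two digits swapped: prints False
```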
Double entry
A more labor-intensive data entry technique is to have two people independently enter the same data and have the computer compare the two versions. This is based on the assumption that two people are unlikely to make the same typographical error. It also provides a second opinion when interpreting cryptic handwriting.
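Once both people have keyed the data, the comparison itself is mechanical. This sketch assumes each entry pass is a list of Python dictionaries sharing a key field (a hypothetical data sheet number here) and lists every disagreement so that someone can resolve it against the original sheet.

```python
def compare_entries(first: list, second: list, key: str) -> list:
    """List every difference between two independent entries of the same data sheets."""
    a = {row[key]: row for row in first}
    b = {row[key]: row for row in second}
    differences = []
    for k in sorted(a.keys() | b.keys()):
        if k not in a or k not in b:
            differences.append(f"{key}={k}: entered by only one person")
            continue
        for field in sorted(a[k].keys() | b[k].keys()):
            if a[k].get(field) != b[k].get(field):
                differences.append(f"{key}={k}, {field}: {a[k].get(field)!r} vs {b[k].get(field)!r}")
    return differences


entry_1 = [{"sheet": 1, "species": "Salmo trutta", "count": 5},
           {"sheet": 2, "species": "Salvelinus fontinalis", "count": 7}]
entry_2 = [{"sheet": 1, "species": "Salmo trutta", "count": 5},
           {"sheet": 2, "species": "Salvelinus fontinalis", "count": 1}]
for difference in compare_entries(entry_1, entry_2, "sheet"):
    print(difference)
```

A mistyped key value shows up as two "entered by only one person" messages, so errors in the identifier itself are caught too.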
Avoiding manual data entry into the computer
These strategies reduce the need for a person to enter (or re-enter) data.
Direct Download
Download data directly from the collection instruments, such as locations stored in a GPS unit, readings from a water quality meter, or data-logger contents. This requires planning so that the electronic data can be matched with the manual observations. For example, the name (or ID number) of each GPS location ("waypoint" or "point of interest (POI)") needs to be recorded on the data sheet along with the rest of the observations.
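Matching downloaded data to the data sheet works only if the recorded names line up. This sketch assumes the waypoints have already been read from the GPS export into a dictionary mapping each name to (latitude, longitude). It joins them to the observations by waypoint name and reports both observations with no matching waypoint and waypoints that no observation refers to. The field names and coordinates are hypothetical.

```python
def match_waypoints(observations: list, waypoints: dict):
    """Attach coordinates to observations by waypoint name and report mismatches."""
    matched, unmatched = [], []
    for obs in observations:
        name = obs.get("waypoint")
        if name in waypoints:
            lat, lon = waypoints[name]
            matched.append({**obs, "lat": lat, "lon": lon})
        else:
            unmatched.append(obs)  # waypoint name missing or mistyped on the data sheet
    referenced = {obs.get("waypoint") for obs in observations}
    unused = sorted(set(waypoints) - referenced)  # stored in the GPS but never recorded
    return matched, unmatched, unused


gps = {"WP001": (44.10, -110.50), "WP002": (44.20, -110.60)}
sheet = [{"waypoint": "WP001", "count": 3}, {"waypoint": "WP009", "count": 1}]
matched, unmatched, unused = match_waypoints(sheet, gps)
print(len(matched), unmatched, unused)
```

Reporting both directions matters: an unmatched observation usually means a mistyped waypoint name, while an unused waypoint may point to an observation that was never written down.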
Scanners
Reduce manual typing by labeling samples, data sheets, etc. with bar codes, RFID (radio frequency identification) tags, PIT (Passive Integrated Transponder) tags, or similar identifiers that can be read with a scanner. Bar codes can be pre-printed on adhesive labels and used as needed. Be sure the adhesive and ink are resistant to field and lab conditions. Alcohol-based preservatives are notorious for dissolving pen ink and adhesives.