April 2005 — Features
Print this article | Email this articleClick here to receive your FREE subscription to T.H.E. Journal
Data Warehousing: An Aid to Decision-Making
8 Basic Steps to Establishing a Data Warehouse
- Find the appropriate software, hardware and people to convert existing historical data into a useful format where, according to current terminology, data can be “mined.”
- Select the data sources to be used. These are usually stand-alone databases containing information in areas such as budget, student achievement and standardized-test results.
- Determine what data is important. While conceptually there is no limit to the amount of information that can be included, there are practical limits to what information decision-makers will want to examine and how much effort they want to put into the most difficult phase, which is data cleansing.
- Convert the information in stand-alone systems to a format that can be queried by OLAP (online analytical processing).
- Cleanse the data. This is perhaps the most time-consuming part of the process. It is amazing how much bad information can exist in a database. For example, when incomplete grades are not made up, the student record may still contain an “I” even though the extracted data is expected to be in a numerical format. The board policy may indicate that all incomplete grades be changed to 59 if they are not made up in the required time, but many never get updated. An error rate of 0.5% means a lot of corrections when dealing with millions of pieces of data. Even worse is that each error must be individually researched to resolve the problem.
- Bring the data into the warehouse. This is a very important step because it determines what links will exist between data fields. For example, if individual scores on standardized tests are brought in as part of a student’s record (unless the test date was also tied to this record), one would be able to run a correlation of student class rank and standardized test score, but would not be able to compute a simple average test score by year. The desired relationships should be considered carefully, and the original data should never be destroyed lest it be discovered that something was missed.
- Determine who will have access to the data. Much of the data is confidential and cannot be made available universally to all members of the school community. If planned for in the beginning, levels of access can be controlled by password to protect the rights of individuals.
- Determine the toolset (software) that you will use for inquiries and analysis of this data. Ease of use and standardization are the keys to proper selection.