Data Warehousing: An Aid to Decision-Making
In education, we are constantly looking for the magic bullet - that intervention which will miraculously result in higher achievement scores for our students, happier and more productive teachers, etc. The pressure for improvement has caused education to become a “bandwagon” industry. Therefore, educational leaders adopt the latest fad in hopes that it will bring them to the promised land with little or no budget increase. This has been referred to as the LYNT-TYNT-NYNT Syndrome, which translates into Last Year’s New Thing - This Year’s New Thing - Next Year’s New Thing (Markowitz and Elovitz 2001). All too often these “new things” represent a decision-maker’s interest, and bear little if any connection to the programs of previous years or to the educational needs of the school or district. However, with the advent of No Child Left Behind and the public’s demand for greater accountability, this is no longer acceptable. Thus, the answer lies in the ability of educational leaders to collect, store, analyze and effectively utilize large amounts of data to inform their decision-making. This is where data warehousing can help.
Han and Kamber (2001) define a data warehouse as “a repository of information collected from multiple sources, stored under a unified scheme, and which usually resides at a single site.” In educational terms, all past information available in electronic format about a school or district such as budget, payroll, student achievement and demographics is stored in one location where it can be accessed using a single set of inquiry tools.
Two factors make this process different from a typical data processing environment in a school or district. First, while the data warehouse contains historical information, it d'es not contain current-year information. Using current-year data is not recommended because it has not been cleansed to eliminate incompatibilities. Also, true data mining is computer intensive and usually not conducted in a day-to-day production environment. A second difference is that information from the various databases is extracted and then interrelated to ensure compatibility for ease of analysis.
Databases are found in the normal school environment; however, they exist as separate entities, often in different computer languages that require the knowledge of various types of report-generating software. In a data warehouse, the integration of the information from these databases allows new queries (questions) to be investigated. For example, when a district adopts a new textbook series, a properly constructed data warehouse makes it possible to compare the impact of that purchase on achievement with relative ease. Generally, the information about the purchase and student achievement would be stored in existing but separate databases.
After the required information is passed to the data warehouse, it can be extracted using a simple set of inquiry tools. Using the same warehouse, a historical comparison of success on standardized tests can be disaggregated by grade, race, sex or teacher. This potential for extracting information is amazing. The appeal of this process is that decisions could be based upon facts rather than suppositions, dubious beliefs or even rumors that have formed the basis for many past administrative decisions.
The concept of data-driven decision-making is as simple as it sounds. Collecting data, refining the data into a usable format, and basing decisions on the information is the essential concept. Historically, collecting data has not been hard to accomplish in schools. Interpreting and processing data so it can be considered in the decision-making process has been the difficult part. When computers first became available, individuals were needed to write custom programs to pull the data from student record systems. Unfortunately, this data extraction did not generally yield information in a useful format. Another program or set of programs was then needed to manipulate the data so that it could be useful. The steps were numerous, time-consuming, expensive, and required several people with experience in different languages or programs.
What has changed recently, and will continue to change, is the ease with which data can be extracted, processed and analyzed. The development of improved hardware and software tools to query advanced database systems has made it easier (not easy) to collect and present data in a useful format.
8 Basic Steps to Establishing a Data Warehouse
- Find the appropriate software, hardware and people to convert existing historical data into a useful format where, according to current terminology, data can be “mined.”
- Select the data sources to be used. These are usually stand-alone databases containing information in areas such as budget, student achievement and standardized-test results.
- Determine what data is important. While conceptually there is no limit to the amount of information that can be included, there are practical limits to what information decision-makers will want to examine and how much effort they want to put into the most difficult phase, which is data cleansing.
- Convert the information in stand-alone systems to a format that can be queried by OLAP (online analytical processing).
- Cleanse the data. This is perhaps the most time-consuming part of the process. It is amazing how much bad information can exist in a database. For example, when incomplete grades are not made up, the student record may still contain an “I” even though the extracted data is expected to be in a numerical format. The board policy may indicate that all incomplete grades be changed to 59 if they are not made up in the required time, but many never get updated. An error rate of 0.5% means a lot of corrections when dealing with millions of pieces of data. Even worse is that each error must be individually researched to resolve the problem.
- Bring the data into the warehouse. This is a very important step because it determines what links will exist between data fields. For example, if individual scores on standardized tests are brought in as part of a student’s record (unless the test date was also tied to this record), one would be able to run a correlation of student class rank and standardized test score, but would not be able to compute a simple average test score by year. The desired relationships should be considered carefully, and the original data should never be destroyed lest it be discovered that something was missed.
- Determine who will have access to the data. Much of the data is confidential and cannot be made available universally to all members of the school community. If planned for in the beginning, levels of access can be controlled by password to protect the rights of individuals.
- Determine the toolset (software) that you will use for inquiries and analysis of this data. Ease of use and standardization are the keys to proper selection.
Uses for a Data Warehouse
A properly designed data warehouse can help provide educational leaders with information to aid in decision-making. The research possibilities are almost endless. In addition to the more common statistical information, mean, median, etc., administrators can evaluate the results of their decision. When the budget information is tied to student records, one can actually determine if the cost of the new textbooks purchased last year produced a change in student achievement. Is there a link between staff salaries and achievement? Are the students from one program or school doing better in their sophomore year than students in another? Do the students from the new elementary school perform differently at the high school level than students from other schools? Is teacher absenteeism a factor in achievement? The ease with which these questions can be asked and answered is the key to true data-driven decision-making.
The control of independent variables has always been a difficult task in educational research. With the amount of data available in a properly designed data warehouse, this problem is greatly simplified. Demographic data such as race, gender, age, etc. are stored in the data warehouse along with achievement measures, attendance and other school-related data. With the use of the correct query, it is a simple matter to disaggregate results based upon any set of variables. This is a tremendous research benefit.
A requirement of NCLB is that student achievement results be disaggregated by grade, race and sex; a data warehouse makes this a simple task. Because the databases of information are interrelated in a data warehouse, these comparisons are made possible. In a 2002 article from ENC Focus (www.enc.org/features/focus/archive/data/document.shtm?input=FOC-003001-index), Carol Damian writes, “Setting goals for the support of gathering, studying, and using data must be a school- or district-wide priority.” Establishing a data warehouse, converting already stored data, and capturing new data to continually build the knowledge base in the data warehouse would meet that need and be of great help to the educational decision-maker.
Han, J., and M. Kamber. 2001. Data Mining: Concepts and Techniques. San Diego: Morgan Kaufman.
Markowitz, A., and L. Elovitz. 2001. “Improving Staff Development Through Action Research.” Journal of the New Jersey Association of Teacher Educators 10 (1).
This article originally appeared in the 04/01/2005 issue of THE Journal.