
Data Exchange Beyond SIF

A New York district consortium had to figure out a manageable approach for data exchange that didn't require heavy lifting every time.

Imagine that a school in your district has just registered a new second grader in the morning, then a snowstorm strikes just before lunch. How likely is it that the parent contact details for that student will have already ended up in the emergency notification system by the time the principal activates phone calls to notify parents? That's the type of scary scenario that troubles Joe Fitzgerald, systems integration manager for the Lower Hudson Regional Information Center.

This center, one of 12 in the state of New York, provides educational and administrative technology services to 60 school districts in three counties north of New York City. The center provides a menu of services to districts, including Internet access, data warehousing, student information system support, and the like.

Fitzgerald's piece of the problem is to figure out how to make sure the data a parent provides during that early morning registration, once entered into the student information system (SIS), is seamlessly shuttled over to the other systems in use at that school. So far, the results have been hit and miss, relying on manual and semi-automated methods of data exchange.

But the challenge doesn't end with a single school, a single SIS, or a single notification system. He has to sort out an approach that will work with all 60 districts that are part of his service center. Furthermore, it needs to work among four different student information systems and three notification services because each district or school has a different combination of applications in use. Finally, the mix-and-match nature doesn't stop there. There are multiple applications being supported by the consortium for food services, transportation, and messaging. "Any one district can have a combination of those," Fitzgerald said.

SIF Isn't the Final Answer
The Schools Interoperability Framework, or SIF, provides a specification that school applications can use to share data. SIF defines standard formats for shared data, naming conventions for that data, and rules of interaction among applications. The framework relies on the use of SIF zones, logical groupings of applications that communicate through a hub or zone integration server (ZIS). A single ZIS can manage multiple zones. But setting up zone integration servers for a consortium that supports numerous combinations of applications becomes incredibly complex, and any change to a database schema precipitates a host of further changes.

That's the problem with simply relying on SIF as the sole means for data sharing, said Fitzgerald. "It's like going to a store for a quart of milk and driving there with a Mack truck. It's way too complex and expensive for an individual district to manage or support." For state reporting, it makes a lot of sense, he noted. "It's most economical on a large state-wide scale, where there are fewer points of integration. It's big and simple." But to operate effectively as a consortium, the center needs to seek out scalable solutions that will work for the greatest number of districts.

The Data Exchange Demand
The demand for data exchange is only growing. "Literally the day the kid is enrolled, his records are showing up in the food services system and for transportation. He can't be left without lunch. He can't be left on the corner when his bus pulls away," Fitzgerald said.

Many of the schools in the consortium have also moved to automated systems for access control. Students are issued ID cards that can be read by a barcode reader in the doorway. High schools that take period-by-period attendance can automatically notify the front office or security staff, as well as the student's parents, when somebody leaves the school between periods and doesn't return when the next period begins. But that system will be only as effective as the information it's using to do the security monitoring.

"That has a major implication," he explained. "A school's average daily attendance will make a difference in how much it gets in aid. If you're allowing a lot of truancy or just outright skipping, it has impact."

Process of Elimination
The center needed something that would do the data exchange work while still delivering solid value relative to the cost. An initial list of contenders, such as IBM's InfoSphere DataStage, was eliminated based solely on pricing.

The team ran tests with each of three finalist products: Denodo from Denodo Technologies, Microsoft BizTalk Server, and Talend from a company of the same name.

Denodo uses a "mashup" approach, in which the data is published to a data store or staging area for distribution. Talend beat out Denodo, Fitzgerald explained, because the consortium decided that the mashup approach would add a layer of unneeded complexity to its work.

BizTalk was in the ballpark price-wise, but "way too complex for the level of personnel we have in our facility," Fitzgerald said. "We don't have C# programmers here. I'd rather deal with a user interface that has most of what we need." Also, he added, the technical team had great difficulty in getting the application, which is traditionally used to weave together business processes, to work with the consortium's data warehouse, which uses Oracle as its engine. "It was just a bear," he recalled.

Also, BizTalk didn't do batch processing easily, and that was an important evaluation criterion. "We still live in batch world," Fitzgerald observed. "It's only increasingly true that event-driven data exchange is important to schools. Historically, it's all been batch. We can't just abandon that. Yet we wanted to work with a single platform, which Talend allowed us to do."

Talend is an open source program that uses open source technology, including Java, Perl, and SQL, but that doesn't rely solely on a public community for its updates. The company has an R&D team that determines the product line's roadmap and fixes problems as they surface. The company offers a subscription-based data integration product (Talend Integration Suite) and a freely downloadable open source version (Talend Open Studio).

During a six-month proof-of-concept evaluation of Talend's open source version, the team found the interface fairly straightforward to work with, which meant they wouldn't have to go outside of the tool to write scripts, macros, or full-fledged programs for more complex data mapping. An expression builder within Talend provides a wizard for setting up the data mapping details.

First Usage: SIS and Emergency Alert Systems
The first live implementation of Talend has focused on the exchange of data between SISs and emergency alert systems, of which the center makes three available: Blackboard's Connect-ED, K12 Alerts, and SchoolMessenger.

This combination of applications was chosen because they address a major risk issue that the districts want to manage closely, Fitzgerald said. "The average district in our region probably has close to 3,000 students. In the course of a day a couple of phone numbers might change; in a week, maybe a couple of dozen. But by virtue of the fact that it's unpredictable, when an emergency occurs, if you can't reach a parent, a child may end up standing outside a school. If you've got it wrong, the consequences are really bad. At best, you could have unhappy parents and at worst a lawsuit."

Districts that are part of the Lower Hudson Region are encouraged to use their SIS as the final say on data, and that's the same source used to populate the notification services. Talend sits in the middle of the systems doing the data exchange, and the IT team uses it to do data field mapping to connect the fields in the SIS with the fields required by the notification systems. "The reason that Talend is superior is due to the fact that we control both sides of the integration," Fitzgerald said. "We're not relying on the vendor to update changes to a SIF agent."
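The field mapping described here can be pictured with a small sketch. This is not Talend's own mapping format or the center's actual configuration; it is a plain Python illustration, and every field name and the sample record are hypothetical.

```python
# Hypothetical field names; real SIS and notification schemas will differ.
SIS_TO_ALERT_FIELDS = {
    "student_id": "StudentRef",
    "guardian_name": "ContactName",
    "guardian_phone": "PrimaryPhone",
}


def map_record(sis_record, field_map=SIS_TO_ALERT_FIELDS):
    """Rename SIS fields to the names the notification system expects."""
    missing = [src for src in field_map if src not in sis_record]
    if missing:
        raise KeyError(f"SIS record is missing fields: {missing}")
    # Fields that are not in the map (such as homeroom) are left behind.
    return {dest: sis_record[src] for src, dest in field_map.items()}


# Made-up sample record for illustration only.
alert_record = map_record({
    "student_id": "20417",
    "guardian_name": "Pat Rivera",
    "guardian_phone": "914-555-0142",
    "homeroom": "2B",
})
```

Because the mapping is a table the integration team owns, a renamed field on either side means editing one entry rather than waiting for a vendor to update an agent.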

Of course, as Fitzgerald pointed out, "That mapping process might mean you need to transform some data. This is ETL on steroids." Extraction, transformation, and loading is the process by which most organizations prepare their data for use across applications. Usually, that's done on a batch basis, he added. But the use of Talend has made that process event-driven. When a student is added to a SIS, that kicks off the event that transforms the data and populates the new data into the notification system.
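To make the difference between batch and event-driven exchange concrete, here is a minimal sketch in Python. It assumes a simple in-memory publish/subscribe hub and a stub notification system; the class names, the phone normalization rule, and the sample record are all hypothetical and do not reflect how Talend or any of the named products work internally.

```python
import re


class NotificationStub:
    """Stand-in for a notification system; stores contacts in memory."""

    def __init__(self):
        self.contacts = {}

    def upsert(self, student_id, phone):
        self.contacts[student_id] = phone


def normalize_phone(raw):
    """Transform step: keep digits only and expect a 10-digit number."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        raise ValueError(f"unexpected phone number: {raw!r}")
    return digits


class SisEvents:
    """Tiny publish/subscribe hub standing in for SIS change events."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def student_added(self, record):
        # Each new enrollment is pushed to subscribers immediately,
        # instead of waiting for a nightly batch run.
        for handler in self.handlers:
            handler(record)


notifier = NotificationStub()
events = SisEvents()
events.subscribe(
    lambda rec: notifier.upsert(
        rec["student_id"], normalize_phone(rec["guardian_phone"])
    )
)
events.student_added({"student_id": "20417", "guardian_phone": "(914) 555-0142"})
```

The same transform function could just as easily be called from a scheduled batch job, which fits the goal of handling both styles on a single platform.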

But the process isn't totally invisible to the districts involved. They still need to get their processes "squared away," Fitzgerald said. For instance, some districts don't use their SIS as the sole source of demographic data. "If you start propagating data and your real source is the transportation application for the kid's address, because the bus driver is most in touch with it, the next time we populate that transportation system from the SIS, we're going to overwrite good data. Good data governance is required."
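One way to picture the governance rule Fitzgerald describes is a table that names the authoritative system for each field, consulted before any update is applied. The sketch below is an assumption-laden illustration in Python, not the center's actual process; the table, field names, and sample records are hypothetical.

```python
# Hypothetical governance table: which system is authoritative for each field.
SOURCE_OF_RECORD = {
    "guardian_phone": "sis",
    "home_address": "transportation",
}


def apply_update(target_record, incoming, source_system):
    """Copy only the fields for which source_system is authoritative.

    Returns the updated record and the list of fields that were skipped.
    """
    updated = dict(target_record)
    skipped = []
    for field, value in incoming.items():
        if SOURCE_OF_RECORD.get(field) == source_system:
            updated[field] = value
        else:
            skipped.append(field)
    return updated, skipped


# The transportation system holds the address the bus driver confirmed.
transport_record = {"home_address": "12 Elm St", "guardian_phone": "9145550100"}
from_sis = {"home_address": "9 Old Rd", "guardian_phone": "9145550142"}

updated, skipped = apply_update(transport_record, from_sis, "sis")
```

In this example the phone number from the SIS is accepted, while the stale SIS address is skipped so the transportation system's good data is not overwritten.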

So, over the course of the year, the center has been having conversations with contacts within each district to "sell" them on the use of the service. As Fitzgerald pointed out, his organization can't command a district to participate. "For us Talend is important. The district couldn't care less. What they're going to pay for is that they can count on the fact that they have real-time demographic and attendance data available throughout the district."

That effort points out the ultimate problem Fitzgerald is attempting to address with the adoption of Talend: to provide a simpler method to achieve data exchange that will work across the variety within 60 districts. "The whole idea of SIF is right on the money," he concluded. "The problem is that from our view of shared services, districts are too unique to employ this technology economically."
