Sending AI Off to School
Can machine learning really free up time dedicated to grunt work, giving your educators more time for working with students?
- By Dian Schaffhauser
- 07/08/20
Plenty of discussions about the use of artificial intelligence talk about how AI could help educators by shrinking the amount of time they have to spend on the trivia that pervades their work and freeing them up to focus on the job of teaching. In the latest CoSN IT leadership survey, more than half of respondents (55 percent) said that AI would have a significant or even transformational impact on teaching and learning within the next five years, if privacy issues can be addressed to everybody's satisfaction.
In a recent session delivered during the CoSN2020 virtual conference, Girard Kelly gave a preview of what the use of AI could look like by using an example from his own employer. Kelly works as counsel and director of the privacy program at nonprofit Common Sense Media.
The education division of Common Sense serves as a vetting tool to help teachers choose educational software based on learning ratings, community ratings and privacy ratings. It's that last category that's especially difficult to nail down, Kelly said.
The Problem with Privacy Policies
The problem Common Sense faced was in trying to help teachers and families make better decisions about the software they use, based on the privacy policies published by the companies that produce those programs. The privacy evaluation process is intense and has both "qualitative and quantitative" aspects. The organization has to evaluate whether a privacy policy exists in the first place, whether the privacy practices are transparent, whether there's advertising or user tracking in the product, whether the company collects data and sells it, and a whole bunch of other issues.
Common Sense uses a three-tier rating system: Blue means pass (about 20 percent of products fall into this rating); red means fail (about 10 percent); and orange means proceed with caution (about 70 percent).
The existence of the privacy rating has in itself already made an impact, according to Kelly. Of the 750 products already evaluated, about half of the companies have updated their policies based on the ratings they were initially given.
A big part of the job of developing each rating involves having legal experts sit for hours and read through the privacy policies posted by the companies. Many of the policies run "dozens of pages long" (some 50 to 60 pages) and are full of legalese, Kelly noted. "We found that nobody actually reads the privacy policies because it's really hard and it takes a really long time."
Common Sense wondered whether machine learning might help speed up the privacy policy evaluation process by identifying the elements that were essential in a well-written privacy policy.
Some Machine Learning Terms Worth Understanding
As Kelly explained, the very concept of AI covers a lot of ground, encompassing machine learning, knowledge graphs and expert systems. In the case of ed tech, machine learning dominates by a wide margin. What distinguishes it from the other flavors of AI is "its ability to modify itself when exposed to more data." The more data that's fed into a machine learning model, the more accurate its recommendations or findings become.
Machine learning itself has subsets too, primarily broken down by how categorization (the task to be done) takes place.
"Supervised
learning" references the use of prior knowledge of the output.
The model already knows what output is desired, and AI is used to
sort out or classify the input that will lead to that output. Once
the model learns the formula, it can stop learning and do its job.
According to Kelly, supervised learning crops up in identity fraud
detection, image identification and weather forecasting, among other
applications. When you identify the squares in a reCAPTCHA that show
the crosswalk, for instance, you're helping the model learn how to
identify certain imagery in the data.
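To make that concrete, here is a minimal supervised learning sketch in Python using scikit-learn. The sentences, labels and the "mentions selling data" task are invented for illustration; they are not Common Sense's data or code.

    # Supervised learning: the desired outputs (labels) are known up front,
    # and the model learns a mapping from inputs to those labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical hand-labeled sentences: 1 = mentions selling data, 0 = does not.
    sentences = [
        "We sell your personal data to advertising partners.",
        "We may share your information with marketers for compensation.",
        "We never sell or rent your personal information.",
        "Your data is used only to provide the service.",
    ]
    labels = [1, 1, 0, 0]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(sentences, labels)          # learn from the known outputs

    print(model.predict(["Your information is never sold."]))  # e.g. [0]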
In "unsupervised learning," you have the input data but you don't know what the output should look like. The goal for the model is to learn more about the data and reveal patterns to you. This proves useful in recommendation systems, targeted marketing and big data visualization, as examples.
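An equally minimal unsupervised sketch, again with invented sentences: no labels are supplied, and the clustering algorithm is left to discover the groupings on its own.

    # Unsupervised learning: no labels; the algorithm groups similar items itself.
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "We show personalized ads based on your browsing history.",
        "Advertising partners may track you across other sites.",
        "Data is encrypted in transit and at rest.",
        "We use TLS and strong encryption to protect your data.",
    ]
    X = TfidfVectorizer().fit_transform(docs)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(clusters)  # e.g. [0 0 1 1]: ad-related vs. security-related sentences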
There's a gray area between those two approaches, which is called "semi-supervised" machine learning. You have a lot of data for input, and some of the data is identified. Human experts help the model evolve by undertaking identification until it can learn to a sufficient level of competency.
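A semi-supervised sketch along the same lines, using scikit-learn's self-training wrapper on invented toy data: the rows marked -1 are unlabeled, and the model assigns them labels itself as it grows more confident.

    # Semi-supervised learning: a few labeled examples plus unlabeled ones (-1).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    # Toy one-dimensional feature: small values are class 0, large values class 1.
    X = np.array([[0.10], [0.20], [0.90], [1.00], [0.15], [0.95]])
    y = np.array([0, 0, 1, 1, -1, -1])   # -1 marks the unlabeled rows

    model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
    print(model.predict([[0.12], [0.93]]))  # e.g. [0 1]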
A third category of machine learning is called "reinforcement learning." This is the process by which a learning algorithm (the "agent") learns how to act, based on whether it gets a positive or negative response: It loses the chess game or it wins. There's no training data plugged in; the training happens as the process runs. According to the experts, this most closely resembles how humans learn too. (Think: small child and hot stove.) Reinforcement learning shows up in gaming and in products doing real-time decision-making, robot navigation or other learning tasks.
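The sketch below shows the idea on a toy two-armed bandit problem (entirely invented, not from the session): the agent has no training data, only the rewards it observes, and its estimates improve as it keeps acting.

    # Reinforcement learning: the agent learns from reward signals it collects
    # while acting, with no training data supplied up front.
    import random

    payout = {"a": 0.3, "b": 0.7}      # hidden reward probabilities (assumed)
    value = {"a": 0.0, "b": 0.0}       # the agent's running estimates
    pulls = {"a": 0, "b": 0}

    for _ in range(1000):
        # Explore 10% of the time; otherwise exploit the best-known arm.
        arm = random.choice(["a", "b"]) if random.random() < 0.1 \
            else max(value, key=value.get)
        reward = 1 if random.random() < payout[arm] else 0   # positive or negative feedback
        pulls[arm] += 1
        value[arm] += (reward - value[arm]) / pulls[arm]     # incremental average

    print(value)  # the estimate for "b" should drift toward roughly 0.7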
Machine Learning to the Rescue?
There are two challenges to the effective use of machine learning for the kind of work Common Sense had in mind:
- First, even humans "go back and forth" on various issues of privacy policy interpretation. When the humans can't agree, "we can't expect machines to help us," noted Kelly.
- Second, privacy policies don't follow a consistent order. Companies' privacy practices aren't laid out in a standard format.
The organization turned to the use of natural language processing and "transformers" to do an initial assessment of privacy policies. Transformers are pre-trained models that parse sentences and paragraphs to highlight what would be relevant for human analysis. For example, if the evaluators want to see specific language in a privacy policy to grant the product a favorable score in a particular category of the scoring rubric, machine learning could potentially help them make that first pass. Then the humans could focus on the "hard parts," not the straightforward contents.
However, the results haven't always met the bar. As Kelly asserted, a lot of the machine learning technology currently available relies on keyword-based pattern matching, which isn't entirely accurate and generates "too many false negatives."
For example, the privacy policy might say that the company doesn't sell data when it actually does:
"We do not sell your data, except if you give us your consent by creating an account with the service."
The transformer might pick up the relevant part ("we do not sell your data") and give a passing score, thereby ignoring the caveat that a human evaluator would immediately recognize ("except if you give consent...").
Or it's possible that the policy doesn't say whether data is sold, even though the company does sell it:
"We only share your data with third parties for legitimate interest purposes."
In that case, suggested Kelly, what's a "legitimate interest"? For companies, that's "to make money." While the human evaluator would pick up on that, it's "something where automated keyword-based systems won't catch it."
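A toy illustration of the first failure mode, under the assumption that the matcher is a plain keyword rule: the pattern fires on "do not sell," so the caveat that follows never affects the score.

    # Naive keyword matching: the rule sees "do not sell ... data" and stops
    # reading, so the exception clause is ignored.
    import re

    policy = ("We do not sell your data, except if you give us your consent "
              "by creating an account with the service.")

    if re.search(r"\bdo not sell\b.*\bdata\b", policy, re.IGNORECASE):
        print("PASS: policy says data is not sold")   # the wrong conclusion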
Hope for the Future
Kelly said the systems that Common Sense has tried out have had some success, and the process of "growing the training data" continues. There are a few cases where the machine learning is good enough to answer some questions about the privacy policies under evaluation. One example: Does the privacy policy reference an effective date? Because that's the most structured content (the policy usually references "version" or "effective date"), it's easy to detect.
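Because that phrasing is so regular, even a simple pattern handles it. The sketch below is a hypothetical version of such a check, not Common Sense's actual rule.

    # Detecting an effective date: the wording is structured enough that a
    # regular expression finds it reliably.
    import re

    EFFECTIVE_DATE = re.compile(
        r"(effective date|version)\s*[:\-]?\s*"
        r"(\w+ \d{1,2}, \d{4}|\d{4}-\d{2}-\d{2})",
        re.IGNORECASE,
    )

    policy = "Effective date: July 8, 2020. We collect the following information..."
    match = EFFECTIVE_DATE.search(policy)
    print(match.group(2) if match else "no effective date found")  # July 8, 2020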
Because of the promise offered by AI, the organization intends to continue its work with machine learning. That will probably involve developing a "hybrid approach" that automates some aspects of the privacy policy evaluation, reducing the time humans spend on each evaluation to "half time."
"But
there still needs to be the human touch," Kelly emphasized.
"There's got to be some human intervention," to continue
training the AI model."
Questions to Ask
Kelly offered a list of questions for school leaders to ask companies that tout their programs' AI capabilities:
- What does the software really do, and where's the evidence that can support that claim?
- How general or specific does the AI get? For example, can it "measure general student outcomes or just a slice of them"?
- If AI can do the work better than humans, which humans are we talking about, and how much better? What's the cost-benefit equation?
- What are the privacy or ethical risks to students involved in the use of AI-enabled software?
- What populations don't exist in the training data, which could make for less effectiveness or downright bias?
- Is there a demo that lets the school try the software using its own data?
- Can the software be trained by the users? Can it work with other data training sets, or does the company have control over those aspects?