Reporters' Roundtable Ep. 121: Wavii founder on the future of news
2012-04-24
hey guys it's Rafe Needleman in San
Francisco welcome to Reporters'
Roundtable hey last week I covered a
really interesting company called Wavii
W-A-V-I-I and I just I thought the company
was fascinating it is a semantic
analysis product that takes news of the
world all the news is coming in and
gives you a Facebook-like feed of what
businesses and people and events are
happening what things are going on so
and so bought this so-and-so a company
was acquired by such and such and the
system that does it is as far as I can
tell magic it's a computer that reads
the news and parses it for you and I
just think that's fascinating and
because I think it's fascinating I have
the CEO of the company Adrian Aoun
here with us to talk about his business
Wavii and what it's doing why it's doing
it and the future of computers that read
the news so you don't have to Adrian
thanks for making the time to join us on
the Roundtable it's a pleasure thanks
for having me so give us a brief pitch
of what this business is and
then let's get into the science and
the future here what is Wavii and why
does it exist yeah sure so to kind
of understand Wavii it's probably easiest
to put it in context of something that
we all know which is Facebook right and
what Facebook kind of does so well is
Facebook just gives you these quick like
three second visual updates about what's
going on in the world right like Bob
checked in to a location you know Julie
has a new job Eric is now dating someone
and those nice visual updates are just
really really pleasant to consume right
they're fast I get to decide what I want
to click into so if I see that you know
you Rafe checked into a restaurant and
you've got some photos well I get to sit
there and say hey I want to click into
these photos and learn a little more or
I get to say hey you know I'm just not
that interested or let's say you start
dating someone new well I get to click
on her and go find out more about her
who do I know in common with her where
does she work so Facebook kind of gives us
this
nice interface of exploring the
information about our friends and it's
it's incredibly good at keeping us
up-to-date with all of our friends right
like in a few minutes a day you can keep
up to date on God knows a thousand
people or 1,500 or 2,000 people and so
we basically at Wavii just wanted that
same product but for the world right I
just want to keep up to date with
everything that's going on in the world
in these nice little visual updates
where I can kind of click around and
explore and so that's what we set out to
do the challenge there is that if you
think about it let's say I get in my
feed Rafe is dating Julie or you know
sorry I don't know your wife's name um
and maybe I get some photos of you and
Julie below it the way Facebook was able
to give me that is that you know maybe
you filled out a relationship status and
your wife kind of confirmed it and maybe
you know one of your friends was there
taking some photos and maybe one of
your other friends was at home kind of
like oh I see these photos let's tag them yeah so
it's all very structured and you're
talking about applying structure to
other stuff exactly but the trick for us
is unfortunately that structure isn't
being provided by users for everything
going on in the web right it's not like
when Facebook bought Instagram they
didn't change their relationship status
it's not like when Whitney Houston died
she didn't check into her death you know
there's there's basically tons of data
out there um that's not in a format the
computer understands it's in a tweet
it's in a blog post it's in an article
maybe it's in a YouTube video and so
what we're doing is we're teaching the
computer to basically read and
understand that much the same way a
human would and then build these feed
items that kind of represent what
occurred if that makes any sense hmm and
and the product then is like a feed of
the news now why do we need this I mean
I in my field in technology I read
News.com of course I scan a couple of blogs
and RSS feeds and I look at Techmeme
why do I need something that is giving
me kind of a timeline view of the news
when the news is all there right in
front of me anyway yeah so the first
aspect is it's just hard to get through
a lot of the news today right
so when Facebook bought Instagram it
came into my feed you know 2,000 times
right it's in my RSS it's it's being
posted in five different places and so
what happens is first it's hard to
consume that because maybe there's
different fragments to the story over in
this article it talks about maybe the
purchase price and this article talks
about how much Kevin made and this
article talks a little about the
background so you really kind of want an
almost aggregate single view of hey here
are all the details that have been
pulled out of these things but there's a
second aspect too which is things that
maybe you care about that aren't so so
large in the news maybe your local ice
cream shop releases a new flavor and you
know what the local paper covered it in
two sentences well that sort of stuff
just gets buried right and so to an
extent we also want to kind of surface
that information to the people that care
about it but then there's a third aspect
too which is just putting things in
context right so as I mentioned if
you're dating someone I can click on
that person on Facebook and I can go
find out about her and when you're
reading a news article today you just
don't have that ability to kind of
really get context or analyze
information really you only get kind of
a singular viewpoint at each point in
time think of something just really
really simple happens to us all the time
right let's say I want to know when
Apple's releasing the next iPhone and
I'm pretty sure everybody wants to know
this right so so what do I do well I can
read maybe your article about it Rafe or
I can read something on who knows some
other site but wouldn't it be great if I
just had some visualization that was
like look 14 sites said it's coming out
in April nine sites said it's coming out
in June and here's the timeline of when
they've released them historically and
even better would be hey typically these
seven sites are the accurate ones and
look at what they're saying they're saying
June so you probably want to focus on
that so we believe that kind of giving
users access to more information and
kind of visualizing that information is
is a very compelling experience now I
don't think it replaces your News.com I
don't think it replaces your Techmeme I
think at the end of the day people stay
up to date at many different sites and
that's okay
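
The release-date visualization described above (14 sites saying April, nine saying June, with extra weight on the sites that are typically accurate) can be sketched as a simple tally. This is only an illustration of the idea, not Wavii's implementation: the site names, the accuracy figures, the neutral default weight of 0.5, and the function `tally_claims` are all hypothetical.

```python
from collections import defaultdict

def tally_claims(claims, accuracy=None):
    """Count claims per value and weight them by each site's track record.

    claims:   list of (site, claimed_value) pairs.
    accuracy: optional dict mapping site -> historical accuracy in [0, 1].
              Sites missing from it get a neutral weight of 0.5 (an assumption).
    Returns (counts, weighted), two dicts keyed by claimed value.
    """
    accuracy = accuracy or {}
    counts = defaultdict(int)
    weighted = defaultdict(float)
    for site, value in claims:
        counts[value] += 1
        weighted[value] += accuracy.get(site, 0.5)
    return dict(counts), dict(weighted)

# Hypothetical data: 14 sites say April, 9 sites say June.
claims = [(f"site{i}", "April") for i in range(14)]
claims += [(f"site{i}", "June") for i in range(14, 23)]

# Pretend the April sites have a weak record and seven June sites a strong one.
accuracy = {f"site{i}": 0.2 for i in range(14)}
accuracy.update({f"site{i}": 0.9 for i in range(14, 21)})

counts, weighted = tally_claims(claims, accuracy)
best_guess = max(weighted, key=weighted.get)
print(counts)      # raw number of sites behind each date
print(best_guess)  # date favored once track records are considered
```

With these made-up numbers the raw count favors April, while the weighted tally favors June, which is the distinction the interview draws between what most sites say and what the historically accurate sites say.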
we just want to provide kind of a set of
information and experiences that users
aren't getting elsewhere that are
valuable so when I saw this product I
was very impressed by the demo here and
I've been using it and it's still very
impressive technology what it does is it
looks at all this information based on
stuff that you say you're interested in
and we'll talk about that later and it
distills the headlines in some cases into
better headlines than the original
articles and puts them in front of you I
that seems um fairly magical how do you
how do you do that how do you figure out
that a story written by me in a
terrible rush is actually about such and
such buying so and so or Google Mail
being down or something like that sure
so I don't have my wizardry degree so
it's not actually magic though it would
probably have been faster to produce if
it was um so think about think about how
children learn right or think about how
you learned when you were very young
maybe maybe kind of one day your your
father comes home and he says you know
what I want a new watch and then maybe
you know later that day your mother
comes home and she says god I really
want a massage and so what you're doing
as a little kid is you're kind of hearing
tons of language and then you're
starting to discern some patterns and so
the pattern in this case is maybe you're
hearing um I want a something I want
a something and you're you're kind of
gonna figure out what does that really
mean and maybe you'll ask someone maybe
the kid will go up to his cousin and say
hey what does it mean to want
something and then the kid learns what
wanting really means and now the kid is
kind of like because he's learned the
patterns the kid can go ahead and he can
try it himself so he can be like well I
want a sandwich or I want a toy and the
kid's kind of learned that concept
from now until the dawn of time well
this is how we train our system so our
system is looking at all language you
know it's a big data problem it's just
kind of ingesting tons of content off
the web and then our system is using
machine learning to discern patterns and
so it sees a lot of these things they
all look the same and it'll maybe show
it to one of us and we'll go oh those are engagements
and so we'll just tell it oh that's
that's an engagement when two people
kind of you know get engaged it leads to
marriage give it a couple details about
it and then the system will go off and
it'll start uh it'll start trying its
knowledge much like a little kid and
it'll say okay well that means you know
this actor got engaged to this actor
this politician got engaged to this politician
and it kind of goes through and it's
probably guessing pretty correctly most
of the time and then maybe it makes a
mistake like it says Barack Obama is
engaged to Ahmadinejad and we're going
wait wait wait wait wait how'd this
happen and maybe what it saw was
Barack Obama was engaged in a heated
debate with Ahmadinejad and so what
will happen there is we'll tell the
system no and it'll figure out once again
using machine learning it'll figure out
that being engaged to someone and being
engaged in a heated debate with someone
are kind of two different things and so
it's kind of honing its knowledge over
time much like a little kid does where a
little kid may say I want a happy and
you know you'll correct that
little kid no you can say I want an
object and the kid gets better and
better over time well that's what our
system is doing it's getting better and
better over time now people have had
theories about this for probably
hundreds of years about the linguistics
about meaning about this the
deconstructing language and human beings
of course even you know my five-year-old
has or before you know when he was
learning language has trillions of
neurons and interconnections and our
brains work fundamentally differently
from the way binary computers work no
matter what the programming is uh there
has got to be precedent here that you
have and paths that scientists and
linguists have gone down that have been
wrong and paths that have been right what
are you basing your technology on sure
so the first and foremost thing to
understand is that at the end of the day
we haven't built Skynet our computer
can't think our computer at the end of
the day is just running some math where
it's recognizing patterns and
representing that information now I take
it as a compliment that you know a lot
of people look at our product and say
there you know there must be something
going on there that really is almost
artificial intelligence but it's not
it's really
just a first step along the long long
long path of really building full
artificial intelligence now what people
have tried to do in the past when
teaching computers language and this is
somewhat a generalization but what
they've tried to do is basically teach
the computer what you learned when you
were in kind of third grade fourth grade
fifth grade which is around teaching the
computer the rules of grammar so this is
a subject this is a verb this is an
object this is a past participle you
remember like all all that kind of those
classes that you took and all that
information you learned what we found is
that it doesn't really make sense to try
teaching language that way because
that's not really how humans learn I
mean think about it a three-year-old
speaks when your when your child was
three he or she was speaking but at the
end of the day that three-year-old
doesn't know what a verb is or at least
if they do they're smarter than I was at
that age and so what we found was that
those the grammar rules that we learn in
school are very much an attempt at
retrofitting rules on top of language
and much the same way where you didn't
learn all language in kind of your third
fourth and fifth grade class and and by
the time you were done with fifth grade
you weren't saying oh I now know a
hundred percent of everything you're
actually learning a concept at a time
right I'm teaching you a concept right
now about you know machine learning and
artificial intelligence and maybe later
today you're gonna teach me a concept
about journalism right and so these
concepts that we learn it's a
never-ending process we're gonna learn
them until the you know until the day we
die and that's how we decided to teach
our computer rather than just trying to
train all the rules up front once and
for all you come to this field
honestly I believe tell us about your
upbringing and the lineage of what
you've been doing here back to some of
the great linguists of our time yes so
I've been kind of surrounded by
linguistics it's that it's that thorn in
the side that keeps like poking me since
I was um since I was really young and
the reason is because my my father is a
linguist he studied at MIT under Chomsky
and so I was constantly
surrounded by my father and his friends and
one of the things that's great about MIT
is this culture of debate where they're
constantly constantly arguing about this
language structure that language
structure so what happened is when I was
growing up we'd just sit around the dinner
table or you know my dad would invite
his friends over for drinks and they'd
be arguing on various kind of aspects of
of these rules which rule is correct is
it this rule or this rule which covers
more cases and what's funny is they
would often just kind of turn to me as
the the young child and say well which
do you think is right because there's a
degree to which intuition and native
understanding of language is often more
accurate than overanalyzing language
now I never wanted to be a linguist I
still don't want to be a linguist but
what what that experience taught me was
that language you know I just spent a
lot of time thinking about language and
what it taught me is that language just
isn't that hard and by that I don't mean
that humans can't speak with incredibly
incredibly difficult
sentences and structures but what I mean
is that we've all learned it fairly
intuitively and so what that tells me is
look if two year olds can speak we can
at least get the computer to understand
language at the two-year-old level or at
the three-year-old level now I don't
think we're gonna do it at the 50 year
old level but it certainly taught me
that that language has some core
elements to it that are fairly simple
and combine that with my focus more
recently on trying to unlock meaning on
the web you know for better for worse
the only way we're really gonna get I
mean think about the experience that
Facebook gives you they've unlocked all
the meaning about your friends I can
figure out which you know which of my
friends live in Seattle I can figure out
which of my friends work at this company
which have been to a restaurant who's
dating who I mean they've really
unlocked all of the information about my
friends but if we want that same thing
for the internet all of the information
on the internet for better for worse is
in natural language if I want to know
who are all the celebrities that had
DUIs in 2008 and I'm not saying I do
but let's pretend um how can I get that
well you know you got to think that
Perez Hilton's working all day long to
make sure that content's on the web it's
in tweets
blog posts if I want to know what are all
the Series A valuations that content is
all on the web but the computer can't
get at it because the computer doesn't
understand natural language so from our
perspective that was kind of the cost of
doing business if we want to build this
product we have to teach the computer to
understand language and for better for
worse we didn't back down merely because
we were kind of comfortable with
language um we've been surrounded by it
for a while and so it wasn't it wasn't
that scary to kind of tackle the problem
uh there are other things that signal
importance to people in addition to the
facts that you're so far doing a pretty
good job of extracting from unstructured
articles and one of those signals is the
social signal if my friends read a story
then that's arguably important to me if
I say I'm interested in something by
retweeting it on Twitter then that can
be picked up as a signal how does that
play into what the user of Wavii sees
because one of the things about Wavii
just as a little side note here is the
the display of the Wavii page is not
information dense in the same way that a
New York Times front page or a Techmeme
page is where there's like a hundred
headlines on Wavii you see a stream and
it's far fewer numbers of stories how do
you decide what the user sees sure so
there's two aspects here there's the
deciding um which items to show the
user and then there's a second aspect of
deciding how much to show the user and
so deciding which items to show a user
really comes down to kind of I'd say
three main aspects the first is kind of
um what we see in the world and by this
I mean how often are we seeing something
and like how rapidly are we seeing it so
we may see something often for example
that Apple's you know the rumor that
Apple is releasing a new iPhone we see
it all the time but we don't see it very
rapidly in one kind of big spurt right
we see it kind of all the time on and
just occasionally um so so that's kind
of the first aspect is just what we
think of
as world heat the second aspect that we
think of is just a priori knowledge of
the concept we know that a death is more
interesting than a birth almost always
we also know for example that acquiring
a company for a billion dollars when
you've only ever acquired companies in
the past for tens of millions is
actually a really big deal and so
because we have this underlying
information we can play some extra kind
of analysis games to really make sure
we're surfacing relevant content now the
third aspect has to do with the user and
their world and so what this comes down
to is what what is the user following in
our system right so are you following
Barack Obama and is this something
happening with Barack Obama well if it's
something happening with Barack Obama it's
probably pretty relevant to you or are
you following Michelle Obama but it's
something happening with Barack Obama so
it's maybe slightly further away from
your interest but it also might be
relevant to you um or in the past when
we've shown you things about Barack
Obama have you clicked on them or not
and then the portion that you're
discussing which is how much are your
friends engaging with this piece of
content are your friends all liking and
commenting reading well if so that's
also going to build heat in our system
so so that all goes into building a
ranker that decides hey what's the most
important thing to show the user now the
second aspect is just how much do we
show the user and you know do we give
you ten items do we give you a hundred
items do we give you a thousand feed items
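
The ranking signals described above (world heat, a priori knowledge of the event type, what the user follows, and how much friends engage) could be combined into a single score along these lines. This is a minimal sketch under stated assumptions: the `FeedItem` fields, the 0-to-1 scales, the weights, and the `rank` function are all hypothetical, and the interview does not say how Wavii actually combines its signals.

```python
from dataclasses import dataclass

@dataclass
class FeedItem:
    headline: str
    world_heat: float         # how often and how rapidly the event is reported, 0..1
    prior_interest: float     # a priori weight of the event type, 0..1
    user_affinity: float      # closeness to what the user follows or clicks, 0..1
    friend_engagement: float  # share of friends liking or commenting, 0..1

# Hypothetical weights; a real system would presumably learn these.
WEIGHTS = {
    "world_heat": 0.3,
    "prior_interest": 0.2,
    "user_affinity": 0.3,
    "friend_engagement": 0.2,
}

def score(item, weights=WEIGHTS):
    """Weighted sum of the four signals."""
    return sum(w * getattr(item, name) for name, w in weights.items())

def rank(items, limit):
    """Return the `limit` highest-scoring items, best first."""
    return sorted(items, key=score, reverse=True)[:limit]

items = [
    FeedItem("Local ice cream shop releases new flavor", 0.1, 0.2, 0.8, 0.1),
    FeedItem("Facebook acquires Instagram", 0.9, 0.8, 0.3, 0.6),
    FeedItem("Rumor: new iPhone coming", 0.4, 0.3, 0.5, 0.2),
]

top = rank(items, limit=2)
for item in top:
    print(round(score(item), 2), item.headline)
```

The `limit` argument is the "how much do we show" knob discussed next in the interview.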
and this is a trade off of what's known
as accuracy and recall the fewer the
items we show you the less recall
but pretty typically the better
accuracy and by that I mean they're
going to be things you care about far
far far more right so if I just showed
you one feed item Facebook bought
Instagram well the accuracy is really
high you're probably gonna care about it
the recall is pretty low right you know
there's tons of things you missed
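
The trade-off described above can be made concrete with a small calculation. What the speaker calls accuracy corresponds to what is usually called precision: the fraction of shown items the user cares about. Recall is the fraction of items the user cares about that were actually shown. The item names and the set of relevant items below are made up for illustration, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(shown, relevant):
    """Return (precision, recall) for a set of shown items."""
    shown, relevant = set(shown), set(relevant)
    hits = len(shown & relevant)
    precision = hits / len(shown) if shown else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical feed, ordered best first, and the items this user cares about.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "b", "c", "e", "h"}

# Show 1, 5, or all 10 items and compare the two measures.
results = {k: precision_recall(ranked[:k], relevant) for k in (1, 5, 10)}
for k, (p, r) in results.items():
    print(f"show {k:2d} items: precision={p:.2f} recall={r:.2f}")
```

Showing a single item gives perfect precision but low recall, and showing everything gives full recall at lower precision, which is the game the speaker describes playing per user.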
and so this is a game we're constantly
playing we don't want to give too much
recall and kind of inundate the user
with tons of things they don't care
about but we also don't want them to
miss things so what we're what we're
doing what exists in the product today
but we're also
constantly working on is basically
trying to look at how the user engages
our content and figure out on a per-user
basis does this user want more content
does this user want less content and so
this is something that you see at
Facebook the more often you come to
Facebook the more stuff they give you in
your stream or in your feed but it may
be less and less interesting stuff over
time because they run out of the good
stuff and we have the same notion and
finally what is the role in your
estimation as the reluctant linguist the
role for article writers who are putting
their blood sweat and tears into
crafting their work into gathering their
information gathering the opinion and
crafting it together what is the role of
the writer yeah so think of this kind of
from probably three perspectives the
first perspective is um at the beginning
at the very beginning like what was a
reporter doing they were reporting right
so it's breaking the news and our system
needs that our system wants that we're
not looking to replace that right so the
first kind of one or two people that
break the news that's particularly
interesting to us now the fact that
right now many many many articles
printed on the web are just kind of
duplicates in terms of the actual what
happened that's less interesting so we'd
like to take all the you know the 50
articles that say Facebook bought Instagram
and merge that into kind of one feed
item but what is interesting let's think
about the second aspect when you're
reporting something Rafe maybe you
didn't break it but what are you doing
you're adding your analysis your opinion
your context and that's extremely
valuable and I do think that 50 people
adding their opinion and analysis is is
is still something that we don't want on
the you know kind of worldwide we don't
want that to change we want people to do
that so what we'd love to do is in our
perfect world our system would say
here are Rafe's thoughts and if you
click on that hey now I dive into Rafe's
article to kind of learn a little more
about it now there's a third aspect too
and this has to do with the fact that we
often we often approximate our interest
by sources
probably the best way to put it and what
what I mean by that is I'm interested in
tech I don't necessarily know what in
tech I'm interested in so instead I
follow what Rafe writes and Rafe is good
at kind of introducing me to things it's
almost a serendipitous experience of
discovery and we think that's extremely
valuable too and that's why over time we
want people to kind of have we want our
users to have as much control as they
want to do that if they want to follow
the reporter great if they want to
follow the company or the product or you
know the story type namely an
acquisition great um we basically want
to give users kind of the the ultimate
control over that and let them define
their experience if that makes sense
okay hey Adrian thank you so much for
making the time for us today a very
interesting product you guys have to
check out Wavii wavii.com there's also a
mobile app for it and I just want to
point out for people who have gotten
this far this is a very interesting
space to be in from a business
perspective the news aggregator Zite was
acquired by CNN Powerset was acquired
the search company Powerset was
acquired by Microsoft I don't know
what's gonna happen with Wavii but get it
while you still can and it's still
independent this is a really fascinating
company to watch Adrian good luck to you
and thanks for the time thank you very
much Rafe really appreciate it