Veenome: The future of interacting with video is here
2012-03-13
hey everyone I'm Molly Wood from cnet.com
here at South by Southwest 2012 in
Austin Texas I'm Brian Tong and here
today we have Kevin Lenane he's the
founder and CEO of Veenome and this is a
really cool video product we talk about
some of the stuff that we see video is
the future can you kind of describe to
people what Veenome is doing sure
it's it's really simple so Veenome just
tells you what is in a video right so
you can find out the stuff that's in a
video the product the people the brands
and then you can use that data for
commerce for advertising to target
advertising and for search and discovery
yeah and what do you see that's really
simple exactly I mean you guys have
built your own technology right to do
actual video scanning and pull out not
just objects but people the concept is
simple the machine in the back is not at
all right so at the end of the day you
know you're going to get data basically
on what's in your video but the process
to get there is actually really complex
yeah so it's gonna go through no I
didn't mean it yeah um so so basically
how we do this is we we take a whole
video let's say it's you know ten
seconds long that's 300 frames we find
the key frames which is where things are
changing by a certain percentage we then
take basically tags for each one of
those key frames and then we look at all
that data linearly and say okay what
kind of relationships can we find
amongst those tags say if there is you
know two tags that say car another one
that says Mercedes let's kind of mush
this together and call it a Mercedes car
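The steps described here (pick key frames where enough of the picture changes, tag each key frame, then combine related tags) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Veenome's actual method: frames are modeled as flat lists of grayscale pixel values, the 30 percent change threshold is arbitrary, and `BRAND_CATEGORY` is a hypothetical lookup table. For scale, 300 frames in a ten second clip implies 30 frames per second.

```python
from typing import Dict, List, Sequence

Frame = Sequence[int]  # assumption: one grayscale value (0-255) per pixel


def changed_fraction(a: Frame, b: Frame, pixel_tolerance: int = 10) -> float:
    """Fraction of pixels whose value differs by more than pixel_tolerance."""
    changed = sum(1 for x, y in zip(a, b) if abs(x - y) > pixel_tolerance)
    return changed / len(a)


def find_key_frames(frames: List[Frame], threshold: float = 0.3) -> List[int]:
    """Indices of frames that differ from the last key frame by more than threshold."""
    if not frames:
        return []
    keys = [0]  # the first frame is always treated as a key frame
    for i in range(1, len(frames)):
        if changed_fraction(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys


# Hypothetical table mapping a brand tag to the generic object it describes.
BRAND_CATEGORY: Dict[str, str] = {"Mercedes": "car"}


def merge_tags(tags_per_key_frame: List[List[str]]) -> List[str]:
    """Combine a brand tag and its generic tag into one more detailed tag."""
    # Unique tags across all key frames, in first-seen order.
    seen = list(dict.fromkeys(t for tags in tags_per_key_frame for t in tags))
    merged: List[str] = []
    absorbed = set()
    for tag in seen:
        category = BRAND_CATEGORY.get(tag)
        if category is not None and category in seen:
            merged.append(f"{tag} {category}")
            absorbed.update({tag, category})
    return merged + [t for t in seen if t not in absorbed]


if __name__ == "__main__":
    print(merge_tags([["car"], ["car"], ["Mercedes"]]))
```

Comparing each frame against the last key frame, rather than the immediately preceding frame, is a design choice in this sketch so that slow gradual changes still eventually trigger a new key frame.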
and then provide more detailed tags and
that's really the key so if I just tell
you there's a car in the video it's
it's neat but it's not that interesting
or useful because you know it's a car
you can see it's a car but you might not
know what kind it is now when I got a
chance to go to your website and at least
play and you have examples of how this
technology works
it's amazing because you're literally
dragging your mouse around specific
items and when it changes for example
there's like a macbook air in the frame
it changes to the iphone it tags that now
is there any human
interaction done with the scanning of
these video files or is this all you
know a computer brain is interpreting
these images and determining what's on
the screen so there is a human element
if you want it right so the way
that this product works is it's really
it's a b2b kind of product in the sense
that I don't want to build a site that
people come to to watch videos right I
want to build a way for people to index
the videos and productize their videos
and productize I'm only on this now so yeah
that's a good word yeah so there's a
hashtag hashtag productize um we
don't really want to create this
destination site so we allow people to
basically take this technology and use
it on their own site so it's a platform
and so if they want to they can
actually brand objects on their own and
so if we're here and we're you know I
know what kind of jeans I'm wearing and
it's my video on my site I can say you
know these are Levi jeans in the case
that we can't identify them so there is
an option to kind of manually correct
and thus far the people that we're
talking to about this aspect
potentially about our API are really
interested in actually being able to
control that stuff for brand safety
reasons and for just the idea that you
know you might not be able to tell ever
exactly what kind of sweater I'm wearing
and so if you want to add manual detail
you can so you provide you have a
catalog of tags already right you just
are sucking up all kinds of data about
what a Mercedes looks like and that kind
of thing but you're saying like if a
business contracts with you then they
can right yeah I mean the problem with
it obviously is that the world of
possible things you can shoot in your
videos is infinite right and so there's no
way for us to always have everything
right and so we allow for that and
that it can be used for custom control
too like if you have if you want to do sort
of product placement and things like
that in your videos you can
actually control that and say you know
just make all the pizza domino's pizza
so what are some potential uses like
commerce seems like the obvious one right
I saw that scarf on a TV show how do you
see companies implementing this in a
consumer-facing way so originally I was
kind of like infatuated with this idea
what what you saw which is neat it's a
clickable video kind of like just hover
over something and buy it right and
that's really straightforward yes
absolutely yeah and so that's like one
of those things that are really it was
really straightforward I really liked
it but what I found is that the engine
that drives it we have an API that we're
launching here at South by Southwest and
that API is much more it's much more
versatile because you can use that API
to do things like um you just take
this data and now you can target
advertising based on what's in the video
you can connect videos together now so
if i know i'm watching this one video
and it has you know a bunch of iphones
in it i know that you know i can connect
these other videos together so it allows
you to be able to discover things in
video more and so you can kind of go from
one video to the next and actually have
it connected through a line of content
which is I think pretty powerful
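One way to picture connecting videos through what is in them is to score every other video by how many tags it shares with the one being watched. The sketch below uses Jaccard overlap of tag sets, which is a choice made for this illustration and not something the interview states. The `index` structure and the video ids are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tags divided by all tags seen in either video."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_videos(
    video_id: str, index: Dict[str, Set[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Other videos ranked by tag overlap with video_id, best first."""
    target = index[video_id]
    scored = [
        (other, jaccard(target, tags))
        for other, tags in index.items()
        if other != video_id
    ]
    # Drop videos with nothing in common, then sort by score (ties by id).
    scored = [(other, score) for other, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Hypothetical tag index: video id -> set of tags found in that video.
    index = {
        "a": {"iphone", "desk"},
        "b": {"iphone", "macbook air"},
        "c": {"pizza"},
        "d": {"iphone", "desk", "lamp"},
    }
    for other, score in related_videos("a", index):
        print(other, round(score, 2))
```

The same tag index could in principle drive ad targeting or search, since both reduce to looking up videos by the tags they contain.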
so you've actually potentially cracked
the nut on video recommendation on
related video I mean did you just win
the netflix contest yeah yeah just one
not here actually we did relations we
did I mean it's one of those things that
like I think I became sort of i was very
interested in this clickable video
concept but i think that at least the
early stages I think of Veenome we're
going to do more business around the
uses of the api for that things like
that connecting videos together doing ad
targeting helping people find videos
right so with that same data i can now
find things more easily if you've ever
looked for something on youtube that
doesn't have a hundred thousand views or
a million views there's no chance
so you have no way of finding it unless you
know what the name the title of it is
yeah now I've got to imagine also right
product product placement amazing and TV
shows and a lot of these TV networks are
trying to find some sort of product or
something that can bridge the gap
because we now know a lot of people are
interacting on
computers at the same time as they're
watching TV shows but they're kind of
doing two things at once so I've got to
imagine have you talked to some
TV networks on how to use this API so
that it you know we've seen programs
like Shazam or others that can hear
what's playing on the TV and then to say
okay you're watching the show so my
imagination is running a little wild is
it a thing where these networks are
approaching like okay how do we involve
not only do we can we hear and know
where you are in the show but ads or
recommendations are being served to my
screen now that it knows exactly what's
on that screen while being able to
listen to it yeah you just
identified like another use of the API
which is which is sort of like if you're
not going to make the stuff clickable on
the screen maybe you can pipe that into
like a second screen app right where you
can actually sort of like if you're
watching um you're watching the
show you pull up your iPad app right and
now we know how to sync it up with
the programming so now we know what
show you are watching and we know what
time it's at so we now can sort of say
as you're watching the show here's a
shirt you can buy here's this thing you
can buy and the network or the video
producer can control that experience right
because it's their app and so like
that's another big piece of it or the
reason why the API is such a strong
business case is just that the big
publishers who have premium content
they can control the experience right so
they might not want me to put my little
hover jog over their you know ten
million dollar episode and so that's one
of those things that I think is pretty
suitable for them so so really what
you're saying is there's going to be a
bidding war for your company between
CBS and Google yes cuz I could I mean
when you describe that algorithm for
recommending videos for daisy chaining
related videos for discovery like that's
got YouTube written all over it yeah you
know it does and I think I guess our
my theory on this is that data like the
data of video is opaque right now a video
sits on a site and no one knows what's
in it so someone is going to solve that
problem right you can't do anything
interesting really really truly
interesting with experiences around
video until you have the data behind it
right to be able to connect things and
find out where things are and so
someone's going to do that yeah
exactly search and so so someone's going
to do it and we think that we're going
to be the first
people to do it and do it well so I have
to say it's super impressive and it's
kind of amazing that you honestly I feel
like you were the first credible source
that I've talked to that's doing this and
that's five or six years probably of
people publicly trying to figure out
video search to index like what kind of
geniuses do you have in there so
well I'm one of them you know and it was
like well hello so you know we might
hear my revised I think a lot of it had
to do with looking at the problem
differently I think so there's this
temptation this is a little bit
technical but I think it's pretty
straightforward so there's this
temptation when you're looking at the
way the idea came about was I was
building iphone apps and Android apps
and i was using image recognition to do
like alternate entry so you take a
picture of a credit card and the
data gets into the thing instead of having to
enter it with your finger which no one
really loves to do 16 digits expiration
date CVV code and address is not very
fun so if you can take a picture of the
card that saved some time and what I
found was that the like experience was
really you have a flash on the camera
you don't get the right data and I sort of
started to see like okay you know I just need
more chances to get this picture just
right and so that from that became okay
well I need more frames that's video now
the challenge is that with
image recognition it's all about like
getting what you can from this one frame
and saying I want to glean these
patterns from this one frame this one
image and figure out what the products
are in that but what we do is we say you
know what let's let's step back and look
at all those frames as a whole like look
at them linearly and say what can we
know about all these
frames and then how can
we connect them like so that if I'm
looking at you know ten frames maybe I
can find out a little bit more about
what's in those things because i have so
many different angles and things like
that so it's it's sort of more about
like looking at the problem a little
differently but i think it is sort of
raw brain power because the reality is
you could stare at a frame all day and
never know that it's an apple ipad or
never know that it's a you know Mercedes
Benz car but 30 frames later you might
see
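The idea of looking across many frames instead of staring at one can be illustrated by pooling weak per-frame guesses into a single answer. The sketch below is an assumption-laden illustration, not the company's algorithm: it supposes some per-frame recognizer (not shown) returns candidate labels with a confidence between 0 and 1, and it simply sums those confidences over the frames. The `min_total` cutoff is arbitrary.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Assumption: each frame yields a mapping of candidate label -> confidence.
FrameGuesses = Dict[str, float]


def pool_labels(frames: List[FrameGuesses], min_total: float = 1.0) -> Dict[str, float]:
    """Sum confidence per label across frames and keep labels above min_total."""
    totals: Dict[str, float] = defaultdict(float)
    for guesses in frames:
        for label, confidence in guesses.items():
            totals[label] += confidence
    return {label: total for label, total in totals.items() if total >= min_total}


def best_label(frames: List[FrameGuesses], min_total: float = 1.0) -> Optional[str]:
    """Label with the most accumulated evidence, or None if nothing is convincing."""
    pooled = pool_labels(frames, min_total)
    return max(pooled, key=pooled.get) if pooled else None


if __name__ == "__main__":
    # Early frames only support a generic label; later frames show the product clearly.
    frames = [
        {"tablet": 0.4},
        {"tablet": 0.5},
        {"tablet": 0.3, "Apple iPad": 0.2},
        {"Apple iPad": 0.9},
        {"Apple iPad": 0.8},
    ]
    print(best_label(frames[:1]))  # a single frame is not enough evidence
    print(best_label(frames))
```

With only the first frame the function returns nothing, while the full run of frames accumulates enough evidence for the specific label, which mirrors the point that an object may only become identifiable some frames later.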
so if you can connect that stuff that's
that's sort of the insight I think that
is helping us get there and we have the
processing power now like the
computer horsepower's there to process
that much yeah interesting so you're not
trying to attach all new data to videos
which I think is what video search has
tried to do in the past you're using the
data in the video and then matching it
up against yeah i mean it's it's hard
with the volume with video right you
know billions of videos I think YouTube
does five billion views a day I mean
that's you can't have people going in
there and tagging this stuff unless
you really there's more videos online than there
are people in the world that's a pretty
staggering amount and they're
just kind of they keep coming right
and cell phones every
cell phone has a camera that takes HD video now
so it's just sort of like the volume gets
larger and larger so now the 65 million
dollar question is how accurate are you
so we get asked that a lot and it's sort of
a hard question because I always say
like we're really accurate right next to
practice yeah hands on the video but I
guess so we have an internal QA system
so every time we come out with a new
algorithm of how we are connecting the
data together and how we're kind of
processing the frames we do blind QA and
we basically we have everyone in our
company it's sort of the startup
atmosphere everyone does QA you look
at a frame and a tag and you say I'm
going to give this a 1 to 5 on a sliding
scale of specificity and accuracy right
so like i'm looking at a mercedes benz
you know popsicle stick is a zero right
but mercedes-benz is a five and then
like car is a three right so there's a
sliding scale and in that way we can
kind of see if we're improving this with
each little tweak and turn of the screw
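The scoring scheme described here (a wrong tag such as popsicle stick scores 0, a correct but generic tag such as car scores 3, the exact Mercedes-Benz scores 5) lends itself to a simple comparison of average ratings between algorithm versions. The sketch below is a minimal illustration; the version names and the ratings are invented, and how the blind ratings are actually collected is not covered.

```python
from statistics import mean
from typing import Dict, List


def summarize(ratings_by_version: Dict[str, List[int]]) -> Dict[str, float]:
    """Average 0-5 rating for each algorithm version."""
    summary: Dict[str, float] = {}
    for version, ratings in ratings_by_version.items():
        if any(r < 0 or r > 5 for r in ratings):
            raise ValueError("ratings must be between 0 and 5")
        summary[version] = mean(ratings)
    return summary


def is_improvement(summary: Dict[str, float], old: str, new: str) -> bool:
    """True if the new version's average rating beats the old one's."""
    return summary[new] > summary[old]


if __name__ == "__main__":
    # Hypothetical blind ratings gathered for two versions of the tagging algorithm.
    ratings = {
        "v1": [0, 3, 3, 5],  # one wrong tag, two generic tags, one exact tag
        "v2": [3, 5, 5, 3],
    }
    summary = summarize(ratings)
    print(summary, is_improvement(summary, "v1", "v2"))
```

Keeping the rater unaware of which version produced a tag is what makes the QA blind, so in practice the version label would be attached to each rating only after it is recorded.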
and we think right now we're at like a
four or sort of like a B of where we
could be and I think there's a lot of
room that extra mile is a big deal and
that's part of the reason why we provide
the manual option is that if someone
wants to go do just a quick find replace
and say you know what these are actually
all Pepsis just let me just do
that right you know since we're
using natural language to process the
tags already they can just go in and say
you know make all the cans pepsi cans
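The manual find and replace option can be pictured as a bulk relabel over a video's tags. The sketch below is a simplification under stated assumptions: the `Tag` record and its fields are hypothetical, and matching is a plain case-insensitive string comparison, not the natural language processing mentioned in the interview.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Tag:
    frame: int   # hypothetical: index of the key frame the tag belongs to
    label: str   # what was recognized in that frame


def relabel(tags: List[Tag], old: str, new: str) -> List[Tag]:
    """Return a copy of tags with every label equal to old swapped for new."""
    return [
        replace(tag, label=new) if tag.label.lower() == old.lower() else tag
        for tag in tags
    ]


if __name__ == "__main__":
    tags = [Tag(0, "can"), Tag(30, "Can"), Tag(30, "car")]
    for tag in relabel(tags, "can", "Pepsi can"):
        print(tag.frame, tag.label)
```

Returning a new list and leaving the original tags untouched is a choice in this sketch, so that an automatic result can be restored if a manual override turns out to be wrong.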
and
you know set it and forget it so you
obviously have this amazing intriguing
product and an algorithm what what's for
you guys kind of the end goal um I think
it's just sort of we just real i just
really want to solve this problem i
think it's one of those things that you
know people again like you said it's
been there's a lot of companies that
have come and gone around this idea of a
clickable video and video search and
video discovery and i think like there's
a lot of stuff that can still be done
in video once we solve this
problem and i just really want to be
part of that solution also the bidding
war between six yeah yeah yeah you know
yeah me too yeah we don't judge you for
that awesome Ike having super cool
technology thank you for talking to us
thank you you can find all of our South
by Southwest interviews and of course
like a lot more interesting video that
will hopefully soon be scanned and
indexed at CNETTV.com