I was playing with Azure Cognitive Services and figured I would switch to my @epam.com account for the next prototype. That meant I needed to sign up again. This is my story.
Login
Microsoft’s live.com OAuth can integrate with your ADFS for a single sign-on experience:
I opted for my work account, authenticated, and went ahead to enable the services I needed.
Verification
To enable the Cognitive Services APIs on this account, I needed to activate my Azure subscription. Microsoft will challenge your identity twice.
First comes the code verification, which you can do by sending yourself a text. My form was pre-populated with the Belarus country code (+375). I quickly dismissed it as probably an old attribute on my AD profile. You see, I lived in Minsk (Belarus) before I moved to the States a good while ago, but not every enterprise system got the memo. No big deal. I typed in my cell phone number and got the activation code.
Then comes the identity verification by card:
My postal code was again pre-populated with the one from the past. This time, however, I wasn’t able to use my current address:
What do you do when an online form tries to outsmart you? I, personally, try to outsmart it back:
Guess what, Microsoft engineers are very diligent. The validation also runs server-side and the form comes back with an error:
Sigh… My US street address and the city of Marietta were gladly accepted. It’s the postal code format and length validation that failed. Why so serious? A better solution would probably be to ask for the country as part of the address form and validate against it (something like the sketch below). Or maybe trust the SSO challenge that I went through when logging in and just collect my card?
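For illustration, here’s a hypothetical sketch of country-aware postal code validation. It is not Microsoft’s actual code, and the patterns cover just the two countries from my story:

```python
import re

# Hypothetical country-aware postal code patterns (illustrative only).
POSTAL_PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),  # e.g. 30067 or 30067-1234
    "BY": re.compile(r"^\d{6}$"),           # Belarus uses six-digit codes
}

def is_valid_postal_code(country: str, code: str) -> bool:
    """Validate a postal code against the format of the selected country."""
    pattern = POSTAL_PATTERNS.get(country)
    return bool(pattern and pattern.match(code.strip()))

print(is_valid_postal_code("US", "30067"))  # True
print(is_valid_postal_code("BY", "30067"))  # False: five digits, not six
```

Ask for the country first, and the postal code check becomes trivial instead of hostile.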
Anyway. I guess I will keep using my personal account with Azure for now and will wait for our IT to find the field in my profile that ties me to my home country.
I have used Cognitive Services from Microsoft (part 1) and IBM Watson Services (part 2) to read my avatar image. There are two more APIs that I would like to put to the test - Google Cloud Vision API and Clarifai.
Google Cloud Vision
I already had a developer account. To use the Cloud Vision API I only had to enable it in the console and generate myself a browser key. When you sign up, Google asks for your credit card but they promise not to charge it without your permission. They also give you $300 in free trial credit and 60 days to use it.
The API itself is clearly designed for extensibility.
It’s a single endpoint that can do different things based on your request. An image can be sent either as binary data or as a URL to a Google Storage bucket. You can send multiple images at once, and every image request can ask for a different type of analysis. You can also ask for more than one type of analysis for a given image.
Google can easily add new features without adding new APIs or changing the endpoint’s semantics. Take a look:
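Here’s a minimal sketch of what such a batched request could look like, assuming the v1 images:annotate endpoint and the browser key I generated. The key and the image file below are placeholders:

```python
import base64

import requests

# Minimal sketch of a batched images:annotate request.
# The API key and the image file are placeholders.
API_KEY = "YOUR_BROWSER_KEY"
ENDPOINT = f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}"

with open("avatar.jpg", "rb") as f:
    image_content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "requests": [
        {
            # An image is either inline base64 content or a Google Storage
            # URI: {"source": {"gcsImageUri": "gs://bucket/avatar.jpg"}}
            "image": {"content": image_content},
            # Ask for several types of analysis in a single request.
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 10},
                {"type": "FACE_DETECTION"},
                {"type": "SAFE_SEARCH_DETECTION"},
            ],
        }
    ]
}

response = requests.post(ENDPOINT, json=body)
response.raise_for_status()
```

A new capability is just a new feature type in that list, which is what makes the design extensible.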
A man who definitely cares about his hair, right? :) I am not sure where the sports and athlete bits came from. I also wonder if I would get more tags (like a microphone, for example) if I could ask for features with lower scores. The API doesn’t seem to allow me to lower the threshold. I asked for ten results but got only six back.
The face detection sent down a very elaborate data structure with coordinates of all the little facial features. Things like left eye, right eye, eyebrows, nose tip, and a whole lot more. The only thing is … you can’t see the left side of my face on my avatar.
Google also tries to detect emotions. Of all that it can see - anger, joy, sorrow, surprise - none came back with anything but VERY_UNLIKELY. You can also test an image for explicit content. Same VERY_UNLIKELY for my avatar.
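For reference, here’s how those parts of the response can be read, continuing the request sketch above. The field names follow the published v1 response schema; the values are whatever the API returns for your image:

```python
# `response` is the images:annotate response from the sketch above.
result = response.json()["responses"][0]

# Labels come back with a description and a confidence score (0..1).
for label in result.get("labelAnnotations", []):
    print(label["description"], label["score"])

for face in result.get("faceAnnotations", []):
    # Each landmark is a named facial feature with x/y/z coordinates,
    # e.g. {"type": "LEFT_EYE", "position": {"x": ..., "y": ..., "z": ...}}
    for landmark in face["landmarks"]:
        print(landmark["type"], landmark["position"])

    # Emotions come back as likelihood buckets rather than raw scores:
    # VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
    print(face["joyLikelihood"], face["sorrowLikelihood"],
          face["angerLikelihood"], face["surpriseLikelihood"])

# Explicit content detection uses the same likelihood buckets.
print(result.get("safeSearchAnnotation"))
```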
It was a very pleasant experience, but I honestly expected a little more from Google’s Vision API.
I expected more because I know Google does all kinds of crazy things with deep learning in their labs - with images as of two years ago and, very recently, with video. Maybe as those models mature, Cloud Vision will support more features? Time will tell.
Clarifai
The easiest setup experience by far!
I was ready to go in just a few seconds, no kidding! It also felt like the fastest response of all the APIs I tried. Very easy and intuitive to use as well:
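Here’s a minimal sketch of a call, assuming Clarifai’s v2 predict endpoint and API key authentication. The key and the image URL are placeholders; the model ID is the commonly published one for their general model:

```python
import requests

# Sketch of a predict call against Clarifai's public general model.
# The API key and image URL below are placeholders.
API_KEY = "YOUR_CLARIFAI_API_KEY"
GENERAL_MODEL_ID = "aaa03c23b3724a16a56b629203edc62c"  # public general model
ENDPOINT = f"https://api.clarifai.com/v2/models/{GENERAL_MODEL_ID}/outputs"

body = {
    "inputs": [
        {"data": {"image": {"url": "https://example.com/avatar.jpg"}}}
    ]
}

response = requests.post(
    ENDPOINT,
    json=body,
    headers={"Authorization": f"Key {API_KEY}"},
)

# Each concept comes back with a name and a confidence value (0..1).
for concept in response.json()["outputs"][0]["data"]["concepts"]:
    print(concept["name"], concept["value"])
```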
This is actually very close! Good feature detection with various plausible scenarios spelled out based on that. I would only question the absolute confidence in music and singer :) What about a… conference and a speaker?
Clarifai has another very interesting endpoint - Feedback. I haven’t used it, but it seems that you can submit your own labels back to Clarifai and help them train and fine-tune the model. It won’t give you your own classifier the way Watson does. Feedback seems to be a crowdsourcing mechanism for training their main shared model(s). I only wonder how it will work without you having to specify the area of the image that each new label is attached to. In the case of my avatar, conference and speaker would attach to the whole image. What about more involved images? Maybe I am missing something…
There are a lot more computer vision APIs out there. Some are more generic and some are geared towards more specialized tasks like visual product search or logo recognition. Go give them a try!
It’s fascinating what kinds of things are just one HTTP request away.