WEBVTT

1
00:00:00.000 --> 00:00:04.800
Hi, and welcome to this ANC SHARP video on the Microsoft Agent Framework.

2
00:00:04.800 --> 00:00:09.600
Today we're going to look into a unique feature that Google Gemini have,

3
00:00:09.600 --> 00:00:13.100
and that is integration into Google Maps.

4
00:00:13.100 --> 00:00:17.000
I haven't seen any of the others have any maps integration,

5
00:00:17.000 --> 00:00:20.600
but since Google owns Google Maps, of course,

6
00:00:20.600 --> 00:00:23.600
you have the option to actually integrate into it.

7
00:00:23.600 --> 00:00:25.600
So let's see that in action.

8
00:00:26.600 --> 00:00:31.600
So here I am in a sample,

9
00:00:31.600 --> 00:00:37.599
and that sample being the Google Gemini specific features and Google Maps.

10
00:00:37.599 --> 00:00:44.599
And we make a Google Gen AI client as normal with Google,

11
00:00:44.599 --> 00:00:47.599
and turn it into an agent like we do.

12
00:00:47.599 --> 00:00:54.599
Normally we set tools using the normal tools and AI tools,

13
00:00:54.799 --> 00:00:57.400
and AI function factory, and so on.

14
00:00:57.400 --> 00:01:00.200
But these tools are very special to Google,

15
00:01:00.200 --> 00:01:04.000
so we need to go in and do what is called breaking glass.

16
00:01:04.000 --> 00:01:07.900
So inside the chat client agent options and chat options,

17
00:01:07.900 --> 00:01:11.400
we need to set the raw representation factory.

18
00:01:11.400 --> 00:01:14.199
And if we do that to generate content config,

19
00:01:14.199 --> 00:01:20.400
we get access to some other tools, not AI tools, but the Google tools.

20
00:01:20.400 --> 00:01:26.400
And then among those tools, we can make a tool called Google Maps.

21
00:01:27.199 --> 00:01:30.599
We have very few options here,

22
00:01:30.599 --> 00:01:34.400
but among one thing, we can enable a widget.

23
00:01:34.400 --> 00:01:39.800
We'll talk a little more about what those are at some point.

24
00:01:39.800 --> 00:01:46.599
But if you do this, and let's just run it to here,

25
00:01:46.599 --> 00:01:49.199
we can then ask a question, for example, like,

26
00:01:49.199 --> 00:01:53.800
what is the opening times of Hard Rock Cafe in New York?

27
00:01:53.800 --> 00:01:58.400
And tell me if it's wheelchair access.

28
00:01:58.400 --> 00:02:03.199
So if we do this, it will go off and talk with Google Maps.

29
00:02:06.000 --> 00:02:09.000
And once it comes back,

30
00:02:11.000 --> 00:02:13.199
we can see we get a response back.

31
00:02:13.199 --> 00:02:16.399
Let me make this a little bit bigger so we can see.

32
00:02:16.600 --> 00:02:20.600
So it talks about Hard Rock Cafe at Times Square,

33
00:02:20.600 --> 00:02:26.600
where it's located, the opening hours, and how the wheelchair access is.

34
00:02:28.000 --> 00:02:32.600
Apparently, there's two of them. I didn't actually know, but fair enough.

35
00:02:33.800 --> 00:02:40.000
So what we can do in order to actually look a bit more into this,

36
00:02:40.000 --> 00:02:45.000
instead of just what is being shown here,

37
00:02:45.199 --> 00:02:48.800
which has been taken out of the response,

38
00:02:48.800 --> 00:02:51.800
is we can go in and take our response,

39
00:02:51.800 --> 00:02:55.199
get the raw representation factor of that,

40
00:02:55.199 --> 00:02:58.199
which is a chat response, still Microsoft.

41
00:02:58.199 --> 00:03:01.000
But if we go in and take the raw representation of that,

42
00:03:01.000 --> 00:03:03.800
we get back to the generate content response,

43
00:03:03.800 --> 00:03:07.600
so the output of this.

44
00:03:07.600 --> 00:03:11.800
And once we have that, we can go to something called Candidates.

45
00:03:11.800 --> 00:03:15.199
And inside the Candidates, there's only one in this case,

46
00:03:15.199 --> 00:03:18.399
we can get grounding metadata.

47
00:03:18.399 --> 00:03:20.600
So if we get that metadata,

48
00:03:20.600 --> 00:03:25.800
we can get what is called the Google Maps Widget Context Token.

49
00:03:26.800 --> 00:03:29.000
I haven't tried this out in real life,

50
00:03:29.000 --> 00:03:32.399
but according to the documentation,

51
00:03:32.399 --> 00:03:36.399
you can use this token together with their JavaScript API

52
00:03:36.399 --> 00:03:39.199
to get something that looks a little like this

53
00:03:39.199 --> 00:03:42.600
in your AI application, if you wish to.

54
00:03:44.000 --> 00:03:49.000
Alternative, you can go in and get all the grounding chunks,

55
00:03:49.000 --> 00:03:53.399
meaning that in this case, we have two grounding chunks,

56
00:03:53.399 --> 00:03:57.000
one being for each of the two Hard Rock Cafes.

57
00:03:57.800 --> 00:04:00.600
And if we go through each of them,

58
00:04:00.600 --> 00:04:08.399
we get a URL to Google Maps, so we could go to this URL. Let's do that.

59
00:04:11.399 --> 00:04:14.800
And we will see that we get to Hard Rock Cafe in Google Maps.

60
00:04:14.800 --> 00:04:18.399
So we can get links to the thing,

61
00:04:18.399 --> 00:04:22.399
and we can get a title, which is just the Hard Rock Cafe.

62
00:04:24.200 --> 00:04:27.799
And we can get a bunch of extra information

63
00:04:27.799 --> 00:04:32.600
about that specific location.

64
00:04:32.600 --> 00:04:39.000
And we are talking really lots, type, restaurant, business, price level.

65
00:04:39.000 --> 00:04:44.399
All the things you find in Google normally is also here,

66
00:04:44.399 --> 00:04:46.200
including the wheelchair access,

67
00:04:46.200 --> 00:04:51.200
and that's the reason why they were able to give us back that information.

68
00:04:52.200 --> 00:04:56.799
And then there would be one more, and one more, and so on.

69
00:04:56.799 --> 00:05:02.399
And for each of these, you can see these small one and two,

70
00:05:02.399 --> 00:05:07.000
that is just referencing each of these grounding chunks,

71
00:05:07.000 --> 00:05:13.000
so you can link the data to the grounding chunk.

72
00:05:14.600 --> 00:05:18.799
But if we do it like this, it will give us back all this extra information

73
00:05:18.799 --> 00:05:23.399
if we wish to also show that to the user in some way.

74
00:05:23.399 --> 00:05:26.000
It only comes back in this format,

75
00:05:26.000 --> 00:05:29.200
it's not in a structured format, unfortunately,

76
00:05:29.200 --> 00:05:35.200
but it's still better than nothing that we have all this information.

77
00:05:36.399 --> 00:05:38.600
And once we have gone through that,

78
00:05:38.600 --> 00:05:43.399
getting the second Hard Rock Cafe here,

79
00:05:43.399 --> 00:05:46.200
and all its information.

80
00:05:46.200 --> 00:05:49.399
So if there's only one result, you only get one of these,

81
00:05:49.399 --> 00:05:53.399
if there's multiple results, you get multiple of these.

82
00:05:54.600 --> 00:05:57.200
A little about pricing of this,

83
00:05:57.200 --> 00:05:59.200
it's actually the same as the web search,

84
00:05:59.200 --> 00:06:04.799
you get 5,000 prompts for free, and then $14 per 1,000 queries.

85
00:06:04.799 --> 00:06:08.000
So it's a really cool way if you have something

86
00:06:08.000 --> 00:06:10.600
that needs to integrate with restaurants,

87
00:06:10.600 --> 00:06:16.600
with sightseeing and such things in the system.

88
00:06:17.000 --> 00:06:21.000
So that's everything, see you on the next one.

