Student: I was doing an Internet search for information about soccer. I put the word
soccer in a keyword search, and got about 160,000 results. That's too many!
Mentor: What do you want to know about soccer?
Student: I want to find some information about college soccer teams. Oh, let me try
soccer team as a phrase. I will use quotation marks to find only those Web pages where the two words
appear next to each other. There are about 30,000 results for that!
Mentor: Let us try again. This time, we will try to get the pages that contain both the word
college and the phrase soccer team. Do you know how to do it?
Student: Yes, it's not hard. I just have to put a plus sign before each word or phrase. Oh, this is
better. About 10,000 results.
Mentor: This kind of refined search can help you a lot! How many results do you predict for the
search for "college"?
Student: A lot! A million, maybe? Let me try. Wow! 1,563,571 links. I guess the numbers can be
different with different search engines, though.
Mentor: There is an interesting mathematical model for the searches you just did. Let us draw a
picture with two ovals, one for college search results, another for soccer team results. If you'll
excuse me, I won't draw all million and a half links for college and won't even try to keep the
proportions between the two sets of links:
Student: Oh, I see. When we do a search using plus signs, we only get documents that contain both
college and soccer team.
Mentor: In mathematics, this operation is called the intersection of sets. Do you see why?
Student: I can see it on the picture! It is harder to express it in words, though.
Mentor: That's why mathematicians are so fond of pictures. By the way, there is a special picture, or
sign, for this operation. The signs in scientific language are often used to write (and read)
faster. Let us use C for college, and ST for soccer team. Then the documents you found on your last
search would contain C and ST, or using the special sign,
C ∩ ST
Student: So ∩ stands for and. Easy enough. But let us return to our search. 10,000 links is still
too many!
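As an aside, the plus-sign search can be sketched in Python with built-in sets. This is only an illustration: the page numbers below are made up and do not come from any real search engine.

```python
# Hypothetical page IDs standing in for search results.
college = {1, 2, 3, 4, 5}      # C: pages containing the word "college"
soccer_team = {4, 5, 6, 7}     # ST: pages containing the phrase "soccer team"

# C ∩ ST: only the pages that appear in both sets.
both = college & soccer_team
print(sorted(both))
```

The `&` operator gives the same result as `college.intersection(soccer_team)`, and the result can never be larger than either of the two original sets.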
Mentor: Let us force the search to be even more specific. Are you interested in college soccer teams
from around the world, or not?
Student: I only want to check on the teams from the USA. So, I am going to refine the search even
more. This time, I am looking for the documents that contain all of the words: college,
soccer team, and USA. I am using plus signs again:
Mentor: Can you draw a picture for your search, as we did before? Such pictures are called
Venn diagrams.
Student: Sure. I will only use the first letters for the links. This time, I will have three...
Mentor: Sets.
Student: Right, sets. One set for each word or phrase I used. By the way, I got about 1,300 documents
this time, because the search engine only selected those of the 10,000 documents with college and
soccer team in them that also contained USA. There are a lot of documents that have the word USA,
and a lot that have the word college, and a lot that have the phrase soccer team, but a much smaller
number of documents contain all three!
Mentor: So here we have the intersection of three sets: the set of documents that have the phrase
soccer team, the set of documents that contain the word USA, and the set of documents that contain
the word college. We can write it using the symbol for intersection:
C ∩ ST ∩ USA
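A small Python sketch of the three-set intersection, again with invented page numbers, shows why each extra plus sign can only shrink the result:

```python
# Hypothetical page IDs; not real search results.
college = {1, 2, 3, 4, 5}      # C
soccer_team = {4, 5, 6, 7}     # ST
usa = {2, 5, 7, 8}             # USA

two_way = college & soccer_team     # C ∩ ST
three_way = two_way & usa           # C ∩ ST ∩ USA

print(sorted(two_way), sorted(three_way))
```

Every page in `three_way` is also in `two_way`, which mirrors how the 1,300 documents were selected from the 10,000. The order of the sets does not matter: `set.intersection(usa, college, soccer_team)` gives the same pages.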
Mentor: By the way, can you highlight on the diagram what happens if you search for college,
soccer team and USA without using plus signs?
Student: I will see the documents that have at least one of these words. There should be a lot of
documents that do! Here is the picture for that:
Mentor: This operation is called the union of sets. There is a special sign for that, of course:
C ∪ ST ∪ USA
means that we are talking about all the documents that contain the word college or the word USA or
the phrase soccer team.
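The union can be sketched the same way in Python. The page numbers are invented for illustration only:

```python
# Hypothetical page IDs; not real search results.
college = {1, 2, 3, 4, 5}      # C
soccer_team = {4, 5, 6, 7}     # ST
usa = {2, 5, 7, 8}             # USA

# C ∪ ST ∪ USA: pages that appear in at least one of the sets.
any_term = college | soccer_team | usa
# For comparison, pages that appear in all three sets.
all_terms = college & soccer_team & usa

print(len(any_term), len(all_terms))
```

A page that belongs to several sets is counted only once in the union, so the union is usually smaller than the sum of the three set sizes, yet it always contains the intersection.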
Mentor: The last search option I would like to discuss is using the minus sign. Suppose you want to
search for documents that contain soccer team but not college...
Student: If I want that, I will use a plus sign in front of soccer team and a minus sign in front of
college:
Mentor: Can you draw a Venn diagram for that, highlighting the parts we will find?
Student: Sure:
Student: Now tell me, what is the special sign mathematicians use for this one?
Mentor: It is called the difference of sets, written with a backslash. Here:
ST \ C
It reads: "The difference between the set of documents that have soccer team and the set of
documents that have college."
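In Python, the set difference is written with a minus sign, much like the search query itself. The sketch below uses invented page numbers:

```python
# Hypothetical page IDs; not real search results.
college = {1, 2, 3, 4, 5}      # C
soccer_team = {4, 5, 6, 7}     # ST

# ST \ C: pages with "soccer team" that do not contain "college".
st_minus_c = soccer_team - college
# C \ ST: the reverse difference, which is a different set.
c_minus_st = college - soccer_team

print(sorted(st_minus_c), sorted(c_minus_st))
```

Unlike intersection and union, the difference depends on the order of the sets: ST \ C and C \ ST have no pages in common.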