Technology Exercise 6: Learning About the World
Internet Delays (Question 1)
There is a command called traceroute that will show the route that information
travels to get from one host on the Internet to another. It also shows the
round trip time (in milliseconds) that it takes to get to each point along
the route.
We're going to collect information about how quickly sites on the Internet
in the United States can reach Richland's web server. Of more use to students
might be the reverse: how quickly we can reach sites on the Internet.
So, even though we're collecting the time it takes to get from the remote server
to Richland, we'll use it as the time from Richland to them.
We will collect information from 12 different sites on the Internet, and we
will do this on 5 separate occasions. A connection may be really good at one
time of the day but slow at another, which is why we're repeating the process.
The 5 samplings can be at different times of the day or on different days.
I just ask that there be at least three hours between samplings.
You could collect information from home if you have an Internet connection
and then bring it into school in an Excel file. Since the traceroute is from
the remote website to Richland and not to your home machine, it won't be affected
by dial-up or broadband connections.
Creating the Worksheet
- Label column 1 as "sample"
- Label column 2 as "site"
- Label column 3 as "time"
- Choose Calc / Make Patterned Data / Simple Set of Numbers
- Store the pattern in sample
- Start with the first value of 1
- End with the last value of 5
- List each value 36 times
- Click OK
- Choose Calc / Make Patterned Data / Simple Set of Numbers
- Store the pattern in site
- Start with the first value of 1
- End with the last value of 12
- List each value 3 times
- Repeat the entire list 5 times
- Click OK
- Save your file in R:\01\tech6 or R:\02\tech6 depending on which section
you're in. Use a filename that's unique to your group.
When you gather the information, there will be three times for each location.
These should go in separate rows.
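The two Make Patterned Data steps above can be checked outside Minitab with a short sketch. This is only an illustration in Python, not part of the assignment. The names `N_SAMPLES`, `N_SITES`, and `N_TIMES` are made up for the example.

```python
# Rebuild the "sample" and "site" columns in plain Python.
N_SAMPLES = 5   # sampling sessions
N_SITES = 12    # sites visited in each session
N_TIMES = 3     # times reported on the last traceroute line

# sample: each value 1..5 listed 36 times (12 sites x 3 times)
sample = [s for s in range(1, N_SAMPLES + 1) for _ in range(N_SITES * N_TIMES)]

# site: each value 1..12 listed 3 times, then the whole list repeated 5 times
one_pass = [site for site in range(1, N_SITES + 1) for _ in range(N_TIMES)]
site = one_pass * N_SAMPLES

print(len(sample), len(site))        # both columns have the same number of rows
print(list(zip(sample, site))[:4])   # first few (sample, site) pairs
```

Both columns come out with 180 rows, one row for each time you will enter.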
Gathering the Data
Click on each of the links that follow below. If the traceroute is successful,
the last line should indicate www.richland.edu (64.107.104.12) and then have
three times in milliseconds (ms or msec). Some sites will include an AS number;
you can ignore that.
Sample Traceroute Output
traceroute to 64.107.104.12 (64.107.104.12), 30 hops max, 40 byte packets
1 inside.fw1.sjc2.mfnx.net (208.184.213.129) 0.293 ms 0.428 ms 0.242 ms
2 99.ge-5-1-1.er10a.sjc2.us.above.net (64.124.216.10) 0.573 ms 0.555 ms 0.596 ms
3 so-2-0-0.mpr3.sjc2.us.above.net (64.125.30.89) 0.487 ms 6.114 ms 0.596 ms
4 so-4-1-0.cr1.ord2.us.above.net (64.125.30.206) 52.288 ms 52.626 ms 52.059 ms
5 pos4-0.mpr1.ord1.us.above.net (208.185.0.198) 51.978 ms 51.946 ms 52.092 ms
6 atm-0-0-0.nap.lincon.net (206.220.243.106) 54.319 ms 56.580 ms 52.386 ms
7 atm2-0-sub02-soib1-peoria-core.peoria.lincon.net (206.166.9.170) 60.125 ms 68.693 ms 74.794 ms
8 atm1-0-sub11-peoria-core-fat-elvis.springfield.lincon.net (206.166.9.186) 60.386 ms 59.775 ms 64.263 ms
9 atm-3-1-0-11-fat-elvis-liberace.springfield.lincon.net (206.166.9.234) 178.928 ms 70.812 ms 132.370 ms
10 10.9.31.2 (10.9.31.2) 66.950 ms 66.038 ms 61.614 ms
11 www.richland.edu (64.107.104.12) 74.816 ms 67.278 ms 61.843 ms
Enter the site number (these are already entered if you followed the steps
above) and the three times into Minitab. Each site will take up three rows
of data. Do not enter the units on the times. The data above would look like
this.
row | sample | site | time
1 | 1 | 1 | 74.816
2 | 1 | 1 | 67.278
3 | 1 | 1 | 61.843
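If you paste the traceroute output into a text file, the three times on the last line could be pulled out with a small script like the one below. This is a sketch only. The function name `last_hop_times` is made up, and it assumes the last non-empty line is the www.richland.edu hop with times written as "ms" or "msec".

```python
import re

def last_hop_times(output):
    """Return the millisecond times found on the last non-empty line."""
    lines = [line for line in output.splitlines() if line.strip()]
    last = lines[-1]
    # A number (with or without decimals) followed by "ms" or "msec".
    return [float(t) for t in re.findall(r"(\d+(?:\.\d+)?)\s*ms(?:ec)?\b", last)]

# Last two hops of the sample output above.
excerpt = """10 10.9.31.2 (10.9.31.2) 66.950 ms 66.038 ms 61.614 ms
11 www.richland.edu (64.107.104.12) 74.816 ms 67.278 ms 61.843 ms"""

for time in last_hop_times(excerpt):
    print(time)   # one time per row, no units
```

The numbers inside the IP address are skipped because they are not followed by "ms".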
Click on each link below to start the traceroute. Be patient; some of the
traceroutes can take a while. Also, there will be a noticeable delay on most
of the sites right before you reach Richland. In many cases, you will see a
"* * *". This is okay; just be patient. After you have entered the information
into Minitab, hit the back button on your browser to come back here and visit
the next site.
Collect all 12 sites' information at the same sitting so that we are comparing
similar conditions. Wait at least three hours before repeating the process.
If you want to do this at home, create an Excel spreadsheet with the two columns,
then bring it into school and copy and paste the information into Minitab.
- above.net
- io.com
- his.com
- sdsc.edu
- calweb.com
- getnet.com
- princeton.edu
- opus1.com
- socket.net
- wvi.com (rounds to nearest ms)
- xmission.com (generates entire page before displaying, be patient; rounds
to nearest ms)
- playground.net (generates entire page before displaying, be patient)
Conducting the Hypothesis Test
Check the assumptions. If they aren't satisfied, address any concerns that
you may have in interpreting the data.
To make a probability plot, do the following. You can also use the graphical
summary under Descriptive Statistics to get a histogram.
- Choose Stat / Basic Statistics / Normality Test
- Select the time variable
- Click OK
To conduct the hypothesis test
- Choose Stat / Basic Statistics / 1 Sample t
- Select the variable time
- Enter the claimed value for the mean in the test mean box.
- Go into options and make sure the values are set properly.
- Click OK
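For reference, the test statistic behind the 1 Sample t can be sketched in a few lines of Python. This is an illustration, not a replacement for Minitab. It uses only the three times from the sample traceroute output, and the claimed mean of 60 is a hypothetical value chosen for the example. It stops at the t statistic and degrees of freedom, since the p-value needs a t table or Minitab.

```python
import math
import statistics

def one_sample_t(data, claimed_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    t = (mean - claimed_mean) / (sd / math.sqrt(n))
    return t, n - 1

times = [74.816, 67.278, 61.843]         # times from the sample traceroute output
t, df = one_sample_t(times, claimed_mean=60.0)   # 60.0 is a hypothetical claim
print(round(t, 3), df)
```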
Email Spam (Question 2)
I have collected some information about spam email and placed it into a file.
There is too much information (about 70% of the email we get at Richland gets
labeled as spam), so we will need to take a random sample to work with. The
data is in the worksheet spam.mtw.
- Choose File / Open Worksheet
- Navigate through the system to the tech6 folder for your section.
- Highlight the spam.mtw file
- Click Open
Taking a Random Sample
- The worksheet has one column in it called "spam". Label another column
as "score".
- Choose Calc / Random Data / Sample from Columns
- Sample 35 rows
- Sample from the spam column
- Store the data in the score column
- Do not sample with replacement
- Click OK
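The sampling step above draws 35 rows without replacement. A minimal Python sketch of the same idea follows. The `population` list is a made-up stand-in for the spam column (it is not the real data), and `sample_scores` is a hypothetical helper name.

```python
import random

def sample_scores(spam_column, k=35, seed=None):
    """Draw k values from the column without replacement."""
    rng = random.Random(seed)
    return rng.sample(spam_column, k)    # sample() never picks the same row twice

# Stand-in values only; the real scores are in spam.mtw.
population = [round(0.1 * i, 1) for i in range(500)]

score = sample_scores(population, k=35, seed=1)
print(len(score))
```

Because the sample is random, each group should expect to get different values in the score column.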
Checking Assumptions
Check your assumptions by getting a graphical summary of the data and a probability
plot.
Testing the claim
- Choose Stat / Basic Statistics / 1 Sample t
- Select the score variable
- Enter the claimed value for the test mean
- Go into options and make sure they're set properly for your test.
- Click OK
Designated Hitter (Question 3)
Entering the Data
- Choose File / New / Minitab Worksheet to create a new worksheet.
- Label three columns as Team, League, and OBP.
- Go to the Major League Baseball site at http://mlb.mlb.com/NASApp/mlb/mlb/stats/mlb_sortable_team_stats.jsp
and enter these options on the left side in the "Sortable Team Stats" section.
- Choose Major League
- Choose Hitting Stats
- Split by the Entire Season
- The Timeframe is 2003 to Date
- Click GO. Note that sometimes this site is slow. If it doesn't respond after
a few seconds, you may want to click on the stop button and re-click
GO.
- Click on Team at the top of the data.
- For each team, enter the name of the team, the League (AL or NL), and the On
Base Percentage (OBP)
Performing the Hypothesis Test
- Choose Stat / Basic Statistics / 2 Sample t
- The samples are in one column
- The samples are in the OBP column
- The subscripts are in the League column
- Check the Options and make sure they're set properly
- Click OK
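"Samples in one column" means every OBP value sits in one column and the League column says which group each value belongs to. The sketch below shows that grouping, followed by a two-sample t statistic that does not assume equal variances. It is an illustration under stated assumptions: the team names and OBP values are made up (not real statistics), and `unstack` and `welch_t` are hypothetical helper names.

```python
import math
import statistics
from collections import defaultdict

# Made-up rows in the worksheet layout: Team, League, OBP.
rows = [("Team A", "AL", 0.340), ("Team B", "AL", 0.330), ("Team C", "AL", 0.350),
        ("Team D", "NL", 0.320), ("Team E", "NL", 0.335), ("Team F", "NL", 0.325)]

def unstack(rows):
    """Group the OBP values by the League subscript."""
    groups = defaultdict(list)
    for _team, league, obp in rows:
        groups[league].append(obp)
    return dict(groups)

def welch_t(a, b):
    """Return (t, approximate df) without assuming equal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

groups = unstack(rows)
t, df = welch_t(groups["AL"], groups["NL"])   # AL minus NL
print(round(t, 3), round(df, 2))
```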
Better Pitcher at Home? (Question 4)
Entering the Data
Just use the same worksheet you started for question 3.
- Label two more columns, one as Home and one as Away.
- Go to the Major League Baseball site at http://mlb.mlb.com/NASApp/mlb/mlb/stats/mlb_sortable_team_stats.jsp and enter these options on the left side in the "Sortable Team Stats" section.
- Choose Major League
- Choose Pitching Stats
- Split by Home
- The Timeframe is 2003 to Date
- Click GO. Note that sometimes this site is slow. If it doesn't
respond after a few seconds, you may want to click on the stop button
and re-click GO.
- Click on Team at the top of the data.
- Enter the Earned Run Average (ERA) values into the Home column for the proper
teams. If you clicked on Team as the previous step says, then the order of the
teams should be the same as what you previously entered. If not, be careful to
match up the ERA values with the proper team.
- Repeat the steps for the Major League Baseball site, except this time, Split
by Away.
- Enter the ERA values for each team in the Away column.
Performing the Hypothesis Test
- Choose Stat / Basic Statistics / Paired t
- The first sample is in Home
- The second sample is in Away
- Go into options and make sure they are set properly. Note that Minitab compares
sample 1 to sample 2, so make sure your alternative is set up properly with
Home on the left and Away on the right.
- Click OK
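The paired t works on the differences, first sample minus second, which is why the order of Home and Away matters. A short sketch with made-up ERA values (not real statistics) shows the calculation. `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up ERA values for four teams, in the same team order in both lists.
home = [3.50, 4.10, 3.90, 4.40]
away = [3.90, 4.30, 4.20, 4.60]

t, df = paired_t(home, away)   # Home on the left, Away on the right
print(round(t, 3), df)
```

A negative t here means the Home values are smaller than the Away values on average. Swapping the two lists flips the sign.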
Show the T Approaches the Normal (Question 5)
Start this in a new worksheet.
- Label the first column as x, the second as z, the third as t1, and
the fourth as t5.
- Go to Calc / Make Patterned Data / Simple Set of Numbers. Start at -3 and
go to 3 with a step size of 0.01. Store the results into x.
- Go to Calc / Probability Distributions / Normal. Select Probability Density,
set the input column to x and the optional storage to z.
- Go to Calc / Probability Distributions / t. Select Probability Density,
set the degrees of freedom to 1, the input column to x and the optional
storage to t1.
- Go to Calc / Probability Distributions / t. Select Probability Density,
set the degrees of freedom to 5, the input column to x and the optional
storage to t5.
- Go to Graph / Plot
- Create three plots. For the y-variable, use z, t1, and
t5. Use x for each of the x variables.
- Change the data display from symbol to connect.
- Go into Edit attributes and change the line type to solid for all
three graphs. Choose three different colors for the graphs. Change the
line thickness to 2. Make a note of which graph is which color; you may
want to use that information later to correctly label the graphs.
- Click on Frame and choose Multiple Graphs. Overlay the graphs on
the same page
- Click on Annotation and choose Title. Enter all of the names of the
people in your group on the first line and change the font size for
that line to 1.0.
- Click OK to generate the graph
- Double click on the graph to bring up the toolbar. Click on the big
T in the upper right. Click on the graph next to the normal curve and
label it "z". Label the Student’s t curves with "t=1" and "t=5". You may
need to reposition the text after entering it so that it is viewable. You
can do that by clicking on the object, holding down the mouse button, and
dragging it to a new location.
- Remove the label on the vertical axis (you have three graphs, but
just one label) by clicking on it and hitting delete.
- Copy the graph into Word and add an explanation of what we're looking at.
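The three columns built above can also be computed directly from the density formulas, which may help with the explanation. The sketch below is an illustration in Python, not a Minitab step. The function names `normal_pdf`, `t_pdf`, and `max_gap` are made up for the example.

```python
import math

def normal_pdf(x):
    """Standard normal probability density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t probability density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

xs = [i / 100 for i in range(-300, 301)]   # -3 to 3 with a step size of 0.01
z = [normal_pdf(x) for x in xs]
t1 = [t_pdf(x, 1) for x in xs]
t5 = [t_pdf(x, 5) for x in xs]

def max_gap(t_values):
    """Largest vertical distance between a t curve and the normal curve."""
    return max(abs(a - b) for a, b in zip(z, t_values))

print(round(max_gap(t1), 4), round(max_gap(t5), 4))
```

The gap for 5 degrees of freedom is smaller than the gap for 1 degree of freedom, which is the pattern the overlaid graph should show.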