Revision as of 02:16, 17 March 2010 by Grahamenglish (Talk | contribs)
Requirements
Running the Script
This script requires modifying a few parameters:
- A data file (a simple .txt file) with the URLs of the sites to be tracked. Put each URL on its own line, and use a fully qualified URL that begins with http:// or the script won't work properly. For search engine URLs, simply copy the URL from your browser after you perform your keyword search.
- The email address of the person or comma-separated group of people who should receive the updated web page
- The paths to your site archive (where the pages are compared to each other) and URL data file
- The from email address (very useful if you want to attach a mail rule)
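The data file format described above can be checked before the first run. The following is a minimal sketch and not part of the original script; the function name check_urls is hypothetical. It reports any non-empty line that does not begin with http://.

```shell
#!/bin/sh
# check_urls (hypothetical helper): print every non-empty line of a URL data
# file that does not start with http://, and return non-zero if any is found.
check_urls() {
  # "|| true" keeps the assignment from failing when grep finds nothing.
  bad=$(grep -v '^http://' "$1" | grep -v '^[[:space:]]*$' || true)
  if [ -n "$bad" ]; then
    echo "Lines that are not fully qualified http:// URLs:" >&2
    echo "$bad" >&2
    return 1
  fi
  return 0
}

# Usage (the path is the default mydatafile value from the script):
# check_urls "$HOME/scripts/changetrackurls.txt"
```

Because the script reads the file with a plain for loop, a URL containing spaces would be split into several entries, so it is safest to copy URLs exactly as the browser shows them.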
To run the script, simply type "changetrack" into Terminal. Note: You'll first have to add the folder that contains the script to the PATH in your shell profile:
open -e ~/.profile
Add this on a new line anywhere in your profile:
export PATH="$PATH:/Users/username/scripts/"
Obviously, you'll want to change the path to wherever you save the script. Make sure you save the profile as plain text, and make the script itself executable (chmod +x followed by the path to the script). Alternatively, you can type the full path into Terminal to run the command. Something like:
/Users/username/scripts/changetrack
The Script
Open TextEdit, copy and paste the script below, and save it as plain text with the name changetrack somewhere on your computer.

#!/bin/sh
# changetrack - Tracks a given URL and, if it's changed since the last visit,
#   emails the new page to the specified address.

sitearchive="$HOME/scripts/tmp/changetrack"  # change as desired
sendmail="/usr/sbin/sendmail"                # might need to be tweaked!
fromaddr="[email protected]"                   # change as desired
mydatafile="$HOME/scripts/changetrackurls.txt"
myemail="[email protected]"

if [ ! -d $sitearchive ] ; then
  if ! /bin/mkdir $sitearchive ; then
    /bin/echo "$(basename $0) failed: couldn't create $sitearchive." >&2
    exit 1
  fi
  /bin/chmod 777 $sitearchive   # you might change this for privacy
fi

for datafield in $(/bin/cat $mydatafile)
do
  MYURL="$(/bin/echo $datafield)"
  #myemail="$(/bin/echo $datafield | /usr/bin/cut -d: -f2)"
  fname="$(/bin/echo $MYURL | /usr/bin/sed 's/http:\/\///g' | tr '/?&' '...')"
  baseurl="$(/bin/echo $MYURL | /usr/bin/cut -d/ -f1-3)/"

  # Grab a copy of the web page into an archive file. Note that we can track
  # changes by looking just at the content (e.g., '-dump', not '-source'),
  # so we can skip any HTML parsing ...
  /opt/local/bin/lynx -dump "$MYURL" | /usr/bin/uniq > $sitearchive/${fname}.new

  if [ -f $sitearchive/$fname ] ; then
    # We've seen this site before, so compare the two with '/usr/bin/diff'.
    # diff exits 0 when the two files are identical.
    if /usr/bin/diff $sitearchive/$fname $sitearchive/${fname}.new > /dev/null ; then
      /bin/rm -f $sitearchive/${fname}.new   # nothing new...
      continue                               # no change, move on to the next URL
    else
      /bin/echo "Site $MYURL has changed since our last check."
    fi
  else
    /bin/echo "Note: we've never seen this site before."
  fi

  # For the script to get here, the site must have changed, and we need to
  # send the contents of the .new file to the user and replace the original
  # with the .new for the next invocation of the script.
  ( /bin/echo "Content-type: text/html"
    /bin/echo "From: $fromaddr (Web Site Change Tracker)"
    /bin/echo "Subject: Web Site $MYURL Has Changed"
    /bin/echo "To: $myemail"
    /bin/echo ""
    /opt/local/bin/lynx -source "$MYURL" | \
      /usr/bin/sed -e "s|[sS][rR][cC]=\"|SRC=\"$baseurl|g" \
                   -e "s|[hH][rR][eE][fF]=\"|HREF=\"$baseurl|g" \
                   -e "s|$baseurl\/http:|http:|g"
  ) | $sendmail -t

  # Update the saved snapshot of the website
  /bin/mv $sitearchive/${fname}.new $sitearchive/$fname
  /bin/chmod 777 $sitearchive/$fname
done

# and we're done.
exit 0
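To make the script easier to follow, here is a minimal sketch that isolates three of its steps: turning a URL into an archive filename, deriving the base URL used to rewrite links, and deciding whether a page changed. The function names url_to_fname, url_to_base and page_changed are hypothetical and the example URL is made up; the pipelines inside the first two functions are the ones the script uses. Note that diff exits 0 when two files are identical, so a "changed" test has to negate it.

```shell
#!/bin/sh
# url_to_fname (hypothetical name): strip http:// and replace /, ? and &
# with dots, as the script does to build the archive filename.
url_to_fname() {
  echo "$1" | sed 's/http:\/\///g' | tr '/?&' '...'
}

# url_to_base (hypothetical name): keep scheme and host, plus a trailing slash.
url_to_base() {
  echo "$(echo "$1" | cut -d/ -f1-3)/"
}

# page_changed (hypothetical name): succeed when the two snapshots differ.
# diff exits 0 for identical files, so its result is negated.
page_changed() {
  ! diff "$1" "$2" > /dev/null
}

url="http://www.example.com/search?q=lynx&page=2"
url_to_fname "$url"   # prints www.example.com.search.q=lynx.page=2
url_to_base "$url"    # prints http://www.example.com/
```

Running the first two functions on one of your own URLs is a quick way to find the matching snapshot file in the site archive folder.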
Configuring Postfix to Send Email
If you want to use Postfix on a standalone server, you must configure two settings in /etc/postfix/main.cf. The first is the hostname (myhostname). This should be a real hostname, something that can be found in a reverse DNS lookup against your IP address. The second is your origin (myorigin), which is the domain name from which email appears to originate. This can be the same as your hostname (this is probably the case for small sites). However, if it is not, be sure to specify the correct hostname. For example, here are the settings for a computer named ip192-168-0-1.ri.ri.cox.net with all email originating from that machine appearing to come from [email protected]:
myhostname = ip192-168-0-1.ri.ri.cox.net
myorigin = cox.net
If your ISP's network is configured to block outgoing SMTP to all but its own SMTP server, using your ISP's SMTP server as a relay host may be the only way you can configure Postfix to deliver mail.
If you don't have a permanent domain name for your Mac OS X server, we suggest configuring Postfix to use a relay host (most likely your ISP's SMTP server). To configure Postfix to use a relay, add an entry for relayhost in /etc/postfix/main.cf. For example, we use the following setting:
relayhost = smtp.rcn.com
Along the same lines, you should configure Postfix to masquerade as the appropriate host using the myorigin setting in /etc/postfix/main.cf. In the case of the previous example, the origin is rcn.com (as in [email protected]):
myorigin = rcn.com
If this seems confusing, don't worry. All I did was type the following into Terminal:
open -e /etc/postfix/main.cf
Then I used Find to locate and change two lines:
#myorigin
to
myorigin = rcn.com
rcn.com is my ISP. Use your own.
#relayhost
to
relayhost = smtp.rcn.com
Again, use your own ISP.
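The same two edits can also be made from Terminal with postconf, the configuration tool that ships with Postfix. This is a sketch rather than the method used on this page, and it assumes the example rcn.com values from above; substitute your own ISP's names.

```shell
# Set the two parameters in /etc/postfix/main.cf without opening an editor.
# The values are the example values from this page; use your own ISP's.
sudo postconf -e "myorigin = rcn.com"
sudo postconf -e "relayhost = smtp.rcn.com"

# Print the values Postfix will use, to confirm the edit took effect.
postconf myorigin relayhost

# Ask Postfix to re-read its configuration (only applies if it is running).
sudo postfix reload
```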
Automating and Hacking
I tried but couldn't get this script to run from cron. I was, however, able to run it as an iCal alarm. Create an AppleScript like the following:
do shell script "/Users/username/scripts/changetrack"
Modify the path to point to where you saved the changetrack script. Create a new iCal event at a time you expect your computer to be running. Choose Run script for the alarm and then navigate to where you saved the AppleScript. Repeat the event as often as you want the script to run.
If you use Mail, create a smart mailbox so you can quickly compare current and previous search results.
If you use MailTags, you can create To Dos in iCal if you need to take any actions based on your changetrack results.