[d@DCC] Comment-scanning script
mskala at ansuz.sooke.bc.ca
Tue Apr 20 22:49:08 EDT 2004
It was suggested that maybe I should post my script for scanning the
copyright reform process submissions and recognizing which ones were
similar to the EFF form letter. I'm not sure how useful it is at this
point: it was intended to be run just once, back in 2002, and the output
(which is already public) is the interesting part. Anyway, here it is,
below my signature.
I hereby release this to the public domain.
Excuse the lack of documentation and any programming-style gaffes... this
really was meant to be used just once, by myself, and then discarded, and
it's only my pack-rat instincts that caused me to save a copy. It's
intended to be run in a directory which already contains subdirectories
SSG and SSGF, mirroring those on the Strategis Web site. That means the
Strategis Web site as it *was* when I wrote the script, which, judging by
the file date, seems to have been summer 2002. The Strategis site has
since changed its design, and the script would have to be rewritten to
work with the new design.
It scans all the files, scores each one for similarity to the form letter,
estimates its length in lines, and spits out tab-separated lines containing the
similarity score, the numeric date, the author, and an HTML table row
describing the submission. The intended use is that the output of this
script would be fed through sort, and then edited with a text editor to
remove the three fields of sort-key data and add an HTML wrapper. The
actual result of my running the script, sorting, and editing, appears at
http://www.edifyingfellowship.org/~coroner/mylist.html. That URL isn't
absolutely permanent, but if it goes down sometime and people still want
to read it, I'll re-post it elsewhere.
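
The sort-and-edit step described above could also be automated. What follows is only a sketch, not what was actually done in 2002. The helper name rows_to_html, the sort order (highest score first, then date, then author), the table header row, and the sample records are all assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: takes the scanner's
# tab-separated output lines and returns a complete HTML page as one string.
sub rows_to_html {
    # Split each line into at most four fields: score, date, author, row.
    my @records = map { chomp(my $l = $_); [ split /\t/, $l, 4 ] } @_;

    # Assumed ordering: highest score first, then date, then author.
    my @sorted = sort {
           $b->[0] <=> $a->[0]
        || $a->[1] cmp $b->[1]
        || $a->[2] cmp $b->[2]
    } @records;

    # Field 4 is the ready-made <TR> row; the first three are sort keys only.
    my $rows = join '', map { "$_->[3]\n" } @sorted;

    # Assumed wrapper; the header matches the column order of the rows.
    return "<HTML><BODY>\n<TABLE>\n"
         . "<TR><TH>Score<TH>Date<TH>Lines<TH>Author\n"
         . $rows
         . "</TABLE>\n</BODY></HTML>\n";
}

# Made-up sample records in the scanner's output format.
my @sample = (
    "0\t2001-09-14\tB. Example\t<TR><TD>0<TD>2001-09-14<TD>40<TD>B. Example\n",
    "6\t2001-09-12\tA. Example\t<TR><TD>6<TD>2001-09-12<TD>25<TD>A. Example\n",
);
print rows_to_html(@sample);
```

To run it on real output, the sample list would be replaced by the lines read from the scanner, for example rows_to_html(<STDIN>).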
--
Matthew Skala
mskala at ansuz.sooke.bc.ca Embrace and defend.
http://ansuz.sooke.bc.ca/
#!/usr/bin/perl
# Scans the mirrored Strategis pages under SSG/ and SSGF/ and prints one
# tab-separated line per submission: score, date, author, HTML table row.

while (<SS*/*>) {
    $fn=$_;
    open(FILE,'<',$fn) or next;     # skip anything that cannot be read
    $score=0;                       # lines matching a form-letter phrase
    $author='';
    $date='';
    $counting=0;                    # true while inside the submission body
    $lines=0;                       # approximate length of the submission
    while (<FILE>) {
        chomp;
        # The heading may continue on the next line, so join the two.
        $_.=<FILE> if /Submissions? from/;
        if (/Submissions? from\s+(.*?)\s+(\S+\s+\S+\s+2001)/i) {
            $author=$1;
            $date=$2;
            $author=~s/\s*(received.*)?(on)?$//i;
            $author='(unspecified)' if $author eq '';
            $counting=1;
            $lines=1;
        } elsif (/the extreme intellectual property/
                 || /These measures, based on the US Digital Millennium/
                 || /gravely chilled scientists' and computer security/
                 || /prevention technologies to be bypassed/
                 || /UN Universal Declaration/
                 || /controversial and anti-freedom/) {
            $score++;
        } elsif (/Return to list of submissions/i) {
            $counting=0;
        }
        $lines++ if $counting;
    }
    close(FILE);
    # Convert e.g. "September 11, 2001" to the sortable form 2001-09-11.
    # The date is left as found if it does not have that shape.
    if ($date=~/([a-z]+)\s+(\d+),\s+(\d{4})/i) {
        ($month,$day,$year)=($1,$2,$3);
        $day="0$day" if $day=~/^[0-9]$/;
        $month="01" if $month=~/^jan/i;
        $month="02" if $month=~/^feb/i;
        $month="03" if $month=~/^mar/i;
        $month="04" if $month=~/^apr/i;
        $month="05" if $month=~/^may/i;
        $month="06" if $month=~/^jun/i;
        $month="07" if $month=~/^jul/i;
        $month="08" if $month=~/^aug/i;
        $month="09" if $month=~/^sep/i;
        $month="10" if $month=~/^oct/i;
        $month="11" if $month=~/^nov/i;
        $month="12" if $month=~/^dec/i;
        $date="$year-$month-$day";
    }
    # The first three fields are sort keys; the fourth is the table row.
    print "$score\t$date\t$author\t"
        ."<TR><TD>$score<TD>$date<TD>$lines"
        ."<TD><A HREF=\"http://strategis.ic.gc.ca/$fn\">$author</A>\n"
        if $author;
}
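
To see the scoring rule in isolation, here is a small sketch that applies the same six phrases to a few lines of text. The helper name form_letter_score and the input lines are made up for illustration. Unlike the script above, this sketch does not skip the heading line. It does show that a line earns one point if it contains any of the phrases, and that matching is case-sensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The six marker phrases tested by the script above.
my @markers = (
    qr/the extreme intellectual property/,
    qr/These measures, based on the US Digital Millennium/,
    qr/gravely chilled scientists' and computer security/,
    qr/prevention technologies to be bypassed/,
    qr/UN Universal Declaration/,
    qr/controversial and anti-freedom/,
);

# Hypothetical helper: one point per line that contains at least one marker.
sub form_letter_score {
    my $score = 0;
    for my $line (@_) {
        $score++ if grep { $line =~ $_ } @markers;
    }
    return $score;
}

# Made-up input lines, for illustration only.
my @letter = (
    "I object to these controversial and anti-freedom provisions,\n",
    "which conflict with the UN Universal Declaration of Human Rights.\n",
    "Please reconsider.\n",
);
print form_letter_score(@letter), "\n";   # prints 2
```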
--
For (un)subscription information, posting guidelines and
links to other related sites please see http://www.digital-copyright.ca