Saturday 5 June 2010

Perl text document stats script

Crikey, that was a little easy! In less than a fortnight I've done a cover version of my Python script from here, now in Perl, and all this from not knowing any Perl at all.

I didn't do it on my own, mind. I had a textbook, Learning Perl from O'Reilly, and tutoring from Robbie and Matt off of Facebook; without their help I'd have been floored.

Here's the program in Perl. It analyzes the text file of my 2001 novel Shag Times and comes up with various statistics for it.

It's a bit messy compared to the Python version, and it's clunky too; there's gotta be simpler and more leet ways of doing many of the sections.
#!/usr/bin/perl
use strict;
use warnings;

# Program to
# open shagtimes
# provide a word count
# count unique words
# provide top ten most popular words
# provide all single occurrence words
# calculate average word length
# find longest word

use Text::Wrap;

my %occurance_list;    # word => number of times it appears in the book

# Sort helper: highest count first, ties broken alphabetically
sub hashValueDescendingNum {
    $occurance_list{$b} <=> $occurance_list{$a} or $a cmp $b;
}

# Opening Shag Times and processing it a bit
my $filename = "shagtimes.txt";
open BOOK, "<", $filename or die "Can't open '$filename': $!";
my @book = <BOOK>;    # one element per line of the file
close BOOK;
my $allbook = "";
foreach my $book (@book) {
    $allbook .= $book;
    $allbook .= " ";
}
$allbook = lc $allbook;
# $allbook is a long string of Shag Times

# Code to remove full stops and commas and stuff
my $regexp = '[\W]';
@book = split /\s+/, $allbook;
foreach my $book (@book) {
    $book =~ s/$regexp//g;    # modifies @book in place
}
# Drop tokens that were pure punctuation (e.g. "--"), plus the empty
# field split leaves when the text starts with whitespace
@book = grep { $_ ne "" } @book;
print "\n=======================================\n";

# Doing the word count
my $wordcount = 0;
foreach my $book (@book) {
    $wordcount += 1;
}
print "The document contains $wordcount words in total\n";
print "=======================================\n";

# Doing the unique count: check each word against every unique word so far
my @uniques;
foreach my $book (@book) {
    my $was_this_in_uniques = "no";
    foreach my $uniques (@uniques) {
        if ($book eq $uniques) {
            $was_this_in_uniques = "yes";
        }
    }
    if ($was_this_in_uniques eq "no") {
        push @uniques, $book;
    }
}
my $uniquecount = 0;
foreach my $uniques (@uniques) {
    $uniquecount += 1;
}
print "The document contains $uniquecount unique words\n";
print "=======================================\n";

# Finding top ten popular words
print "The top ten most used words:-\n";
foreach my $uniques (@uniques) {
    my $occurances = 0;
    foreach my $book (@book) {
        if ($uniques eq $book) {
            $occurances += 1;
        }
    }
    $occurance_list{$uniques} = $occurances;
}
my @ranked_occurances;
foreach my $key (sort hashValueDescendingNum (keys(%occurance_list))) {
    push @ranked_occurances, "$key \($occurance_list{$key}\)";
}
foreach (0..9) {
    last if $_ > $#ranked_occurances;    # fewer than ten unique words
    print "$ranked_occurances[$_]\n";
}
print "=======================================\n";

# Finding single use words
print "Words that were used only once:-\n";
my @singles;
my $single_use_count = 0;
foreach my $ranked_occurances (@ranked_occurances) {
    if ($ranked_occurances =~ /\(1\)$/) {
        (my $word = $ranked_occurances) =~ s/\s\(1\)$//;    # strip " (1)"
        push @singles, $word;
        $single_use_count += 1;
    }
}
@singles = sort @singles;
my $single_paragraph = join ", ", @singles;
print wrap("", "", "$single_paragraph\n");
print "=======================================\n";
print "A total of $single_use_count words were used only once\n";
print "=======================================\n";

# Finding average word length
my $chartotal = 0;
foreach my $book (@book) {
    $chartotal += length($book);
}
my $avechar = $wordcount ? $chartotal / $wordcount : 0;    # avoid dividing by zero
my $printy_avechar = sprintf "%.3f", $avechar;
print "The average word length was $printy_avechar letters long\n";
print "=======================================\n";

# Finding longest word
my $longlength = 0;
foreach my $uniques (@uniques) {
    if (length($uniques) > $longlength) {
        $longlength = length($uniques);
    }
}
print "The longest word was $longlength letters long\n";
print "These words were that long:-\n";
foreach my $uniques (@uniques) {
    if (length($uniques) == $longlength) {
        print "$uniques\n";
    }
}
print "=======================================\n";
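On the "simpler and more leet ways" front, the usual Perl trick is a hash of word counts: one pass over the words fills it, and the unique count, top ten, single-use words and longest words all fall out of it without the nested loops. What follows is a minimal sketch of that idea, not the script above. Assumptions: `word_stats` is a made-up helper name, it applies the same lower-case and strip-`\W` rules, it drops tokens that end up empty, it uses the core List::Util module for `sum` and `max`, and it needs Perl 5.10 or later for the `//` operator. Run it as `perl wordstats.pl shagtimes.txt` (that script filename is hypothetical too).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum max);
use Text::Wrap;

# Hypothetical helper: returns a hash ref of stats for a chunk of text,
# using the same cleaning rules as the script above.
sub word_stats {
    my ($text) = @_;
    # lower-case each token, strip non-word characters, drop empties
    my @words = grep { $_ ne "" }
                map  { (my $w = lc $_) =~ s/\W//g; $w }
                split /\s+/, $text;

    my %count;                 # word => occurrences, built in one pass
    $count{$_}++ for @words;

    # highest count first, ties alphabetical
    my @ranked  = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    my $top_n   = @ranked < 10 ? scalar @ranked : 10;
    my $longest = @words ? max(map { length($_) } keys %count) : 0;

    return {
        words         => scalar @words,
        uniques       => scalar keys %count,
        top_ten       => [ map { [ $_, $count{$_} ] } @ranked[0 .. $top_n - 1] ],
        singles       => [ sort grep { $count{$_} == 1 } keys %count ],
        average       => @words ? sum(map { length($_) } @words) / @words : 0,
        longest       => $longest,
        longest_words => [ sort grep { length($_) == $longest } keys %count ],
    };
}

# Only print a report when a filename is given on the command line
if (@ARGV) {
    my $filename = $ARGV[0];
    open my $fh, "<", $filename or die "Can't open '$filename': $!";
    my $text = do { local $/; <$fh> } // "";    # slurp the whole file
    close $fh;

    my $s = word_stats($text);
    print "The document contains $s->{words} words in total\n";
    print "The document contains $s->{uniques} unique words\n";
    print "The top ten most used words:-\n";
    print "$_->[0] ($_->[1])\n" for @{ $s->{top_ten} };
    print "Words that were used only once:-\n";
    print wrap("", "", join(", ", @{ $s->{singles} }) . "\n");
    printf "A total of %d words were used only once\n", scalar @{ $s->{singles} };
    printf "The average word length was %.3f letters long\n", $s->{average};
    print "The longest word was $s->{longest} letters long\n";
    print "$_\n" for @{ $s->{longest_words} };
}
```

Because each word is counted once via a hash lookup, the work grows roughly with the number of words in the book rather than with words times unique words, which is what the nested loops in the unique and top-ten sections end up doing.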

Now I need to order that Ruby book. Ruby on Rails For Dummies, perhaps?
