
The best way to code "htmlElementcount.sh"

Posted: Mon Mar 03, 2008 3:32 pm
by phil
Instead of reading each line of the access log yourself, let the shell commands and the egrep tool do the hard work for you!
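
To make the contrast concrete, here is a minimal sketch on a small made-up log (the sample lines and the function names `count_loop` and `count_grep` are hypothetical, not taken from either script in this thread). Both functions count lines mentioning `.html` in any letter case: the first walks the file line by line inside the shell, the second hands the whole file to a single `grep` process.

```shell
#!/bin/bash
# Hypothetical sample log; real access log lines carry many more fields.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'GET /index.html HTTP/1.0' \
  'GET /logo.gif HTTP/1.0' \
  'GET /about.HTML HTTP/1.0' \
  'GET /photo.jpg HTTP/1.0' > "$log"

# Line-by-line approach: the shell reads and tests every line itself.
count_loop() {
  n=0
  while IFS= read -r line; do
    case "$line" in
      *.[Hh][Tt][Mm][Ll]*) n=$((n + 1)) ;;
    esac
  done < "$1"
  echo "$n"
}

# grep approach: one external process scans the file and prints the count.
count_grep() {
  grep -c '[.][Hh][Tt][Mm][Ll]' "$1"
}

loop_count=$(count_loop "$log")
grep_count=$(count_grep "$log")
echo "loop: $loop_count  grep: $grep_count"
```

Both give the same answer; the difference is that the loop runs shell code once per line, while `grep -c` does all the matching in compiled code, which is where the speedup reported later in this thread comes from.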

For performance, use mixed upper- and lower-case bracket expressions such as [Hh] in the pattern, as this is quicker than using egrep's '-i' option.
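
A quick way to confirm that the bracket-expression form and '-i' select the same lines is to count with both. This is a minimal sketch on a made-up log (the file contents and the variable names `bracket` and `ignore` are hypothetical); it uses `grep -E`, the POSIX spelling of `egrep`. It only checks that the counts agree: how much faster the bracket form is depends on your grep implementation and locale, so time it on your own logs.

```shell
#!/bin/bash
# Made-up sample log with mixed-case extensions.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'GET /a.gif HTTP/1.0' \
  'GET /b.GIF HTTP/1.0' \
  'GET /c.Gif HTTP/1.0' \
  'GET /d.jpg HTTP/1.0' > "$log"

# Case handled in the pattern: [Gg] matches either G or g; [.] is a literal dot.
bracket=$(grep -Ec '[.][Gg][Ii][Ff]' "$log")

# Case handled by grep itself via -i (ignore case).
ignore=$(grep -Eic '[.]gif' "$log")

echo "bracket: $bracket  -i: $ignore"
```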

So we would code...

#!/bin/bash

#
# htmlElementcount.sh


function die {
   echo "$*" >&2 ; exit 1
}

[ 1 -ne $# ] && die "usage: $(basename "$0") LOG_FILE"
[ ! -e "$1" ] && die "LOG_FILE [$1] does not exist"
[ ! -s "$1" ] && die "LOG_FILE [$1] is empty"

cat <<EOF
Page Hits
$(wc -l < "$1" | tr -d ' ') pages accessed - Form Elements Processed:
$(egrep -cq '[.][Hh][Tt][Mm][Ll]' "$1" | tr -d ' ') html pages accessed
$(egrep -cq '[.][Gg][Ii][Ff]' "$1" | tr -d ' ') GIF files accessed
$(egrep -cq '[.][Jj][Pp][Gg]' "$1" | tr -d ' ') jpg files accessed
EOF

exit 0

Posted: Sat Mar 08, 2008 2:33 am
by laum
Hey Phil,


Thanks for the tips. I never thought about using bracket expressions like [Hh] instead of (e)grep's -i option for speed, but it makes sense, since you're letting egrep do the work for you.

Sorry I didn't respond to you sooner. I just transferred my Comcast service from one location to another, and they "comcastically" gave me a new email address, so I had to wait a week for them to recover all the emails, etc., that they "lost" during that time.

Appreciate the suggestion, and thanks for taking the time to help out. I promise I won't use your better method in our blog without crediting you, unless you don't want it out there at all with your name on it. Let me know; it would make a great follow-up post, and I like to give attribution where I can (maybe promote a link to your site or something, rather than giving out your email). In any event, I won't post any of your info if you don't say it's okay.

Thanks, again :)

, Mike

Posted: Sat Mar 08, 2008 5:06 am
by phil
No problem, you're welcome to 'use' the post.

Posted: Sat Mar 08, 2008 11:43 pm
by laum
Hey Phil,

Thanks a lot :) I'll give propers to your username :)

Cheers,

, Mike

Posted: Mon Mar 10, 2008 2:02 pm
by laum
Hey Phil,

I just got a chance to check out your code and it looks good.

I just needed to alter the:

egrep -cq

to be

egrep -c

so that the counts would print. I had been using -q before because I was checking the exit status ($?) afterward and doing the counting separately. Like you said, my script was doing a lot of extra work ;)
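
The difference between the two flags can be seen side by side. This is a minimal sketch on a made-up log (the file contents and variable names are hypothetical): `-q` makes grep silent and reports only through its exit status (0 if any line matched), which suits an `if` test, while `-c` prints the number of matching lines, which is what the here-document needs. When both are given, `-q` wins and nothing is printed, which is why the counts came out blank.

```shell
#!/bin/bash
# Made-up sample log.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' 'GET /index.html' 'GET /logo.gif' 'GET /faq.html' > "$log"

# -q: no output at all; the answer is in the exit status.
if grep -Eq '[.][Hh][Tt][Mm][Ll]' "$log"; then
  has_html=yes
else
  has_html=no
fi

# -c: prints the count of matching lines.
html_count=$(grep -Ec '[.][Hh][Tt][Mm][Ll]' "$log")

# -c together with -q: -q suppresses the count, so the substitution is empty.
cq_output=$(grep -Ecq '[.][Hh][Tt][Mm][Ll]' "$log")

echo "has_html=$has_html html_count=$html_count cq_output='$cq_output'"
```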

Definitely great work, though. Check out these timings:

my original brutish script: real 0m3.358s
your streamlined script:    real 0m0.108s

Those 3 seconds would add up, and I was only using a 500-line access file!

Good stuff. Thanks, again :)

, Mike

Posted: Fri Apr 18, 2008 10:57 pm
by yahoozer
Good solution. Thank you!

Yaz