Last week I received an e-mail from someone who wanted to extract all <abbr> and <acronym> tags from a website and collect them into a kind of glossary. Ideally, he wanted the tool to work from within Dreamweaver.
I thought this kind of problem was better suited to Perl, so I went ahead and wrote a script. The script works fine on my OS X machine, and I see no reason why it wouldn't work on any other *nix system.
There are a few things to know before using this script:
1. You will have to point the script to your website directory (line 29).
2. Make sure the site root contains no files called "list.txt", "abbr-list.txt", or "abbreviations.txt". They will be overwritten!
3. The final output goes to the file "abbreviations.txt". It lists all <abbr> and <acronym> tags, which you can then paste into a regular HTML file.
4. By default the script extracts the <abbr> and <acronym> tags from HTML files. You can change the file extension it searches for (line 36).
5. A second Perl script removes duplicates and sorts the output. It came with BBEdit and is called kill_dups_and_sort.pl. It is included in the download.
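For readers who want to see the general idea without opening the download, here is a minimal sketch of the same steps (find the files, pull out the tags, remove duplicates, sort). It is written in Python and is not the Perl script itself. The function names, the regular expression, the recursive search through subdirectories, and the "/path/to/site" placeholder are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Matches a complete <abbr ...>...</abbr> or <acronym ...>...</acronym> element.
# A regex is enough for simple, well-formed markup; it is not a full HTML parser.
TAG_RE = re.compile(
    r"<(abbr|acronym)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def extract_tags(html):
    """Return every complete <abbr>/<acronym> element found in the text."""
    return [match.group(0) for match in TAG_RE.finditer(html)]


def build_glossary(site_root, extension=".html"):
    """Collect the tags from all matching files, drop duplicates, and sort."""
    found = set()  # a set removes duplicate tags automatically
    for path in Path(site_root).rglob("*" + extension):
        text = path.read_text(encoding="utf-8", errors="replace")
        found.update(extract_tags(text))
    # Case-insensitive sort, with the raw string as a tie-breaker.
    return sorted(found, key=lambda tag: (tag.lower(), tag))


if __name__ == "__main__":
    # Hypothetical site directory. This sketch prints the result instead of
    # writing abbreviations.txt, so it does not overwrite any files.
    for tag in build_glossary("/path/to/site"):
        print(tag)
```

Each printed line is a whole tag, including its title attribute, so the list can be pasted into an HTML file in the same way as the contents of "abbreviations.txt".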
Download the script.
See the output.