We had the necessity; what we needed was the invention! The release deadline loomed, and our strategy for conditionally controlling content in special HTML files external to Doc-To-Help had hit a wall. Our whole approach was in jeopardy, and it had to work.
It would have been easy enough to fix if I could run regular expressions against these non-standard files, but I had never crossed that bridge and was out of time. My developers all did their regex work inside their programming IDEs, so no one could think of a way to do such tasks without a framework.
I realized that, essentially, what we writers needed was a way to do search and replace against sets of HTML files, from batch files, so that we could incorporate it into our documentation build process. (And have it be free and easy, of course.) After much thrashing, I googled up a post that offered a small Visual Basic Script, which Windows can run without my installing anything and which can be used generically, by running it from a batch file with parameters. A batch file? THAT I could handle!
Here is the little script, which you need to save in a file with a .VBS extension (ReplaceText.vbs in the examples that follow):
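' Usage: cscript ReplaceText.vbs <file> <old text> <new text>
Const ForReading = 1
Const ForWriting = 2
' Read the three command-line arguments.
strFileName = Wscript.Arguments(0)
strOldText = Wscript.Arguments(1)
strNewText = Wscript.Arguments(2)
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Read the whole file into memory.
Set objFile = objFSO.OpenTextFile(strFileName, ForReading)
strText = objFile.ReadAll
objFile.Close
' Replace every occurrence (the match is case-sensitive by default).
strResult = Replace(strText, strOldText, strNewText)
' Overwrite the file; Write, unlike WriteLine, does not append a line break on each run.
Set objFile = objFSO.OpenTextFile(strFileName, ForWriting)
objFile.Write strResult
objFile.Close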
Save the file, and you never have to look at it again. To use it, run it with parameters from the command line or from a batch (*.BAT) file: call the script, tell it which file to process, and give it the strings to find and replace.
cscript ReplaceText.vbs "C:\path\foo.txt" "old text" "new text"
So, I was able to remove the unneeded <html> and <body> tags from the files as simply as this:
cscript C:\path\ReplaceText.vbs "C:\path\foo.htm" "<html>" ""
cscript C:\path\ReplaceText.vbs "C:\path\foo.htm" "</html>" ""
cscript C:\path\ReplaceText.vbs "C:\path\foo.htm" "<body>" ""
cscript C:\path\ReplaceText.vbs "C:\path\foo.htm" "</body>" ""
But how do I get rid of the <head> section, which had variable content in it? I realized that I could convert those tags to comments, which would hide the variable mess nicely:
cscript C:\path\ReplaceText.vbs "C:\path\foo.htm" "<head>" "<!--"
cscript C:\path\ReplaceText.vbs "C:\path\foo.htm" "</head>" "-->"
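One caveat with the comment trick: if a <head> section already contains an HTML comment, its closing --> ends the new comment early and the rest of the head leaks back into the page. The literal matches also miss tags that carry attributes or different capitalization, such as <BODY class="x">. For readers who do have a scripting language with regular expressions on hand, here is a minimal sketch in Python of the same cleanup. It is not part of the original solution; the function name strip_wrappers and the sample string are hypothetical.

```python
import re


# Hypothetical helper, not part of the original VBScript approach.
def strip_wrappers(html: str) -> str:
    """Remove the <head> section and the <html>/<body> wrapper tags."""
    # Drop everything from <head ...> through </head>, even across line breaks.
    html = re.sub(r"<head\b[^>]*>.*?</head\s*>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)
    # Drop opening and closing <html> and <body> tags, with or without attributes.
    html = re.sub(r"</?(?:html|body)\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html


sample = '<HTML lang="en"><head>\n<title>x -- y</title>\n</head><body class="t"><p>Hi</p></body></html>'
print(strip_wrappers(sample))  # prints <p>Hi</p>
```

Because the head section is deleted outright rather than commented out, nothing inside it can interfere with the rest of the file.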
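Very good, but how do I process an entire directory of files? Ah, that's a DOS trick that uses a loop variable and recursion. Each line reads: for each HTM file in this directory and its subdirectories, run this VBS replace command. (The doubled %%g is the batch-file form of the variable; typed directly at the command prompt, it would be %g.)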
FOR /R "C:\path\" %%g IN (*.htm) DO cscript C:\path\ReplaceText.vbs "%%g" "<html>" ""
FOR /R "C:\path\" %%g IN (*.htm) DO cscript C:\path\ReplaceText.vbs "%%g" "</html>" ""
FOR /R "C:\path\" %%g IN (*.htm) DO cscript C:\path\ReplaceText.vbs "%%g" "<head>" "<!--"
FOR /R "C:\path\" %%g IN (*.htm) DO cscript C:\path\ReplaceText.vbs "%%g" "</head>" "-->"
FOR /R "C:\path\" %%g IN (*.htm) DO cscript C:\path\ReplaceText.vbs "%%g" "<body>" ""
FOR /R "C:\path\" %%g IN (*.htm) DO cscript C:\path\ReplaceText.vbs "%%g" "</body>" ""
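Part of the slowness is that each of those six lines walks the whole tree and launches the script once per file. As a point of comparison only, here is a rough Python sketch that visits each file once and applies all six replacements in a single pass. The function name clean_tree, the REPLACEMENTS list, and the UTF-8 encoding are my assumptions, not something from the original build.

```python
from pathlib import Path

# The same literal find/replace pairs used in the batch lines above.
REPLACEMENTS = [
    ("<html>", ""), ("</html>", ""),
    ("<head>", "<!--"), ("</head>", "-->"),
    ("<body>", ""), ("</body>", ""),
]


def clean_tree(root, encoding="utf-8"):
    """Apply every replacement to each *.htm file under root; return the files changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.htm")):  # recursive, like FOR /R
        original = path.read_text(encoding=encoding)
        text = original
        for old, new in REPLACEMENTS:
            text = text.replace(old, new)
        if text != original:  # only rewrite files that actually changed
            path.write_text(text, encoding=encoding)
            changed.append(path)
    return changed
```

Calling clean_tree(r"C:\path") would process the same directory tree as the FOR /R lines, assuming the files really are UTF-8.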
Now I had a general solution for cleaning up the HTML contents (albeit one that was quite slow to run), but I had another problem: I needed to reshape this source into valid HTML that Doc-To-Help (D2H) could use in builds.
I realized that I might as well merge all of the HTML files into one file, for all Doc-To-Help cared. Doing that was quite easy, using the DOS Copy command to add every file to a new file of my choosing. If you've never done it, you'll be surprised at its simplicity:
copy C:\path\*.htm C:\path\merged.htm
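Two things are worth checking with the wildcard copy: whether the files land in the order you expect, and whether a previous merged.htm in the same folder gets folded back in on a rebuild. As I understand it, copy also treats combined files as ASCII text unless told otherwise, so the /B switch is worth trying if a stray end-of-file character turns up at the end of the merged file. For comparison, a small Python sketch of the same merge makes the ordering and the skip explicit; merge_files and its parameters are hypothetical names, and UTF-8 is an assumption.

```python
from pathlib import Path


def merge_files(source_dir, output_name="merged.htm", encoding="utf-8"):
    """Concatenate every *.htm file in source_dir, in name order, into output_name."""
    source_dir = Path(source_dir)
    target = source_dir / output_name
    # Skip the output file itself so a rebuild does not fold the old merge back in.
    parts = [
        p.read_text(encoding=encoding)
        for p in sorted(source_dir.glob("*.htm"))
        if p != target
    ]
    target.write_text("\n".join(parts), encoding=encoding)
    return target
```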
Once I merged them, I had a Doh! moment: why not do all my search and replace on the merged file? It turned out to be far faster, since each replacement now ran once instead of once per file. Out came all the recursive statements. Good. (Rebuilds take long enough, thank you!) Now all I had left was to turn this merged mess into a proper HTML source file. I did that by writing the beginning and ending fragments of the HTML file out to standalone files, which I could tack onto my merged file once it was done:
copy C:\path\head.htm+C:\path\merged.htm+C:\path\end.htm C:\path\final.htm
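Since the whole point is to end up with exactly one set of wrapper tags, a sanity check at this step can catch a replacement that silently missed. The sketch below, again in Python and again only an illustration, assembles the three pieces and refuses to write the result if the merged middle still contains html, head, or body tags. The name build_final and the error wording are hypothetical.

```python
import re
from pathlib import Path

# Any leftover wrapper tag in the merged middle means a replacement missed it.
LEFTOVER_TAG = re.compile(r"</?(?:html|head|body)\b", re.IGNORECASE)


def build_final(head_file, merged_file, end_file, final_file, encoding="utf-8"):
    """Write head + merged + end to final_file, after checking the merged part."""
    head, merged, end = (
        Path(p).read_text(encoding=encoding)
        for p in (head_file, merged_file, end_file)
    )
    stray = sorted({tag.lower() for tag in LEFTOVER_TAG.findall(merged)})
    if stray:
        raise ValueError("merged content still contains: " + ", ".join(stray))
    final_path = Path(final_file)
    final_path.write_text(head + merged + end, encoding=encoding)
    return final_path
```

If the check fails, no final file is written, so a broken build is noticed at once rather than in the published output.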
This final.htm file was proper HTML, so I could link it into my Doc-To-Help project, no problem. The result? All of our conditional tagging resolved correctly in the outputs, thank heavens. Even better, this source file is recreated quickly and automatically with every build. This was a big improvement over my original implementation, in which I referenced these HTML files using a Word container file and Word fields:
{ INCLUDETEXT C:\\path\\foo.htm }
This approach would have been just fine if the set of HTML files were stable, but ours changed regularly: name changes, additions, moves, deletions. With the Word implementation, keeping up with those changes was manual work, but this new HTML-only method is completely hands-free. Did I mention that automation is one of my favorite words?
I hope this little script can solve a documentation problem for you!



