HTML file manipulation Professional

This chapter describes HRF usage scenarios for manipulating HTML files, or documents.

Creating a HTML document from scratch

The task is to create a HTML document from scratch, containing a HTML tag, a Body and a Paragraph tag along with some text. Let us make a "HELLO WORLD" document in Xbase++ code:

01: // HELLO.PRG 
02: #pragma Library( "HrfClass.lib" ) 
03: 
04: PROCEDURE Main 
05:   LOCAL oDocument := HTMLDocument():New() 
06:   LOCAL oTag      := HTMLHtmlElement():New( oDocument ) 
07: 
08:   oTag := HTMLBodyElement():New( oTag ) 
09:   oTag := HTMLParagraphElement():New( oTag,,, "Hello World" ) 
10: 
11:   oDocument:saveFile( "Hello.htm" ) 
12: RETURN 

The example program HELLO.PRG demonstrates the most important aspects for successfully using HRF classes, and we discuss it line by line after having a look at the resulting HTML code:

<html> 
<body> 
 <p> 
  Hello World 
 </p> 
</body> 
</html> 

The file "Hello.htm" contains well-formatted HTML code including all opening and closing HTML tags, although no HTML tag is written in the PRG source code. This is the task of the various HTML..() classes that are part of the HRF: they know the HTML tag they reflect and they know how to create well-formatted HTML code. The generated code is also indented for readability, which is extremely valuable for "debugging" complex HTML pages. Let's go into the details now:

Line #2

The #pragma directive is the best way to ensure that the library HrfClass.lib will be linked to the executable. This file includes the HRF classes.

Line #5

A HTMLDocument() object does not reflect a particular HTML tag. Instead, it is responsible for file handling and has methods for loading and saving HTML files.

Line #6

All HTML documents must start with the opening <html> tag and must end with the closing </html> tag. This is what the object oTag takes care of. The HTMLHtmlElement() class reflects the <html> tag and this might give you an idea of the naming convention for HRF classes. Classes reflecting HTML tags have "HTML" as prefix, followed by the tag name and "Element" at the end.

Another important aspect is that the object oDocument is passed to the :new() method. This way, oDocument becomes the parent of oTag and oTag is a child of oDocument. Just as with Xbase Parts, oTag is stored in the child-list array of oDocument, and that's why we can re-use the variable oTag when the next object is created.

Line #8

The object for the <body> tag is created by passing oTag to :new() and assigning the new object to oTag, i.e. a new level in the parent/child hierarchy is created. The object previously referenced in oTag is still available via:

oDocument:childList()[1] 
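
The parent/child hierarchy can be walked one level at a time. The following is a minimal sketch, not a definitive implementation: it assumes that element objects provide the :childList() method in the same way as the document object does, and the variable names oHtml and oBody are chosen for illustration only.

#pragma Library( "HrfClass.lib" )

PROCEDURE Main
   LOCAL oDocument := HTMLDocument():New()
   LOCAL oTag      := HTMLHtmlElement():New( oDocument )
   LOCAL oHtml, oBody

   // After this line, oTag references the <body> object
   oTag := HTMLBodyElement():New( oTag )

   // First child of the document: the object for <html>
   oHtml := oDocument:childList()[1]

   // First child of <html>: the object for <body>
   // (assumes :childList() is available on element objects)
   oBody := oHtml:childList()[1]

   HTMLParagraphElement():New( oBody,,, "Hello World" )
   oDocument:saveFile( "Hello.htm" )
RETURN

If the assumption holds, oBody and oTag reference the same object, so re-using a single variable loses no information.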

Line #9

The creation of the Paragraph object demonstrates how an HRF object obtains knowledge about the content embedded in an opening and closing HTML tag. It is the fourth parameter of the :new() method. Just keep in mind, the first parameter is always the parent and the 4th is always the content. This parameter interface for :new() is consistently used in all 55 HTML..() classes.
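
To illustrate the parameter interface, here is a sketch that adds two paragraphs to the same parent. It uses only the classes shown in HELLO.PRG; the variable names, the file name "TwoParas.htm" and the expectation that children are written in the order they were created are assumptions.

#pragma Library( "HrfClass.lib" )

PROCEDURE Main
   LOCAL oDocument := HTMLDocument():New()
   LOCAL oHtml     := HTMLHtmlElement():New( oDocument )
   LOCAL oBody     := HTMLBodyElement():New( oHtml )

   // 1st parameter: the parent, 4th parameter: the content
   HTMLParagraphElement():New( oBody,,, "First paragraph" )
   HTMLParagraphElement():New( oBody,,, "Second paragraph" )

   oDocument:saveFile( "TwoParas.htm" )
RETURN

Keeping the <body> object in its own variable, instead of re-using oTag, makes it possible to attach several children to the same parent.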

Line #11

The object hierarchy reflecting the HTML document is complete and we can tell the document object to create the HTML code for us and write it to a file. All we have to provide in PRG code is the file name, the rest is done by the method :saveFile().

Remarks

The example program demonstrates the recommended usage of HTML..() classes, but there are other possibilities. For example, the same result as HELLO.PRG could be achieved with the following code:

PROCEDURE Main 
LOCAL cContent  := "<html><body><p>Hello World</p></body></html>" 
LOCAL oDocument := HTMLDocument():New( ,,, cContent ) 

oDocument:saveFile( "Hello.htm" ) 
RETURN 

The content of the HTML document is hard coded in a HTML formatted string and passed to the :new() method. This is legal code, but it has no advantage over the initial implementation since the HTML string is parsed by the HRF object generator and the same number of HRF objects are created when :new() returns. This coding style may require less typing, but it leads to the disadvantage that the well-formedness of a HTML document is left to the programmer and that HTML code is mixed with PRG code. The latter is something you should avoid, if possible. As a rule of thumb: leave HTML code in HTML files and PRG code in PRG files. The HRF connects both.

Useful links

HTML tags and HRF classes

HTMLDocument() class reference

Changing the background color of a HTML page

After we have seen how to create a HTML document from scratch, we will use the created file "Hello.htm" to demonstrate the next technique: manipulating HTML tag attributes. Instance variables exist for this purpose in HRF classes and we are going to change the background color of the HTML page defined in "Hello.htm". This is how it works:

01: // HELLO1.PRG 
02: #pragma Library( "HrfClass.lib" ) 
03: 
04: PROCEDURE Main 
05:    LOCAL oDocument := HTMLDocument():loadFile( "Hello.htm" ) 
06: 
07:    oDocument:body:bgColor := "Yellow" 
08:    oDocument:saveFile( "Hello1.htm" ) 
09: RETURN 

The HTML document object is created in line #5 by the :loadFile() method. This is one of the most useful methods of the HTMLDocument() class since it reads an entire HTML file, parses it and creates a tree structure of HRF objects reflecting the file. All objects are accessible via the :childList() array, but some of the HRF objects are also referenced in instance variables.

For example, the object reflecting the <body> tag is accessible via oDocument:body (line #7), i.e. the :body instance variable references a HTMLBodyElement object. An attribute of <body> is "bgColor" which is also the name of an instance variable of the HTMLBodyElement object. By assigning a string to :bgColor in line #7, we simply change the value of this tag attribute. As a result, the HTML code stored in "Hello1.htm" (line #8) looks like this:

<html> 
<body bgColor="Yellow"> 
 <p> 
  Hello World 
 </p> 
</body> 
</html> 

To change the attribute of a HTML tag is really simple: just assign the value for the tag to the instance variable reflecting the attribute. That's all. However, in some rare cases, the DOM standard does not define all instance variables reflecting all attributes of a HTML tag. If you encounter such a situation, you can use the following approach:

oDocument:body:setAttribute( "bgColor", "Yellow" ) 

The :setAttribute() method is implemented in the "root" class of HRF. It assigns a value to a HTML attribute and/or adds the attribute if it does not exist. The name of the "root" class is HRFTag() and it includes methods which are quite handy in daily programming but may not be defined in the DOM standard.
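
As a sketch of this approach in a complete program: the attribute name "text", its value and the output file name "Hello2.htm" are chosen for illustration only, and no claim is made about whether a matching instance variable exists for this attribute.

#pragma Library( "HrfClass.lib" )

PROCEDURE Main
   LOCAL oDocument := HTMLDocument():loadFile( "Hello.htm" )

   // Set (or add) an attribute by name instead of using an instance variable
   oDocument:body:setAttribute( "text", "Navy" )

   oDocument:saveFile( "Hello2.htm" )
RETURN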

Useful links

HTML tags and HRF classes

HRFTag() class reference

HTMLDocument() class reference

Changing the URLs of all image files

The occasional requirement to re-structure the directories of a Web site is a nightmare for Web administrators when HTML files refer to other documents that are located in other directories. In this case, all hyperlinks in the HTML files must be adjusted so that they remain intact after the directories are changed.

This is the scenario for the task we are going to discuss now, and how it can be solved using HRF classes. The major problem is: how to change URLs in HTML files that link to other documents. Assume that all image files (GIF, JPG) of a Web site are stored in the directory "..\images" and that there is a need to store GIF and JPG files in two separate directories.

A Web administrator can easily separate GIF and JPG files on the Web server using a low level "move" on the command line:

move  ..\images\*.gif  ..\GifImages 
move  ..\images\*.jpg  ..\JpgImages 

This action, however, results in broken links for all HTML documents in the Web site which reference an image file in the "..\images" directory. As a consequence, all HTML documents must be changed to reflect the new directory structure. Let us see now how this can be accomplished using HRF classes:

01: PROCEDURE Main 
02:    LOCAL aFiles  := Directory( "*.htm" ) 
03:    LOCAL i, imax := Len( aFiles ) 
04:    LOCAL j, jmax 
05:    LOCAL oDocument, aImages, cURL 
06: 
07:    FOR i:=1 TO imax 
08:       oDocument := HTMLDocument():loadFile( aFiles[i,1] ) 
09:       aImages   := oDocument:images 
10:       jmax      := Len( aImages ) 
11: 
12:       FOR j:=1 TO jmax 
13:          cURL := Upper( aImages[j]:src ) 
14: 
15:          IF "GIF" $ cURL 
16:             cURL := StrTran( cURL, "/IMAGES", "/GifImages" ) 
17:          ELSE 
18:             cURL := StrTran( cURL, "/IMAGES", "/JpgImages" ) 
19:          ENDIF 
20: 
21:          aImages[j]:src := cURL 
22:       NEXT 
23: 
24:       oDocument:saveFile( aFiles[i,1] ) 
25:    NEXT 
26: RETURN 

This program creates an array of HTML files (line #2) and iterates through this array. Each HTML file is loaded in line #8, and the image files referenced in the current HTML file are obtained in line #9. The instance variable :images of the HTML document object holds an array whose elements each reference a HTMLImageElement() object reflecting an <img> tag. The URL of an image file is the value of the attribute SRC, and this is obtained in line #13. The URL is changed in the program by the StrTran() function (line #16 and #18), and is assigned to :src (line #21) before the HTML file is written back to disk (line #24).
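
Note that the listing converts the entire URL to upper case in line #13, so the value written back is upper case apart from the new directory name. Where the original spelling of the URL should be preserved, the replacement could be moved into a helper function like the one sketched below. The function name NewImageURL() is hypothetical, and the sketch assumes that the HTML files spell the directory as "/images/" in lower case, since StrTran() is case-sensitive.

FUNCTION NewImageURL( cURL )
   // Upper case copy is used for the test only; cURL keeps its spelling
   LOCAL cUpper := Upper( cURL )

   DO CASE
   CASE ".GIF" $ cUpper
      cURL := StrTran( cURL, "/images/", "/GifImages/" )
   CASE ".JPG" $ cUpper
      cURL := StrTran( cURL, "/images/", "/JpgImages/" )
   ENDCASE
RETURN cURL

Inside the inner loop, the function would be used as aImages[j]:src := NewImageURL( aImages[j]:src ). For example, "../images/logo.gif" becomes "../GifImages/logo.gif", while URLs that match neither extension are returned unchanged.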

This example program clearly demonstrates the power of HRF classes since it is possible to update all URLs for image files in any number of HTML files using only 26 lines of Xbase++ code. The key logic for the program is provided by a single instance variable of the HTMLDocument class: :images. It contains an array which holds references to all HTMLImageElement() objects, i.e. this array can be used to modify all <img> tags contained in a HTML file.

The :images instance variable is just one example to demonstrate that you can access a collection of all HTML tags of the same kind contained in a HTML file. There are more instance variables containing arrays of HRF objects. The complete list is: :anchors, :applets, :forms, :images and :links. Each array contains objects of the same HRF class.
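
The other collections can be processed in the same way as :images. The following sketch sets an attribute on every object in the :links array, using the :setAttribute() method of the HRFTag() root class. The file names "Index.htm" and "Index1.htm" and the attribute "target" with the value "_blank" are assumptions made for illustration.

#pragma Library( "HrfClass.lib" )

PROCEDURE Main
   LOCAL oDocument := HTMLDocument():loadFile( "Index.htm" )
   LOCAL aLinks    := oDocument:links
   LOCAL i, imax   := Len( aLinks )

   // Set (or add) the same attribute on each object in the collection
   FOR i:=1 TO imax
      aLinks[i]:setAttribute( "target", "_blank" )
   NEXT

   oDocument:saveFile( "Index1.htm" )
RETURN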

Useful links

HTMLAnchorElement() class reference

HTMLAppletElement() class reference

HTMLFormElement() class reference

HTMLImageElement() class reference

HTMLLinkElement() class reference
