Given several URLs with nearly identical structure but different content, the solution needs to acquire the HTML of each URL via HTTP, extract the desired content, remove duplicates, sort the records by last name and then by first name, and output valid XHTML to a file on the server. The solution must be coded in Cold Fusion but can use third-party tools to clean the HTML input (which is NOT valid HTML). The solution should employ a single XSLT stylesheet capable of transforming all of the HTML input into the desired output in a single pass. This means that the HTML input needs to be converted to well-formed XML first. There are at least two open source tools capable of doing this (links to these products will be provided). In short, although the entire solution needs to be wrapped in Cold Fusion 8, the workhorse is the XSLT stylesheet that does the transformation.
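To make the transformation step concrete, here is a minimal sketch of the de-duplication and sorting in XSLT 1.0, called through Cold Fusion's `XmlTransform`. The element names (`records`, `record`, `first`, `last`, `address`) and the sample data are hypothetical, since the real structure depends on the attached sample HTML after cleaning. Treating a duplicate as a record with the same last name, first name, and address is also an assumption.

```cfml
<!--- Hypothetical combined input; the real element names depend on the cleaned HTML. --->
<cfsavecontent variable="inputXml"><records>
  <record><first>Ann</first><last>Smith</last><address>1 Main St</address></record>
  <record><first>Bob</first><last>Jones</last><address>2 Oak Ave</address></record>
  <record><first>Ann</first><last>Smith</last><address>1 Main St</address></record>
</records></cfsavecontent>

<!--- XSLT 1.0 stylesheet: removes duplicates with a key (Muenchian method), then sorts. --->
<cfsavecontent variable="xslSource"><xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- Index every record by the fields assumed to define a duplicate. -->
  <xsl:key name="by-person" match="record"
      use="concat(last, '|', first, '|', address)"/>

  <xsl:template match="/records">
    <html>
      <head><title>Records</title></head>
      <body>
        <ul>
          <!-- Keep only the first record found for each key value. -->
          <xsl:for-each select="record[generate-id() =
              generate-id(key('by-person', concat(last, '|', first, '|', address))[1])]">
            <xsl:sort select="last"/>
            <xsl:sort select="first"/>
            <li><xsl:value-of select="last"/>, <xsl:value-of select="first"/>: <xsl:value-of select="address"/></li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet></cfsavecontent>

<!--- XmlTransform accepts the XML and the stylesheet as strings and returns a string. --->
<cfset xhtml = XmlTransform(Trim(inputXml), Trim(xslSource))>
```

With the sample data above, the result should list Jones, Bob before Smith, Ann, and the repeated Smith record should appear only once. The key lookup is what keeps the de-duplication to a single pass over the input, which matters for the larger record counts.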
## Deliverables
The solution must log its progress to a file that can be read by a plain text editor. The log file must include a date and time stamp for each action logged. The log file must clearly describe what was intended and what actually happened.
The solution must fail gracefully. If the solution is unable to produce the desired results or an error occurs, it should output nothing (it should not save the output) but log the error to the log file with samples of what was expected and what actually occurred. Also, in the event of an error, an email should be sent to the system administrator with the details of the error.
The solution must be able to handle up to 10,000 records (names and addresses, basically) and the transformation must take less than 1 second.
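The logging, failure, and timing requirements above could be wrapped around the transformation roughly as follows. This is only a sketch: the file paths, the administrator address, the `logLine` helper, and the placeholder input and stylesheet are all hypothetical, and `cfmail` assumes a mail server is configured in the Cold Fusion Administrator.

```cfml
<!--- Hypothetical paths and address; adjust for the actual server. --->
<cfset logFile = ExpandPath("./transform.log")>
<cfset outFile = ExpandPath("./output.xhtml")>
<cfset adminEmail = "admin@example.com">

<!--- Placeholder input and stylesheet so the sketch stands alone. --->
<cfsavecontent variable="inputXml"><records><record><last>Smith</last></record></records></cfsavecontent>
<cfsavecontent variable="xslSource"><xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><xsl:copy-of select="."/></xsl:template>
</xsl:stylesheet></cfsavecontent>

<!--- Hypothetical helper: appends one time-stamped line to the plain-text log. --->
<cffunction name="logLine" returntype="void" output="false">
  <cfargument name="message" type="string" required="true">
  <cffile action="append" file="#variables.logFile#" addnewline="yes"
          output="#DateFormat(Now(), 'yyyy-mm-dd')# #TimeFormat(Now(), 'HH:mm:ss')# #arguments.message#">
</cffunction>

<cftry>
  <cfset logLine("Intended: transform the combined XML into XHTML")>

  <!--- Time only the transformation itself, in milliseconds. --->
  <cfset startTick = GetTickCount()>
  <cfset xhtml = XmlTransform(Trim(inputXml), Trim(xslSource))>
  <cfset elapsedMs = GetTickCount() - startTick>
  <cfset logLine("Actual: transformation finished in #elapsedMs# ms")>

  <!--- The output file is written only if nothing above threw an error. --->
  <cfset logLine("Intended: save output to #outFile#")>
  <cffile action="write" file="#outFile#" output="#xhtml#" charset="utf-8">
  <cfset logLine("Actual: output saved")>

  <cfcatch type="any">
    <!--- On any error: save nothing, log the details, and notify the administrator. --->
    <cfset logLine("ERROR: #cfcatch.message# #cfcatch.detail#")>
    <cfmail to="#adminEmail#" from="#adminEmail#" subject="Transformation failed">
#cfcatch.message#
#cfcatch.detail#
    </cfmail>
  </cfcatch>
</cftry>
```

Because the `cffile action="write"` call sits inside the `cftry` block after the transformation, a failed transformation never reaches it and no output file is saved. Comparing `elapsedMs` against 1000 would be one way to check the one-second limit; whether exceeding it should count as a failure is left open here.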
If the selected candidate does well and has an interest in Cold Fusion programming, there will be additional work on the same site(s).
Advanced knowledge of Cold Fusion 8, Fusebox 3, Subversion, and XSLT is required.
Attached please find a sample HTML file representative of the actual HTML that needs to be transformed.
Version 1 of the solution must be delivered within 7 days of being awarded, and must be platform and server independent. In other words, it must function the same way on any other standard Cold Fusion server, assuming no third-party software mentioned above is required. The solution must NOT require anything above Cold Fusion 8.