Online tools

You can also use online tools that try to convert HTML tables into XML, CSV, and JSON. Let's try an example. A NASA Web page (nssdc.gsfc.nasa.gov/planetary/factsheet) contains data about the Moon and the planets in our solar system. To use that data, you need it in a standard format such as JSON, CSV, or XML, but it's only available as an HTML table, shown as follows:

An HTML table containing data that can be used in a chart

Let's first try an online conversion service. Searching for HTML-to-CSV conversion, I found a service at www.convertcsv.com that offers several CSV conversion tools. Open the HTML Table to CSV link and either paste the HTML source code into the input box or provide the page's URL. You can configure some options, such as the delimiter. Click the Convert HTML to CSV button, and the following text will appear in the output box:

,MERCURY,VENUS,EARTH,MOON,MARS,JUPITER,SATURN,URANUS,NEPTUNE,PLUTO
Mass (1024kg),0.330,4.87,5.97,0.073,0.642,1898,568,86.8,102,0.0146
Diameter (km),4879,"12,104","12,756",3475,6792,"142,984","120,536",...,2370
Density (kg/m3),5427,5243,5514,3340,3933,1326,687,1271,1638,2095
Gravity (m/s2),3.7,8.9,9.8,1.6,3.7,23.1,9.0,8.7,11.0,0.7
... several rows not shown ...
Number of Moons,0,0,1,0,2,79,62,27,14,5
Ring System?,No,No,No,No,No,Yes,Yes,Yes,Yes,No
Global Magnetic Field?,Yes,No,Yes,No,No,Yes,Yes,Yes,Yes,Unknown
,MERCURY,VENUS,EARTH,MOON,MARS,JUPITER,SATURN,URANUS,NEPTUNE,PLUTO

This is valid CSV, but some numbers were written with thousands separators and therefore had to be quoted (several diameters, for example), so they will be read as strings, not numbers. You might also want to remove rows you don't need, such as the last one, which repeats the header. You can edit the file by hand, or write a script that fixes the numbers using regular expressions. Download the result, save it in a file, and then try loading the file using JavaScript.
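As a rough sketch of that cleanup step, the following JavaScript parses the CSV text (honoring double-quoted fields such as "12,104"), drops a row that repeats the header, and turns numeric strings into numbers by stripping commas with a regular expression. The function names parseCsv, toNumber, and cleanTable are hypothetical, the input is a few columns of the output above, and the code assumes that commas inside numeric values are always thousands separators.

```javascript
// Minimal CSV parser: splits text into rows of string fields, honoring
// double-quoted fields that may contain delimiters, line breaks, or "" escapes.
function parseCsv(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else if (ch !== "\r") {
      field += ch; // skip \r so CRLF line endings also work
    }
  }
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Assumption: commas inside numeric values are thousands separators.
// "12,104" becomes 12104 and "0.330" becomes 0.33; other text is unchanged.
function toNumber(value) {
  const cleaned = value.replace(/,/g, "");
  return /^-?\d+(\.\d+)?$/.test(cleaned) ? Number(cleaned) : value;
}

// Parse the CSV, drop rows that repeat the header, and convert the data cells
// (every column except the first, which holds the row labels) to numbers.
function cleanTable(text) {
  const [header, ...rest] = parseCsv(text.trim());
  const body = rest.filter(
    (r) => !(r.length === header.length && r.every((v, i) => v === header[i]))
  );
  return [header, ...body.map((r) => [r[0], ...r.slice(1).map(toNumber)])];
}

// A few columns of the converter output, used as sample input.
const excerpt = `,MERCURY,VENUS,EARTH
Diameter (km),4879,"12,104","12,756"
Ring System?,No,No,No
,MERCURY,VENUS,EARTH`;

const table = cleanTable(excerpt);
console.log(table[1]); // the Diameter row: 'Diameter (km)', 4879, 12104, 12756
```

In a browser, the text of the downloaded file could be obtained with fetch(url).then((response) => response.text()) and then passed to cleanTable; a dedicated CSV library would handle more edge cases than this minimal parser.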

Since this is a third-party online service, I can't guarantee it will still exist when you read this book, but you should be able to find similar services that perform the same conversion. If not, you can always write an extraction script yourself. A good tool for that is XPath, which is supported by many extraction libraries and by browsers, and is described in the next section.
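The core of such an extraction script is to walk the table's rows and cells and write each cell's text out as a delimited line. The sketch below does this with regular expressions instead of XPath, only to keep it self-contained; regular expressions are a fragile way to parse HTML in general, so treat this as a toy for simple, well-formed tables. The function names extractTableRows and toCsv and the sample markup are hypothetical and do not reproduce the NASA page's actual markup.

```javascript
// Naive sketch: extract the text of <th>/<td> cells from simple, well-formed
// table markup using regular expressions. It does NOT handle rowspan/colspan,
// nested tables, comments, or most HTML entities.
function extractTableRows(html) {
  const rowPattern = /<tr(?:\s[^>]*)?>([\s\S]*?)<\/tr>/gi;
  const cellPattern = /<t[hd](?:\s[^>]*)?>([\s\S]*?)<\/t[hd]>/gi;
  const rows = [];
  for (const rowMatch of html.matchAll(rowPattern)) {
    const cells = [];
    for (const cellMatch of rowMatch[1].matchAll(cellPattern)) {
      const text = cellMatch[1]
        .replace(/<[^>]*>/g, "") // drop inner tags such as <b> or <sup>
        .replace(/&nbsp;/g, " ")
        .replace(/&amp;/g, "&")
        .replace(/\s+/g, " ") // collapse whitespace and line breaks
        .trim();
      cells.push(text);
    }
    rows.push(cells);
  }
  return rows;
}

// Serialize rows as CSV, quoting fields that contain commas, quotes, or line breaks.
function toCsv(rows) {
  const quote = (c) => (/[",\r\n]/.test(c) ? `"${c.replace(/"/g, '""')}"` : c);
  return rows.map((cells) => cells.map(quote).join(",")).join("\n");
}

// Hypothetical markup fragment, NOT the actual markup of the NASA page.
const html = `<table>
<tr><td></td><th>MERCURY</th><th>VENUS</th></tr>
<tr><td>Diameter (km)</td><td>4879</td><td>12,104</td></tr>
</table>`;

console.log(toCsv(extractTableRows(html)));
// ,MERCURY,VENUS
// Diameter (km),4879,"12,104"
```

Because the sketch strips inner tags, markup such as 10<sup>24</sup> collapses to 1024; a similar loss of superscripts is likely why the Mass row in the converter output reads 1024kg.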
