WebTables: Exploring the Power of Tables on the Web


Comments

  • Public Comments

    • 6 months ago


      How Google Squared works
      Google, WolframAlpha
    • 6 months ago


      The work was done while all authors were at Google, Inc., so this is probably the team behind Google Squared. The article explains the construction of the "attribute correlation statistics database":

      "We extracted 14.1 billion HTML tables from Google’s general-purpose web crawl, and used statistical classification techniques to find the estimated 154M that contain high-quality relational data."

      "The WebTables system is the first large-scale attempt to extract and leverage the relational information embedded in HTML tables on the Web. We described how to support effective search on a massive collection of tables and demonstrated that current search engines do not support such search effectively. Finally, we showed that the recovered relations can be used to create what we believe is a very valuable data resource, the attribute correlation statistics database."

      [Future work:] "We would like to also include relational data derived from more than just HTML tables. Potential data sources that researchers have studied include tabular layouts that do not use the table tag, deep web databases, socially-tagged data items, HTML-embedded lists, and natural language text."

      I think this is the most significant effort so far to extract and leverage the structured information available in the web's HTML tables. DBpedia and Freebase did that too, of course, but they concentrated on Wikipedia infoboxes, which represent only a tiny subset of the tabular data available on the web.

      The more I look at Google Squared, the more I like it. I really wish Nova would explain to us in more detail his hunch that this is not interesting stuff...
      WolframAlpha