Web 3.0 - Semantic Web
Comments
by
Nova Spivack
5 months ago
The author of that blog seems to have spent quite a lot of time "not minding" the Semantic Web. :^)
by
Mark Davey
5 months ago
And to say that no one can "coherently explain" the Semantic Web shows a lack of research, imho. Still, as with most things without a set standard, it is difficult to predict where a company should place its technological resources, unless for the time being you focus on developing ontologies for multiple deployments.
by
hucheng
5 months ago
Whilst it is easy to pick holes in this post, I can understand the frustrations expressed.
Marking things up for meaning (whatever format) is great and gives a little bit more utility but is surely never going to achieve what some people seem to promise/suggest any more than jumping up and down a lot can be considered flying. The true meaning behind what is written on the internet and the subtle connections between things can't even be consistently parsed by two human individuals, let alone a bunch of markup or meta-data.
The real deal-killer, in my opinion, though, is that so much of what is valuable is conversation, and just as real people (even educated ones) don't talk in grammatically perfect sentences, they are not going to take the trouble to mark up their online conversations so they can be semantically parsed. An AI that can actually read the way we do just might, though.
by
Bent Rasmussen
5 months ago
They don't have to. Developers have to create forms that map into ontologies, and for the rest there's NLP. I think Powerset has a good example of parsing an otherwise incomprehensible, scattered sentence by a Miss "Whatevercity". I'd say it's coming along fine... but those who bet wisely will win long-term. I'm confident the time has come for these technologies. Twine is still formulating its ontology (I gather), but once it reaches v1.0 and Twine opens up, interoperability will ensue, and, interestingly enough, unexpected interoperability may as well.
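To make "forms that map into ontologies" a little more concrete, here is a minimal Python sketch. It assumes a made-up ontology namespace (`http://example.org/ontology#`), made-up property names and a hypothetical helper `form_to_triples`; it is not how Twine or any other product does it. Each form field is tied to an ontology property, so a submitted form turns into subject-property-value triples without the person filling it in ever seeing the ontology.

```python
# Hypothetical ontology namespace and property names, for illustration only.
EX = "http://example.org/ontology#"

# Which ontology property each (hypothetical) form field fills in.
FIELD_TO_PROPERTY = {
    "title": EX + "title",
    "author": EX + "creator",
    "topic": EX + "subject",
}


def form_to_triples(subject, form):
    """Turn a submitted form (field name -> value) into (subject, property, value) triples.

    Fields with no mapping, and fields left empty, are skipped rather than guessed at.
    """
    triples = []
    for field, value in form.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is not None and value:
            triples.append((subject, prop, value))
    return triples


if __name__ == "__main__":
    submitted = {"title": "Web 3.0 - Semantic Web", "author": "Bent", "topic": "", "mood": "hopeful"}
    for triple in form_to_triples("http://example.org/post/1", submitted):
        print(triple)
```

Unmapped fields (here `mood`) and empty ones (here `topic`) simply produce no triples.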
I don't get what this scepticism buys anyone. Sure, using ontologies by itself will not be a magic bullet. A lot of logic and rules will have to be built on top of them, but they do have very good interop characteristics almost by default.
by
hucheng
5 months ago
Skepticism buys us balance: the overly skeptical offset those who oversell the benefits, while the rest of us lie at points in between.
I think what is happening is great and a big leap forward, but true NLP is the holy grail; in fact, NLU (natural language understanding) would be better by far.
Even if you get some good ontologies, can you imagine people entering comments into complicated forms?
Powerset offers a nice addition to searching and some useful discovery elements, but after considerable playing it is clear it has a long, long way to go. For example, I asked it "What languages does Anthony Burgess speak?". The top result in Powerset looked very like the third in Google, but bear in mind that Powerset is limited to Wikipedia at the moment. Neither of them simply returned a list of languages or diplomatically pointed out that these are the languages he spoke rather than speaks, as he has passed away.
I do have rather high expectations, but I remember getting excited when text-based adventure games came out, games like The Hobbit; it seemed for a moment that you could actually speak to the computer. But after Thorin had sat down and sung about gold once too often, you woke up and smelt the coffee.
by
Dean Allemang
5 months ago
As someone who just published a book with the words "Semantic Web" in the title, I nevertheless actually agree with a lot of what Glenn has to say. I actually welcome this sort of 'criticism' of the Semantic Web, in that it points out the directions that we semantic weenies are drawn to, even if they don't really help the industry along. So rather than 'pick apart' the post, I'd like to point to the bits that I agree with.
1. Glenn left out one crowd of people who are left behind by the word "Semantic", and this (in my experience) is by far the biggest crowd. Their attitude can be summed up by the single retort, "I'm a McAfee user, myself." And even though this isn't a joke, it always gets a laugh, opening up the discussion to what the Semantic Web really is.
5&6. Tagging and blogging aren't good examples of the Semantic Web, and we shouldn't pretend they are. But they are good examples of grassroots efforts to do something Semantic Web-like, and they provide a good window into what the common web citizen feels is missing.
7. I agree with this point (Powerset notwithstanding). The Semantic Web has a viable value proposition, even without bringing in anything AI. It's great that someone is doing AI, and the Semantic Web can use that stuff too. But it doesn't need it to provide real business value.
9. I have always worried that this unusual, mathematical use of the word "graph" would throw my readers and students for a loop. Oddly, that has never happened. I don't know why not - my intuitions agree with Glenn. But my intuitions have proved to be wrong.
11. Like Glenn says, metadata is all relative, which is why having a representation in which metadata uses the same infrastructure (including query) as data is really powerful. Especially when talking to someone who doesn't know what metadata is, or even more so, doesn't care.
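Point 11 can be illustrated with a toy in-memory triple store in Python. This is a minimal sketch, not RDF or SPARQL themselves: the identifiers (`post:42`, `title`, `range`, and so on) are hypothetical, and `match` is a made-up helper. What it shows is that statements about the data (metadata such as "`author` is a property whose values are people") are stored and queried exactly like the data itself.

```python
def match(store, s=None, p=None, o=None):
    """Return the triples matching a (subject, predicate, object) pattern, sorted.

    None acts as a wildcard in any position.
    """
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


store = {
    # Data: facts about a hypothetical blog post.
    ("post:42", "title", "Web 3.0 - Semantic Web"),
    ("post:42", "author", "person:glenn"),
    # Metadata: facts about the properties used above, in exactly the same form.
    ("title", "type", "Property"),
    ("author", "type", "Property"),
    ("author", "range", "Person"),
}

# The same function answers a question about the data...
print(match(store, s="post:42"))
# ...and a question about the metadata.
print(match(store, p="type", o="Property"))

# Because both live in one store, a single query can cross between them:
# "which of post:42's properties expect a Person as their value?"
person_valued = [p for _, p, _ in match(store, s="post:42")
                 if match(store, s=p, p="range", o="Person")]
print(person_valued)  # ['author']
```

Nothing special is needed to ask about the schema; it is just more triples.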
I can't resist taking issue with one comment - that nobody can coherently explain what the Semantic Web is. I will admit that the sum of all the stuff that people say in public about it is incoherent (so what else is new?). But I humbly submit that Jim and I have a coherent explanation in our book. Enough for the plug.
by
Kurt Laitner
5 months ago
What, no free quote from the book to guide us on our Symantec way?
by
Mills Davis
5 months ago
I have to admit, I no longer really worry about someone who critiques "semantic web." Dean and Jim have done an elegant job for "practitioners." Proof cases for the masses, however, are the only things that really matter. Just show me something done with semantic technologies that I care about. The rest is easy.
by
Jack D. Logan
5 months ago
To the author of that blog, I say, "Deal With It, and watch what happens!"
by
glenn mcdonald
5 months ago
Just to be clear, I'm a designer working on a not-yet-public data-exploration system, so I'm an insider, too, and obviously my post was intended for insiders, and intended in the spirit of exasperated provocation. (Although I decided at the last moment not to title it "Fuck the Semantic Web", which would have been more provocative, and maybe better.) I care intensely about how we get machines to do a better job of helping humans understand information. That, to me, is the right problem. And the big leap forward, which to me is embracing the connectedness (i.e., graph structure) of data in everything from modeling to inquiry to exploration, seems like such a pressing and ultimately fairly simple problem that I basically can't bear that we're letting anything prevent or distract us from just solving it. To me AI is a distraction (yes, including the versions in Twine and Powerset), six flavors of OWL is a distraction, Cyc is a distraction, bad terminology is a distraction, N3-vs-RDF/X(H)TML is a distraction, Kingsley saying that RDF obviates the need for PowerPoint is a distraction. None of this is helping us advance the state of information technology as fast as we could be. None of it is necessary, and worse, none of it is sufficient.
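The "connectedness (i.e., graph structure) of data" idea can be made concrete with a small Python sketch: once records are stored as links between things, exploration becomes graph traversal. The edge list and the `neighbourhood` helper below are hypothetical, and breadth-first search is just one simple way to ask "what is within two links of this?"; it is not a description of Glenn's system.

```python
from collections import deque


def neighbourhood(edges, start, max_hops):
    """Breadth-first search over (source, link, target) edges, treated as undirected.

    Returns every node reachable from `start` within `max_hops` links,
    mapped to its distance in hops.
    """
    adjacent = {}
    for a, _, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in sorted(adjacent.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


# Hypothetical data: people, posts and topics, connected by typed links.
edges = [
    ("alice", "wrote", "post1"),
    ("bob", "commentedOn", "post1"),
    ("post1", "mentions", "topicA"),
    ("topicA", "relatedTo", "topicB"),
]

print(neighbourhood(edges, "alice", 2))
# {'alice': 0, 'post1': 1, 'bob': 2, 'topicA': 2}
```

The link types are ignored here for simplicity; a real exploration tool would presumably let the user filter or follow them selectively.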
So yeah, "Deal with it, and watch what happens". I deal with it for a living, as I assume many of you do (which is why I cross-posted my rant here), and I'm trying to help make it happen. As a human I want it to happen just as desperately and deeply as Nova said in "Why Twine Interests Me". It's precisely because I want the solution so badly that I want to wrest the problem-definition away from the scientists and ontologists and be able to formulate it in the most straightforward, feasible, unacademic way...
by
Ryan Riley
5 months ago
"It's precisely because I want the solution so badly that I want to wrest the problem-definition away from the scientists and ontologists and be able to formulate it in the most straightforward, feasible, unacademic way..."
I completely agree, though I respect what the scientists and ontologists are trying to do. I also agree with your list of options as nothing but distractions preventing us from arriving at a solution. I'm almost sorry we don't have some version of "Beta vs. VHS", "HD-DVD vs. Blu-Ray", or "Netscape vs. IE" rather than the almost purely theoretical battles between the current contenders. At least some option would eventually win out with actual implementations. With these theoretical battles and one-off proofs-of-concept, all we get is delay and confusion.
I'm really glad that Twine, Powerset, Yahoo!'s SearchMonkey, etc. are giving us practical implementations so that we can finally move at least a few steps forward. It may be slower than desired, but it's progress. We have to at least be happy about that.
by
Daniel Feldman
5 months ago
Glenn, I am not an insider; however, in a past incarnation as a digital signal processing practitioner I may have been a bit closer. Nonetheless, I truly appreciate your goals and your frustrations. I am staunchly against acronyms and obscure, meaningless terminology in all its forms, including for insiders. There is nothing wrong with the word Semantic; it is completely understandable to any educated person, but N3-vs-RDF/X(H)TML is clearly a distraction. It's not that insiders cannot figure these terms out, but that it is a waste of energy and bandwidth to do so, and it forces the players to converse in languages that begin to approximate machine code. This is not helpful. Machines should, IDEALLY, aid people; people should not bend their natural language to approximate that of a machine. I have worked in enough industries, and encountered enough acronyms with the same letters but totally different meanings, to just plain get frustrated at their use by insiders. The frustrated radical part of me (a rather small part) thinks users of acronyms should be shot. They do not increase understanding; they only increase the fog.
I have greatly appreciated the work of Richard Feynman ever since I learned of him when "Genius" first came out. While his singularly superior intelligence was spectacular, his most admirable and priceless gift was his ability to describe accurately, clearly and simply the most complicated and counter-intuitive aspects of physical reality in a language which almost any bright person could easily understand. His communication abilities enabled him to lead men, cooperate with colleagues and educate thousands, if not millions, of curious students the world over. He was radically opposed to stilted, unnatural, unclear language, and felt that science and engineering suffered because of it. Clear and engaging language encourages participants and stimulates discussion. Overly stilted academic language only invites the most dedicated into the discussion. The more the proponents clarify the discourse, the more easily they will attract the brightest minds to invest their time and energy on the relevant problems.
When working for Thinking Machines, Feynman once told a colleague something to the effect of, "Don't tell them it will locate the local minima. Tell them if you shake it, the balls will find the valleys."
Before my father started studying Electrical Engineering, he was put into the Nuclear Engineering track. He once told me that if he had had professors like Feynman, he might have stuck with nuclear engineering. Engaging, passionate and easy-to-understand communication is critical. Scientists, application producers and users should not have to beef up on too much terminology.
by
Ryan Riley
5 months ago
While much of that is true, N3, HTML, RDF, etc. are all valid names given to various formats and technologies. They aren't really scientific terms like "minima"; they are names, and there is no way to have a really meaningful conversation if you use either full definitions for everything (the reason for giving something a name in the first place) or analogies for everything. Giving an analogy to explain what something is meant to do is a great teaching tool and a great way to make sure everyone is on the same page, but not each and every time the conversation comes back to those terms.
It's more a case of "know your audience." People are unfortunately afraid of new and/or big terms and shouldn't be. There is a cost to entering a discussion. If someone doesn't understand a term, they should ask or look it up. I don't think that using some of these terms is really so much to ask. After all, nuclear engineering is still called nuclear engineering, even though those terms aren't the easiest terms to understand.
by
hucheng
5 months ago
It is somewhat arrogant to assume that people should understand your terminology when there is so much in the world to choose to learn. Things should just work, and the root ideas behind them should be explainable in plain old language.
A few years ago semantic markup just meant making an HTML heading use a heading tag, rather than a span tag with fancy styling, etc. Big deal: now we know it is a heading but have no idea what it means. Well, it was a big deal in a few areas, because screen readers and other software can now start to guess how best to navigate around a page. Now of course "semantic web" is meant to mean something more.
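The heading-versus-styled-span point can be shown with a short Python sketch using the standard library's `html.parser`. The `HeadingOutline` class and `outline` helper are hypothetical, and real screen readers are far more sophisticated; the sketch only shows that software can build a page outline from `<h1>`-`<h6>` tags, while a span styled to look like a heading is invisible to it.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Collect (tag, text) for every h1-h6 element, roughly how a tool might outline a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # name of the heading tag we are inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag in HEADING_TAGS:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) is still part of the heading.
        if self._current is not None:
            self._buffer.append(data)


def outline(html):
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.headings


semantic = "<h2>Comments</h2><p>...</p>"
styled = '<span style="font-size:2em;font-weight:bold">Comments</span><p>...</p>'
print(outline(semantic))  # [('h2', 'Comments')]
print(outline(styled))    # []
```

Both snippets look the same in a browser, but only the first one tells software that "Comments" is a heading; neither says anything about what the comments mean.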
I would hope that in a few years the various semantic approaches around today will seem just as trivial, or maybe even unnecessary, because I don't think they are really taking us much further (relative to the end goal) than those HTML tags toward solving "how we get machines to do a better job of helping humans understand information". Spending too much time fiddling with them might even take our eye off the ultimate goal; if they have a purpose, then maybe near enough is good enough to fulfill it.
I feel that to answer "how we get machines to do a better job of helping humans understand information", the machines have to understand it themselves to some extent (and I don't just mean a bunch of labels that have been tapped in by a human). That doesn't have to be AI, though (or even real intelligence; it escapes me why an intelligent machine would be artificially intelligent any more than we say an airplane artificially flies).
The machines can fake understanding, so long as we find it hard to tell the difference. That "hard to tell the difference" part is crucial to me, because at the moment, no matter what the demo, technology or example, it is still very, very, very easy to tell the difference. Getting computers to play chess very well didn't mean that they were intelligent, but if you want a decent game of chess and there are no humans around, that doesn't matter so much (they still only play a mediocre game of Go, though).
I hope that is the kind of thing Glenn means, that for me is when it is going to get really exciting :).
The other problem with complex terminology it seems is that sometimes being deeply conversant with the terminology is mistaken for deep understanding of the ideas behind it, a problem in every field I guess.
by
James Choate
5 months ago
Your line of reasoning is why I'm interested in Bayesian filters, paraconsistent logic, and some other areas that seem to be under-utilized but have a lot of potential. The RSS::Email project I've started here is focused on using Bayesian filters to parse, auto-tag and then collate items. The ability of a Bayesian filter to play chess was one of my justifications for using this approach. The goal is to let people search for what is new or what they haven't seen before, and to focus on integration and extrapolation.
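For readers unfamiliar with Bayesian filtering as a tagging tool, here is a minimal Python sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing, the same basic technique behind Bayesian spam filters. It is a generic illustration, not code from the RSS::Email project; the `NaiveBayesTagger` class, the whitespace tokenizer and the training snippets are all made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one smoothing: picks the most probable tag for a text."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> number of training items
        self.vocab = set()

    @staticmethod
    def tokens(text):
        # Deliberately naive tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text, tag):
        words = self.tokens(text)
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocab.update(words)

    def tag(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_tag, best_score = None, -math.inf
        for tag, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(tag) + sum of log P(word | tag), in log space to avoid underflow.
            score = math.log(self.doc_counts[tag] / total_docs)
            for w in self.tokens(text):
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


tagger = NaiveBayesTagger()
tagger.train("opening move corner stones capture", "go")
tagger.train("ko fight capture stones territory", "go")
tagger.train("rdf triple ontology query", "semweb")
tagger.train("owl ontology reasoning rdf schema", "semweb")

print(tagger.tag("capture the corner stones"))  # go
print(tagger.tag("rdf ontology query"))         # semweb
```

Words never seen in training (such as "the") still get a small nonzero probability thanks to the smoothing, so they do not veto a tag on their own.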
With regard to terminology, many people confuse a name for something with that thing. They believe that because they can do some complex and allegedly 'deep' symbol manipulation, it gives them some insight into the actual behaviour they're talking about. I find this to be seldom the case. Understanding is demonstrable only through action by application.
ps I have created a Go/Go-moku twine, and I'm not very good at the game :)
by
Ryan Riley
5 months ago
"It is somewhat arrogant to assume that people should understand your terminology...."
What? How is that arrogant? If you are using terms used elsewhere, then you could expect someone to understand them. If you just made them up without explaining, then that would be arrogant. And it's always nice to explain your terms if your audience would be unfamiliar with them. Know your audience. Frankly, I really hate reading definitions over and over again and appreciate someone giving me the benefit of the doubt when I'm in an audience that should know the subject matter.
"Things should just work, ...."
That would be nice, but unless you know some magic, people usually need to come together, define terms, communicate with those terms, and finally develop technology that "just work[s]." That technology may be developed on something that's simple to explain, in which case someone can then come along and give a nice, neat analogy to explain the basics of what's happening. Perhaps the real crime is that most people are talking about all the details and not sticking to the basic ideas? Of course, with so many competing formats and technologies, it's difficult to know what version of the "semantic web" is being discussed without listing some of the technologies.
"The other problem with complex terminology is that sometimes being deeply conversant with the terminology is mistaken for deep understanding of the ideas behind it, a problem in every field I guess."
That, I completely agree with, especially in this case.
by
Daniel Feldman
5 months ago
Hu Cheng, you identified the crux of the matter in your last sentence.
"The other problem with complex terminology it seems is that sometimes being deeply conversant with the terminology is mistaken for deep understanding of the ideas behind it, a problem in every field I guess."
This is precisely what Feynman's father taught him at a very young age: to understand the difference between labeling something or describing something and really understanding it. Oftentimes true understanding of certain kinds of phenomena is an elusive goal for even the most brilliant minds. Admitting what is known and what has not been understood is the first step to providing a clear introduction to a new problem. We keep talking about the Semantic Web and Web 1.0, 2.0, 3.0 ... but to an outsider, what are these terms referencing? Personally, I want to know: what has the Internet community already accomplished in these areas? What problems are well understood? What are the goals, and why? What progress has been made in accomplishing these goals? And what problems remain? Once the community can agree on defining these, it will better mobilize and focus the collective brainpower and attention to bridge the gaps in understanding. Perhaps I am being totally naive, but is not the desire to resolve these gaps and quickly expand the capabilities of the web greater than the desire to seek personal gain in this space, financially or academically? We will all gain from an Internet that better provides the information we seek.
by
hucheng
5 months ago
James, I will look at the Go twine for sure. I love the way a game that seems so simple contains such complexity. I am not very good myself yet; computer Go games are more than enough of a match for me :). It also seems that plenty of people are peeking into the kinds of corners that may contribute to those big breakthroughs, which is great.
Daniel, it sounds like I will enjoy finding out more about Feynman. I know of him, but not in any great detail. My own father taught me that ignorance is not the same as stupidity and, conversely, that being highly educated is not the same as being intelligent. I didn't grow up to be famous, but it is one of the few things that came from him that I really cherish. I am not exactly in this field, but I am a programmer who works with data and web technologies (so it is inevitable), and I am also very interested. For now I am just trying to get my head around the basic concepts and play a little with the technologies (programming will come later). I particularly like "understand the difference between labeling something or describing something and really understanding it", because the initial technologies I have looked at seemed aimed at labeling and describing things, which is why I feel dissatisfied (I couldn't express it that well before, though).
Feeling a bit down about the Semantic Web thing at the moment is not the same as not enjoying Twine. I am getting value from Twine already, and that is good, even if I have to wait some time yet before I come across something and think "now that IS semantic".
by
James Choate
5 months ago
Richard Feynman was somebody I held in high esteem. He was on my list of 'to meet' and he unfortunately died before I could complete that goal. However, I would strongly suggest reading anything and everything you can lay your hands on by him. It will be worth your effort.
I agree with your feeling of dissatisfaction. As I told Nova shortly after joining Twine, I'm still having to do way too much of the work, and the machine is not actively presenting enough to me. The Semantic Web should be an active interaction between man and machine; it isn't that yet. What I'm looking for is something that extends the paper notebook I carry everywhere and jot notes in. I want to be able to bring that into an environment and have the environment take what I've already done, make inferences on it, and then actively search, sort, and present potential links that I can review and rate. The Semantic Web should be a research assistant.
by
Daniel Feldman
5 months ago
Twain, if you refer to the Global Collaboration Environments twine I started months ago, you might find something better approximating your aspirations. There it was my goal to develop a space (a twine) where individuals could begin to specify what a future global collaboration environment might look like. http://www.twine.com/twine/1z07bvhd-3dc/global-collaboration-environments
Initially I wanted to take the holodeck from Star Trek: The Next Generation as an example. The holodeck interface would let individuals, both living and dead, collaborate on projects in a virtual face-to-face world. Just as Riker and Jean-Luc Picard invited Einstein and other personalities onto their holodecks to solve present problems or crises, we could aim for the same.
by
Kurt Laitner
5 months ago
"Global Collaboration Systems twine"
I don't think it's a huge stretch for Twine to parse this and throw in a link to the twine for free.
by
Daniel Feldman
5 months ago
Kurt, I have manually included the link to the twine I refer to above. I invite anyone interested to continue this aspect of this conversation here.
by
glenn mcdonald
5 months ago
OK, I'm going to ask this question here because I can't figure out where else to ask it. Actually, that's the question itself! Where's the "there" in a twine? Daniel's Global Collaboration Environments twine sets out a three-step plan in its description: define, discuss, make a plan. But am I missing something obvious, or is the only forum for conversation the comment threads attached to individual bookmarks? Not that conversations around bookmarks are bad, but I assume nobody seriously believes that they are, in themselves, what we mean by "tie it all together". I can't even figure out how to add an "item" that isn't a bookmark, although the UI seems to hint that this is possible. And if this is a Semantic Web app, shouldn't there be some way for me to add some actual information, rather than just references to other (non-semantic) information?
by
James Choate
5 months ago
To add a non-bookmark item, click "add item" and pick 'note' or 'document' or whatever. I'd suggest simply going down the list and getting used to the format of each one (there is a cancel button at the bottom, so you can roll back without committing).
As to the 'there' or 'here', there isn't one. There is no central concept or preferred frame of reference. My approach is to build my personal twine, SSZ, and then dump into it whatever interests me or might come in handy for future reference. It's a big generic 'in box'. At some point I hope the tools get to where I can add structure by adding twines under twines. When that happens, what you'll see in SSZ is a pile of general input and then a set of category or topic twines that things get sorted into. From there, if the input/output aspect of twines improves, I hope there will be agents with an agenda that I can turn loose on them.
At this point I don't think the technology will let you tie it together, because there still needs to be a level of indirection for organising and 'processing' (i.e., joining, merging, and branching) twines.
by
Ryan Riley
5 months ago
It certainly seems that way. I've found it easiest to add bookmarks because of the Twine This+ bookmarklet. Notes seem prevalent, too, and I'm not sure how much use many of the other types will prove. For instance, why place a review here rather than on Amazon or in another online book club? It would make more sense to be able to tie into that conversation through Twine rather than start or re-create it here. At the moment, Twine is a semantic app only in that it helps you add prescribed Semantic Web data. Maybe the API will allow interaction with the outside world.
by
glenn mcdonald
5 months ago
Adding a "document" or "note" misses the point entirely. The thing we're supposed to be trying to do, all of us who are working on what has been called the Semantic Web, is bring machine-understandability to the granularity of data and specific relationships, not documents and generic links. "Smarter" generic "tagging" of documents is totally, painfully, embarrassingly not the change we're trying to bring about.
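A minimal Python sketch of the contrast drawn above, under stated assumptions: the identifiers (ex:pen1, ex:color, and so on) are hypothetical, and plain tuples stand in for a real RDF store. Tagging attaches loose keywords to a whole document; triples state specific, typed relationships that a program can query.

```python
# Document-level tagging: a bag of keywords attached to a whole page.
# It says the page is "about" these things, not how they relate.
tagged_page = {"url": "http://example.org/page1", "tags": {"pen", "red", "blue", "alice"}}

# Data-level semantics: (subject, predicate, object) triples, each one
# stating a single, specific relationship. Identifiers are hypothetical.
triples = {
    ("ex:pen1", "rdf:type", "ex:Pen"),
    ("ex:pen1", "ex:color", "ex:Red"),
    ("ex:pen1", "ex:color", "ex:Blue"),
    ("ex:pen1", "ex:ownedBy", "ex:alice"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# The triples answer precise questions; the tag set cannot say whether
# "alice" owns the pen, wrote the page, or is merely mentioned.
print(sorted(objects(triples, "ex:pen1", "ex:color")))  # ['ex:Blue', 'ex:Red']
print(objects(triples, "ex:pen1", "ex:ownedBy"))        # {'ex:alice'}
```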
As for the no "there", this is an unrelated but equally clear failing. If you want groups to collaborate, by which I mean accomplish anything coherent and lasting, rather than just bantering and intermingling their link-streams, you need to give them some place in which to build things. Lists, databases, discussions that rise above the individual transient provocations, processes, roles. Entity-extraction doesn't change any of those human needs or problems.
by
Kurt Laitner
5 months ago
ok, you've got my attention (for a very short while given the medium), is there a solution being proposed here?
by
James Choate
5 months ago
"Adding a "document" or "note" misses the point entirely..."Smarter" generic "tagging" of documents is totally, painfully, embarrassingly not the change we're trying to bring about."
I'm with you there brother!
"As for the no "there", this is an unrelated but equally clear failing...give them some place in which to build things"
I agree, I've made comments about ways to address different aspects of that issue in different places. I'm not sure it's worth repeating here at this time.
by
glenn mcdonald
5 months ago
Solutions. So it seems to me that the main thing Twine should be focused on is allowing people to collaboratively define and build sets of semantically-structured data. A "twine" should be just that: a little data-model iteratively designed by the twine's members to represent some actual knowledge (not just link to it elsewhere), and then populated with bits of that knowledge. Bent's idea of "datapad for the masses" in comment 32, is probably a level too low ("encoding" is a good trigger-word to tell you you've slipped into technical implementation minutiae, below the level of the motivating human problem), but it gets at the right idea.
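One way to picture a twine as "a little data-model iteratively designed by the twine's members" is the Python sketch below. It is not Twine's API: the Schema and Twine classes, and the "term" type with "label" and "definition" properties, are hypothetical. Members extend the schema as they go, and each item is checked against it before it is added.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """A tiny, editable data model: type name -> set of allowed property names."""
    types: dict[str, set[str]] = field(default_factory=dict)

    def add_type(self, name: str, properties: set[str]) -> None:
        # Members refine the model iteratively; re-adding a type merges properties.
        self.types.setdefault(name, set()).update(properties)


@dataclass
class Twine:
    """A hypothetical twine: a shared schema plus items that conform to it."""
    schema: Schema
    items: list[dict] = field(default_factory=list)

    def add_item(self, item_type: str, **props: str) -> dict:
        allowed = self.schema.types.get(item_type)
        if allowed is None:
            raise ValueError(f"unknown type: {item_type}")
        unknown = set(props) - allowed
        if unknown:
            raise ValueError(f"properties not in schema: {sorted(unknown)}")
        item = {"type": item_type, **props}
        self.items.append(item)
        return item


schema = Schema()
schema.add_type("term", {"label", "definition"})
twine = Twine(schema)
twine.add_item("term", label="triple", definition="A subject-predicate-object statement.")
```

Adding a property later (say, a hypothetical "source") is just another add_type call, which is the kind of iterative design the comment describes.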
Take, for example, Twain's "Terminology" twine introduced in comment 19. A twine, in its current state, is going to do an awful job of addressing the very simple goal Twain states for this one: "be populated with obscuratum phrases which we have to find definitions for and inter-link". To actually do this, rather than allude to it, you (the group of people interested in terminology) need to be able to define a little schema for terminology. There's a "term", which links to one or more "definition", or probably more usefully a "definition in
I'm sorry, but I don't buy it :-)
The name of the involved technology doesn't matter. Twine users don't write statements and graphs, they don't need to understand what powers Twine (or any other SW service), they just have to appreciate the end-user effects it has.
It's basically a confusion of who those names are for. And if the point is that "regular" developers have a problem with the names. Well...
And saying that -
Natural-language-processing and entity-extraction are interesting information-science problems, and somebody, somewhere, probably ought to be working on them. But those tools are going to pretty much suck for general-purpose uses for a really long time. So keep them out of our way while we try to actually improve the world in the meantime.
is a bit ignorant of companies like Powerset that have already deployed NLP for the masses. Those that push deep CS theory furthest into practical applications are the ones that will ultimately succeed.
A black box is still useful if its effects are desirable. You don't have to understand NLP to enter a sentence in Powerset or call an API function...
After reading the article, I can't help thinking that RDF really doesn't belong within (X)HTML. This is a pretty big deal for me, but I can't get away from it. From a truly semantic point of view, (X)HTML describes a document. I realize that it has been expanded to do all sorts of amazing things, but it is, at its roots, a document format. Why force even more into an already back-broken format? (X)HTML's own tags can provide a load of semantic information about the document, and that's really all it should provide.
Microformats are a great way of recognizing that and staying fairly consistent with that nature, but they really aren't a very good solution. Creating RDF using N3 is much easier than using RDF/XML, though both are fine, and both are much easier to create independently of (X)HTML. Building a tool to create RDF data to store at a specific URI would allow much more data to be placed on the SemWeb much faster. Then, using the rel and rev attributes already in (X)HTML, you could extract triples that relate web pages (HTML) to web resources (RDF, etc.) and complete the graph without forcing much more information than necessary into (X)HTML.
Maybe I'm a bit naive, but that seems like a really workable solution. And if you must find a way to represent all RDF as (X)HTML, XSLT and XQuery are great tools for doing just that, keeping in mind that you are then creating a document to display data for the resource. Standard templates/queries should go a long way: sort of a reverse GRDDL approach.
Or am I just missing something huge and obvious? (I tend to do that sometimes.)
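The rel/rev idea can be sketched with Python's standard html.parser. This is a simplified illustration, not GRDDL: predicates are left as the raw rel/rev strings rather than mapped to vocabulary URIs, only <a> and <link> elements are considered, and the page and URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelRevExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from rel/rev attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        target = urljoin(self.base_url, href)
        # rel="x": this page --x--> target
        for value in (attrs.get("rel") or "").split():
            self.triples.append((self.base_url, value, target))
        # rev="x": the reverse direction, target --x--> this page
        for value in (attrs.get("rev") or "").split():
            self.triples.append((target, value, self.base_url))


page = """<html><head>
<link rel="meta" href="/data/book.rdf">
</head><body>
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC</a>
<a rev="made" href="/people/alice">Alice</a>
</body></html>"""

parser = RelRevExtractor("http://example.org/book.html")
parser.feed(page)
for triple in parser.triples:
    print(triple)
```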
=> if SGML had been properly understood and used since its beginning (late 60s?), we wouldn't even have to talk about the semantic web now... It was built precisely to document, link, and process data (without microformats).
But microformats are the lazy man's compromise for the human-readable nature of SGML/XML.
Other infoset encodings exist - binary ones (EXI). Microformats should perish and intelligent microeditors/microviews or DSL views/editors should be made "on top of" the infoset model.
And how can markup be semantic just because it has links? Yes, it can link, you can name links, and RDF can be expressed in SGML, but the infoset model itself does not lend itself well to merging models or to the kind of interoperability that can be expressed with the far simpler and much more general graph model of RDF.
Binary-encoded infosets, with domain-specific and non-domain-specific designers. A pluggable (binary) "infoset notepad" is what we need.
And binary encoded graphs, perhaps in a binary infoset (EXI).
Embrace binary encodings so we can finally get away from this pain of human syntax checks.
Want to author an XHTML page by hand? Use an editor that does not allow you to violate the infoset syntax and that also enforces any schema syntax layered on top. If it's binary, the problem of human screw-ups goes away, because mechanical verification and unfolding are necessary even to comprehend the document.
"how can markup be semantic just because it has links" :
I consider "semantic" as "meaning", "sense": a web/data model that 'makes sense' because it's structured like a neural network, with links as qualified connections (A has_typed_relation_with B) between elements/objects (which are themselves defined by their properties, i.e. their relations with other elements)
+ having some kind of 'memory' of the past 'stimulations' of links/tuples (a timestamp with the origin and target of each request) might be interesting? ("living data")
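A rough Python sketch of this "living data" idea, with hypothetical names throughout: elements are described only by their typed links, and every traversal of a link records a timestamp and the origin of the request. It is a toy in-memory model, not a proposal for how a real store should work.

```python
import time
from collections import defaultdict


class LivingGraph:
    """Typed links (A, relation, B) plus a log of when each link was 'stimulated'."""

    def __init__(self):
        self.links: set[tuple[str, str, str]] = set()
        # link -> list of (timestamp, origin of the request)
        self.stimulations: dict[tuple[str, str, str], list[tuple[float, str]]] = defaultdict(list)

    def relate(self, a: str, relation: str, b: str) -> None:
        self.links.add((a, relation, b))

    def properties(self, element: str) -> set[tuple[str, str]]:
        """An element is described by its outgoing typed relations."""
        return {(r, b) for (a, r, b) in self.links if a == element}

    def follow(self, a: str, relation: str, origin: str) -> set[str]:
        """Traverse links of one type and log a timestamped stimulation for each."""
        now = time.time()
        targets = set()
        for link in self.links:
            if link[0] == a and link[1] == relation:
                self.stimulations[link].append((now, origin))
                targets.add(link[2])
        return targets

    def activity(self, link: tuple[str, str, str]) -> int:
        """How many times a link has been stimulated so far."""
        return len(self.stimulations.get(link, []))


g = LivingGraph()
g.relate("rose", "is_a", "flower")
g.relate("rose", "has_color", "red")
print(g.follow("rose", "has_color", origin="alice"))  # {'red'}
print(g.activity(("rose", "has_color", "red")))       # 1
```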
I have a dream... a dream of taking my library (4,000 books or so, not sure 'cause I've got better things to do than count 'em) and scanning it into a big file array as images (not text conversion). Then building a tool chain that initially does graphics analysis and text conversion on them. These files then get sent into an array of tools that look for terms and sets of terms (i.e., tags) to allow relational structures to be built. Those structures then get coupled with a set of notes and projects that I've also got in there. There is another layer of tools that looks for connections between the library and my ideas/projects. Another layer of tools interacts with me in weighting those connections so we can prioritize what's important. In parallel, there should be a set of tools that scans Google, Wikipedia, MathWorld, etc. for information as well. Out of this should come another set of tools that let me take this and build models that I can then run on various data sets and look at the predictions. These predictions guide what notes I take, what books I buy, what web sites I want it to search next, what models I extend or build, etc. A big multi-layered IFS.
Tuples and linked lists are certainly one kind of data structure; there are many more, and there will be many more as we learn more. The critical point is to build it modularly, with good APIs, so it's easy to replace a module or plug a new functional block into the sequence.
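The modularity point can be illustrated with a minimal Python pipeline in which every stage has the same signature, so any stage can be swapped out or a new one inserted. The two stages are hypothetical stand-ins (naive keyword extraction and substring matching against notes); real OCR or term-extraction modules would plug in the same way.

```python
from typing import Callable, Iterable

# Every stage takes the accumulated document state and returns a new one.
Stage = Callable[[dict], dict]


def run_pipeline(stages: Iterable[Stage], doc: dict) -> dict:
    """Apply each stage in order; replacing a module means editing the list."""
    for stage in stages:
        doc = stage(doc)
    return doc


def extract_terms(doc: dict) -> dict:
    """Stand-in for a real term extractor: keep longer words as 'terms'."""
    words = (w.strip(".,;:").lower() for w in doc["text"].split())
    return {**doc, "terms": sorted({w for w in words if len(w) > 6})}


def link_to_notes(doc: dict) -> dict:
    """Stand-in linker: connect each term to every note that mentions it."""
    notes = doc.get("notes", {})
    links = [(term, note_id)
             for term in doc["terms"]
             for note_id, body in notes.items()
             if term in body.lower()]
    return {**doc, "links": links}


book = {
    "text": "Fractals and iterated function systems.",
    "notes": {"idea-1": "Use fractals to compress scanned images"},
}
result = run_pipeline([extract_terms, link_to_notes], book)
print(result["terms"])  # ['fractals', 'function', 'iterated', 'systems']
print(result["links"])  # [('fractals', 'idea-1')]
```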
RDF allows arbitrary structures to be merged because the model is so simple and universal. A graph data model subsumes all kinds of other data models. It's also just a labelled structure, but it allows arbitrary merging and has a flexible, universal type system on top (RDFS and friends).
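A small Python sketch of why merging is easy in a graph model. Triples from different sources that use the same IRIs connect automatically, so merging is essentially set union; the one subtlety is that blank nodes (written "_:name" here) are local to their graph and have to be renamed apart first, as RDF's definition of graph merging requires. The FOAF-style identifiers and data are illustrative.

```python
def standardize_apart(graph, tag):
    """Rename blank nodes ('_:x') with a graph-specific prefix so that
    identically named blank nodes from different graphs stay distinct."""
    def rename(term):
        return f"_:{tag}_{term[2:]}" if term.startswith("_:") else term
    return {tuple(rename(t) for t in triple) for triple in graph}


def merge(*graphs):
    """Merge graphs of (s, p, o) string triples: rename blank nodes, then union."""
    merged = set()
    for i, graph in enumerate(graphs):
        merged |= standardize_apart(graph, f"g{i}")
    return merged


# Two independently written graphs; both happen to use the blank node label "_:b".
g1 = {("ex:alice", "foaf:knows", "_:b"), ("_:b", "foaf:name", '"Bob"')}
g2 = {("ex:alice", "foaf:name", '"Alice"'), ("_:b", "foaf:name", '"Carol"')}

m = merge(g1, g2)
for triple in sorted(m):
    print(triple)
```

Without the renaming step, the single node _:b would end up with two names, Bob and Carol, which neither source actually said.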
There are two ways to work: express information directly in the most universal model, or use a less universal model and have conventions for transforming it into a more general one (RDF/XML, screen-scraping to RDF, Twine item recognition).
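The second way of working (a less universal model plus a convention for lifting it into a more general one) can be illustrated with a Python sketch that turns a flat, screen-scraped record into triples. The ex: vocabulary, the subject identifier, and the book record are hypothetical; a real mapping would take predicate URIs from an agreed vocabulary instead of reusing field names.

```python
def record_to_triples(subject: str, record: dict, vocab: str = "ex:") -> list[tuple[str, str, str]]:
    """Lift a flat record into (subject, predicate, object) triples.

    Each field name becomes a predicate in a hypothetical vocabulary.
    A list value becomes one triple per element: in the graph model a
    repeated property is simply several triples with the same predicate.
    """
    triples = []
    for field_name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, vocab + field_name, str(v)))
    return triples


scraped_row = {
    "title": "Gödel, Escher, Bach",
    "author": "Douglas Hofstadter",
    "year": 1979,
    "subject": ["logic", "music"],
}
for triple in record_to_triples("ex:book42", scraped_row):
    print(triple)
```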
I prefer to start out with the simplest most direct encoding and to have tools do the necessary work to present it to humans and to let them edit it in a syntax guided manner. Humans should never even be able to violate the syntax rules of the encoding they write in or should be warned that if they do any particular thing that violates syntax, the data model changes from the current one to a lower-level supertype one. Of course they can still open a document in a different editor, but by default there should be this universal pluggable editor that enforces integrity, I believe.
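A minimal Python sketch of the integrity-enforcing editor described above, assuming a tiny hypothetical type hierarchy (Person under a catch-all Resource). In strict mode an edit that violates the current type is rejected; in permissive mode the editor warns and falls back to the nearest supertype that accepts the edit, which is the "lower-level supertype" behaviour the comment proposes.

```python
import warnings

# Hypothetical type hierarchy: allowed properties per type, plus its supertype.
# "allowed": None means the type accepts any property (the most general model).
TYPES = {
    "Resource": {"allowed": None, "super": None},
    "Person": {"allowed": {"name", "email"}, "super": "Resource"},
}


class GuidedEditor:
    def __init__(self, type_name: str, strict: bool = True):
        self.type_name = type_name
        self.strict = strict
        self.data: dict[str, str] = {}

    def edit(self, prop: str, value: str) -> None:
        # Walk up the hierarchy until some type accepts `prop`.
        while True:
            allowed = TYPES[self.type_name]["allowed"]
            if allowed is None or prop in allowed:
                break
            if self.strict:
                raise ValueError(f"{prop!r} is not allowed on {self.type_name}")
            supertype = TYPES[self.type_name]["super"]
            warnings.warn(f"{prop!r} violates {self.type_name}; data is now a {supertype}")
            self.type_name = supertype
        self.data[prop] = value


strict = GuidedEditor("Person")
strict.edit("name", "Alice")          # accepted: Person allows "name"

loose = GuidedEditor("Person", strict=False)
loose.edit("shoeSize", "42")          # warns, then the record becomes a Resource
print(loose.type_name)                # Resource
```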
Datapad, for the masses.
Actually, you can express something in its 'native format' or in some universal format. Your example confused format and process; you're really not talking about two ways to work. How one works is not the same as how one displays or presents what one is working on. Don't confuse data with process/algorithm.
The simplest direct encoding is none. Strictly speaking any markup that won't take a straight text file and render it correctly is broken at a fundamental level. Markup is an -extension- to the native format, not a replacement for. This is a fundamental mistake the entire industry is making across the board with the current technology.
Markup should be used to extend a datum, not encode it.
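One established way to let markup extend a text without re-encoding it is standoff annotation; it is a swapped-in technique here, not something the commenter names. In this Python sketch the plain text is never modified, annotations live beside it as character offsets, and an inline-markup view is generated only on demand. The labels ("person", "relation") are hypothetical, and spans are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first annotated character
    end: int     # offset just past the last annotated character
    label: str   # hypothetical annotation type, e.g. "person"


# The datum: plain text, stored and rendered exactly as written.
text = "Feynman learned from his father."

# The extension: annotations kept alongside the text, not inside it.
annotations = [Annotation(0, 7, "person"), Annotation(25, 31, "relation")]


def annotated_spans(text, annotations):
    """Return (label, covered text) for each annotation."""
    return [(a.label, text[a.start:a.end]) for a in annotations]


def to_inline(text, annotations):
    """Produce an inline-markup view on demand (assumes non-overlapping spans)."""
    out, pos = [], 0
    for ann in sorted(annotations, key=lambda ann: ann.start):
        out.append(text[pos:ann.start])
        out.append(f"<{ann.label}>{text[ann.start:ann.end]}</{ann.label}>")
        pos = ann.end
    out.append(text[pos:])
    return "".join(out)


print(annotated_spans(text, annotations))  # [('person', 'Feynman'), ('relation', 'father')]
print(to_inline(text, annotations))        # <person>Feynman</person> learned from his <relation>father</relation>.
```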
RDF has a serialization syntax which is an XML dialect, but the XML datamodel (Infoset) and the RDF datamodel (triples) are fundamentally different.
I also argue that EXI, or binary encodings in general, are good. But that is not because EXI is closer to the Infoset data model; it is because EXI bypasses human intervention and forces one to use tools. Tools work consistently. They fail consistently as well, under the same circumstances.
Strictly speaking you are right, and wrong.
Yes, RDF in and of itself can be expressed in probably any language including Navaho. It won't be. Yes, the standard includes N3 (whatever that is, never had a reason to dig any deeper) and by implication it could be others.
However, if you reflect on it for any length of time, you realize that XML is for describing the structure of a resource. RDF is for describing the meta-structure of a resource. If you step back, you realize that they are really opposite sides of the same coin.
Now let's look at the W3 RDF Primer...
http://www.w3.org/TR/rdf-primer/
Abstract
The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.
Now, note it says 'XML syntax'. My prediction is that within 5 years there will be no distinction. They will be rolled into a single standard for machine to machine exchange.
A developer could certainly follow the non-XML approach, but my suspicion is that you'll be out standing in your field, all alone.
Is RDF/XML the end? No, clearly it is not. My suspicion is that the next major step forward will come from the harnessed power of all those machines talking to each other. Some bright boy down the road will come up with a 'data representation efficiency' metric (à la Shannon's entropy) and use it as the fitness function of a genetic algorithm. It wouldn't necessarily surprise me if it actually gets done by accident, as a side effect of some other, similar goal.
Now, to cover the last aspect of this discussion and that is the serialization 'defect'. It's a religious argument of concern to people only. My suspicion is that when the machines get it, whether you resolve the nomenclature or not, the actual data management processes and algorithms will be the same. It's a specious point.
- I don't believe RDF/XML is the future
- I do believe that, in the short term, RDF-"compatible" formats are promising (the class/property distinction reflected in the nesting structure)
But whichever concrete syntax is chosen by the masses is not that important to me. In the end, the path of least resistance (best performance, easiest processing, least memory usage, most robust, most extensible) will be chosen.
There is nothing more extensible than RDF. Not even XML. XML just happens to be a concrete syntax which, because of its extensibility is a fun expression of RDF, which itself is insanely extensible (anything about anything, open world assumption - the pen is red and blue).
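The open-world point ("the pen is red and blue") can be shown in a few lines of Python. Under a closed-world reading, a missing triple means "false"; under RDF's open-world assumption it only means "not stated", and new statements (even statements about a property itself) can be added without contradicting anything. The ex: identifiers are hypothetical.

```python
from typing import Optional

graph = {
    ("ex:pen", "ex:color", "ex:red"),
    ("ex:pen", "ex:color", "ex:blue"),   # a second color: no conflict
}


def closed_world(graph, triple) -> bool:
    """Closed world: anything not stated is taken to be false."""
    return triple in graph


def open_world(graph, triple) -> Optional[bool]:
    """Open world: stated means true; unstated means unknown (None), not false."""
    return True if triple in graph else None


green = ("ex:pen", "ex:color", "ex:green")
print(closed_world(graph, green))  # False
print(open_world(graph, green))    # None

# Anything can be said about anything, including about the property itself.
graph.add(("ex:color", "rdfs:label", '"colour"'))
graph.add(green)
print(open_world(graph, green))    # True
```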
As for structure vs metastructure - I've long given up on that distinction.
The thing is that RDF was initially used for things like Dublin Core, describing very high-level more or less vague concepts like title, subject, etc. But to me there is no distinction between meta and data, because all data is meta and all meta is data.
It's back to the discussion of a thing being its own truth but everything about some other thing is not the truth about that other thing. There are just different levels of granularity.
For example
- thing (a flower)
- discrete representation (pixelisation)
- continuous representation (vector graphics)
- concept ("flower")
Going to have to sleep soon now...
It's semantic in that HTML defines links as part of its structure. It may mean nothing other than "this resource relates to this other resource," which is certainly ambiguous and mostly unhelpful, but not meaningless. Also, RDF can certainly be embedded within HTML; I have no problem with that. My point was more that (and this relates to your point about binary encodings) we shouldn't have to embed all this meaning within SGML/HTML/XML that's also used for display; rather, we should use a transform to display the human-readable representation, for which HTML, PDF, ODF, etc. are existing formats. Perhaps new formats are needed, but these currently exist and will likely continue to be used for the foreseeable future.
We should start trying to move the data to a different format and stop worrying over how to add all this meaningful data to the existing web. Freebase is an excellent example of pulling data from Wikipedia into a re-usable store. Why spend so much time and effort trying to re-create the existing web technologies when it's so much easier to just build a better infrastructure underneath, upon which newer formats and tools can arise? Think of early "horseless carriages" on dirt roads. Once new roads were built (new data store infrastructure), new cars could take advantage of the benefits of the new roads (visualization technologies, of which HTML is only one). This analogy seems a little weak; sorry.
Anyway, the more we move away from HTML as the lingua franca of the web, the more likely we'll have a better, more scalable web for future apps and tools.
There is too much focus on having formats that are human readable - it confuses representation and presentation.
Presentation should be a function of representation. Representation should not be presentation friendly per se.
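Presentation as a function of representation can be sketched in Python: the triples are the only stored form, and the HTML and plain-text views are both computed from them. The ex: names are hypothetical, and the HTML uses a simple definition list purely for illustration.

```python
from html import escape

# The representation: the only thing that is stored.
triples = [
    ("ex:pen1", "ex:color", "red"),
    ("ex:pen1", "ex:color", "blue"),
    ("ex:pen1", "ex:owner", "Alice"),
]


def describe(subject, graph):
    """Group a subject's properties: predicate -> list of values, in input order."""
    props = {}
    for s, p, o in graph:
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


def as_text(subject, graph) -> str:
    """One presentation: indented plain text."""
    lines = [subject]
    for p, values in describe(subject, graph).items():
        lines.append(f"  {p}: {', '.join(values)}")
    return "\n".join(lines)


def as_html(subject, graph) -> str:
    """Another presentation of the same data: an HTML definition list."""
    rows = "".join(
        f"<dt>{escape(p)}</dt><dd>{escape(', '.join(vs))}</dd>"
        for p, vs in describe(subject, graph).items()
    )
    return f"<h1>{escape(subject)}</h1><dl>{rows}</dl>"


print(as_text("ex:pen1", triples))
print(as_html("ex:pen1", triples))
```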
Instead, we need a natural, efficient, universal encoding for knowledge bases, and we need to push everything into that format. No microformats, no compromises. Microformats are not extensible. They are there to please humans (in one way or another).
Linked graphs are the scalable alternative. Microformats are the way of the past. I don't particularly care how the linked graphs are coded, but they will, by their natural form, be extensible, mergable and lend themselves well to partial understanding. If they happen to be coded in a binary form, then they're relatively impervious to human screwups - sans by proxy (coders).
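A short Python sketch of "partial understanding": a consumer that only knows two FOAF properties (foaf:name and foaf:knows) uses those triples and simply skips the rest, instead of failing on data it does not recognize. The ex: identifiers and the ex:shoeSize property are hypothetical.

```python
KNOWN = {"foaf:name", "foaf:knows"}  # the only predicates this consumer understands

graph = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "ex:shoeSize", "38"),   # unknown to this consumer
    ("ex:bob", "foaf:name", "Bob"),
]


def split_by_understanding(graph, known):
    """Separate triples we can interpret from those we can safely ignore."""
    used = [t for t in graph if t[1] in known]
    skipped = [t for t in graph if t[1] not in known]
    return used, skipped


def friend_names(graph, person):
    """Names of everyone `person` knows, falling back to the raw identifier."""
    names = {s: o for s, p, o in graph if p == "foaf:name"}
    return sorted(names.get(o, o) for s, p, o in graph if s == person and p == "foaf:knows")


used, skipped = split_by_understanding(graph, KNOWN)
print(friend_names(used, "ex:alice"))  # ['Bob']
print(len(skipped))                    # 1
```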