When the phrase “The Semantic Web” was coined, some hailed it as a resplendent vision, others dismissed it as an unachievable goal, and others still saw it as a new bubble. The underlying problem is that the current Web is indexed almost entirely syntactically, that is, by the strings of characters that appear on a page rather than by what those strings mean. A purely syntactic index therefore has no simple way to differentiate between people who make bread (Baker) and people who bear the surname of an ancestor who might have baked bread hundreds of years ago (Baker).
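
To make the Baker problem concrete, here is a minimal sketch in Python. The two-document corpus, the `sense` annotation, and the function names are all hypothetical, invented for illustration; real Semantic Web annotations would use a vocabulary such as RDF rather than a plain dictionary field.

```python
# Hypothetical toy corpus: each record carries free text plus an
# assumed "sense" annotation that a purely syntactic index would lack.
DOCUMENTS = [
    {"text": "Baker wanted: early shifts making bread", "sense": "occupation"},
    {"text": "Mr Baker has joined the firm as a partner", "sense": "surname"},
]


def keyword_search(docs, term):
    """Syntactic match: case-insensitive substring test on the raw text."""
    needle = term.lower()
    return [d for d in docs if needle in d["text"].lower()]


def annotated_search(docs, term, sense):
    """Keyword match narrowed by the (assumed) sense annotation."""
    return [d for d in keyword_search(docs, term) if d["sense"] == sense]


# The keyword search cannot tell the two Bakers apart.
syntactic_hits = keyword_search(DOCUMENTS, "baker")

# With an annotation available, the occupation can be singled out.
occupation_hits = annotated_search(DOCUMENTS, "baker", "occupation")
```

The keyword search returns both documents, while the annotated search returns only the job advertisement. The hard part, of course, is that someone or something has to supply the annotation in the first place.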

Even within database-oriented proprietary systems, the injudicious digitisation or storage of information can lead to significant difficulties. While doing the research for my LLM thesis on copyright, I found it quite difficult to search the standard legal databases for “copyright”, since almost all written material included a copyright notice, which matched the search term. I’m now finding a similar problem when searching Google Scholar and proprietary academic databases for papers on privacy. Many of the databases include privacy policy links on their pages, and the Google Scholar indexing system cannot distinguish page headers and links to privacy policies from papers that are actually about privacy issues.
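
The same effect can be sketched in a few lines of Python. The pages, field names, and functions below are hypothetical, and the sketch assumes the indexer already knows which part of a page is boilerplate, which is precisely the knowledge the systems described above appear to lack.

```python
# Hypothetical pages: every page carries the same boilerplate footer,
# but only the first is actually about privacy.
PAGES = [
    {
        "title": "A survey of data protection law",
        "body": "This paper examines privacy rights in the digital age.",
        "footer": "Privacy policy | Terms of use",
    },
    {
        "title": "Soil erosion in upland farms",
        "body": "We measure topsoil loss over ten years.",
        "footer": "Privacy policy | Terms of use",
    },
]


def search(pages, term, fields):
    """Return titles of pages where the term occurs in any listed field."""
    needle = term.lower()
    return [
        p["title"]
        for p in pages
        if any(needle in p[f].lower() for f in fields)
    ]


# Indexing everything on the page: the footer link matches on every page.
all_hits = search(PAGES, "privacy", ("title", "body", "footer"))

# Indexing only the content fields: the boilerplate no longer matches.
content_hits = search(PAGES, "privacy", ("title", "body"))
```

Searching every field returns both pages, while searching only the title and body returns just the data protection survey.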