New posts are published on Pulse from LinkedIn
This contribution grew out of a reflection on:
- Projects in artificial intelligence,
- Laws regarding the protection of personal data,
- Terms of service of web services,
- Laws governing collective behavior within a globalized society, along with the notion of "rights and duties" that defines the space for individual freedom and its limits with respect to the general interest.
The objective of this paper is to contribute to the debate on the use of increasingly complex algorithms, for an ever larger number of users, by a growing number of powerful commercial companies.
The person who has probably reflected the most on the impact of artificial intelligence on a society or civilization is Isaac Asimov, with his Three Laws of Robotics.
So far the application of these laws has remained in the domain of science fiction, because the available technologies, the commercial stakes and the implications for most people remained more or less under control.
The situation is changing, or will inevitably change by 2019. Inert and living technologies intersect. Affordable computing power (17.59 petaflops), combined with the miniaturization of electronic components, the increasingly systematic use of intelligent agents, the ability to answer questions in natural language (IBM, Nuance, Siri, ...), "cloud computing" and the pressing need to protect personal data (Instagram, Facebook, Google, ...) show that implementing "universal laws" protecting the individual and humanity is becoming imperative.
At the dawn of 2013, the "entities using more and more artificial intelligence" that human beings face are primarily Social Networks.
Any social network in the world uses increasingly complex systems, artificial agents and now artificial intelligence algorithms that closely approach human behavior, with the possible bias of wanting to free itself from the control of the largest number.
The human being, for his part, wants to better control the information he publishes on the Social Network and to keep “Ownership”/”Property” of it.
The Social Network, for its part, is a kind of autonomous entity providing services to a population of human beings comparable to humanity (hundreds of millions to billions of users). The Social Network has rules and laws, one of which is to maximize the profitability of its services, another to comply with local or broader laws...
We can thus compare a Social Network to a robot and each user to a human being belonging to humanity. So let us try to apply Isaac Asimov's three laws, preceded by his Zero Law, to the Social Network:
• Zero Law: A social network may not harm humanity, or, through inaction, allow humanity to be injured.
• First Law: A social network may not injure a human being or, through inaction, allow a human being to come to harm, except where this would conflict with the Zero Law.
• Second Law: A social network must obey orders given to it by human beings except where such orders would conflict with the First Law or the Zero Law.
• Third Law: A social network must protect its own existence as long as such protection does not conflict with the First or Second Law or the Zero Law.
It now remains to define how to implement and operate these laws concretely …
CCTV generates ever larger data streams. Most of the time the data is analyzed retrospectively, which requires a lot of staff. Complete and reliable systems that work in real time are rare or very expensive.
The reasons are manifold. The most visible is the large volume of data to be processed. Next, artificial intelligence or machine learning has to complement or even replace human intelligence: visual recognition algorithms need to generate textual information of good quality. Finally, existing solutions are cumbersome to configure and expensive to implement. Since 2005 there have been many disappointments about the reliability of these systems.
For example, let us briefly consider the growing number of on-site thefts of raw materials, especially metals, and of construction equipment. A site is an "open place", that is to say a place in direct interaction with the natural elements (rain, wind, wild animals, pets ...). A basic solution based on presence detectors deters intrusion and detects fire, but it can rarely confirm these alarms, analyze a situation or build an event history. The use of tracers does not prevent theft but increases the risk for the thief.
A basic CCTV system, meanwhile, consists of fixed-direction or dome cameras, analog or digital, infrared or visible-spectrum, connected to a monitoring center. The cameras are watched by one or more operators, each responsible for many of them. Fatigue or weariness, too many cameras to monitor, the fear of false positives and sometimes a lack of training mean that an intrusion or theft is handled too late to avoid the financial consequences, such as a lengthy manual analysis of the video streams to understand the intrusion and try to find the culprits.
A more advanced video surveillance system, often deployed in addition to a basic solution, makes it possible to configure and manage camera-based modules for automatic analysis and event detection. This generates a large number of notifications, usually e-mails or XML messages, sent to an ARC (Alarm Receiving Center). The passage of an animal such as a cat or a rat triggers alarms. A crowded place also overwhelms most of these automatic systems. Discerning the relevant thresholds in this mass of information remains a challenge.
Three main elements stand out:
- Integrating artificial intelligence into cameras is required;
- Being able to handle large volumes of information at a low cost is necessary;
- Generating contextualized information from cameras is useful.
We worked with a company specialized in image and video recognition, which has collected a large number of operational case studies. Their recognition algorithms are powerful. Among other deployments, their CCTV software equipped the G20 summit in Cannes in 2011.
Do not hesitate to contact us.
As a European lawyer, you need to handle the 23 languages of the European Union (*) every day. Every morning, you cannot read all the information relevant to your business that is published in all these languages. Hence, you surely focus on two or three of them, hoping not to miss anything relevant. Why shouldn't you be able to handle these languages as a single context-based environment?
As a lawyer, you know that each word is especially important. How many times have you been unable to find a specific term again? How many times have you been unable to get the relevant information with a simple keyword or set of keywords? Wouldn’t you have preferred to get this information through a context?
As a lawyer, you need to avoid using ambiguous terms that would create an ambiguous context, or vice versa. How many times have you been obliged to go through a long list of documents just because the keywords you were using were ambiguous? Why shouldn’t you have a knowledge system that automatically disambiguates documents?
As a lawyer, how many times have you dreamed of a system that would automatically generate relevant “Similar Articles” with a single click from within your current document?
Well, stop dreaming! This is possible today thanks to mARC™ technology!
Test the "Similar Article" feature here (SA button in front of each result line) in this demonstration environment. Don't forget to select the language you want to test in the upper right corner of your screen.
(*) Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, Swedish
In a previous article we presented the Definition of memory Association by Reinforcements of Contexts.
In this article, we provide some feedback from an experimental demonstration that compares mARC™ to a “Best Class Search” engine. We do not publish our study in full. You should nevertheless find enough information to come back to us if you find this experimental demonstration relevant.
In order to demonstrate mARC’s benefits, especially points 1, 2, 4 and 5 described in our previous article, we built an experimental platform that compares an efficient traditional procedural search engine, in this case «Best Class Search», with a basic search system, a functional clone of «Best Class Search», using a mARC™ memory.
Objective: plausibly compare two functionally equivalent systems in order to check the advantages of implementing mARC™.
Data: The selected corpus consists of Wikipedia articles in French (Fr) and in English (En).
Corpus size (articles)     French (Fr)    English (En)
mARC                       1.0 million    3.5 million
«Best Class»               1.4 million    3.9 million
* The difference in the number of articles between the two cases is due to the fact that Wikipedia and «Best Class Search», as an up-to-date web search engine, evolve, whereas the data we have access to are those of Wikipedia from approximately February 2010.
1) «Best Class Search» is restricted to fr.wikipedia.org and en.wikipedia.org domains
2) A simulation of a procedural keyword search engine, implemented on a pre-commercial version of the mARC™ software.
The search and indexation engine is named Syncytiotrophoblaste. It is intended to serve as a basic programming sample using mARC™ memory in the first commercial documentation of mARC™.
The user interface (UI) mimics the «Best Class Search» look and feel, including advanced features such as query input assistance and dynamic predictive requests while typing.
The UI provides additional key features that are easily implemented thanks to mARC™, such as:
- Search by contextual similarity of articles (SA),
- Image meta-search engine based on the query,
- Dynamic Query helper by associations and shapes suggestions.
Syncytiotrophoblaste is not in itself a contextual search engine, since such a design would not have allowed a point-by-point comparison with a market reference.
The constraint of simulating a « keyword » search mode means using mARC™ in low-level mode, except for the search by contextual similarity of articles (SA).
The indexation algorithm used is also keyword-oriented rather than purely contextual, which overloads the internal database of the prototype.
In other words, the very basic technical design of the application takes only partial advantage of the underlying features of mARC™.
On the one hand, a distributed «Best Class Search» architecture; on the other hand, an Intel Core i5 server hosted by OVH.
The OVH-hosted server contains two mARCs, one for the French indexation and one for the English indexation, plus an Apache web server. The operating system (OS) is Windows 7, virtualized in a VMware partition.
4) Validity of the comparison
It is not obvious, at first sight, that the two platforms are comparable. However, a somewhat finer analysis of the distributed «Best Class Search» architecture indicates, as a first approximation, that the comparison makes sense. After discussion, external engineers are of the opinion that the test conditions are meaningful, and even that the mARC platform is slightly disadvantaged.
Results: Data Independence
Indexation and requests are handled in exactly the same way on the French and English corpora by the search application.
We are able to demonstrate the same behavior on the German, Spanish, Italian, Alsatian and Breton Wikipedia corpora.
The version of mARC™ used is based on two simplifying assumptions:
Hence, the current version of mARC does not allow universal data independence to be validated. Nevertheless, it proves, at a minimum, that it is independent of the stored language.
Therefore we “only” have a partial proof of data independence.
Here are the sizes of the different elements used by the simulated search engine: mARC™ memory content, indexation information, reverse indexation information.
We notice that:
1) mARC’s size does not vary linearly with the amount of stored data but, at worst, as the logarithm (log) of the amount of stored data.
2) The index, that is, all the information needed by the search engine, weighs about 50% of the initial data size.
Today, the full-text indexes included in most SQL servers on the market, or the indexes of search engines such as Indri, Sphinx and others, “cost” between 100% and 300% of the initial data size.
We don’t know «Best Class Search»’s index/data ratio.
Conclusion: mARC™ itself is compact, as specified at the beginning of this study.
mARC™-based indexation applications are much more compact than similar ones based on linear memory (RAM). The gain in index/data ratio between mARC™ indexation and classic indexation is between 2 and 6 for the implemented mARC™ application.
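The range of 2 to 6 follows from the ratios quoted in this section: an index weighing about 50% of the data for the mARC-based application, against 100% to 300% for classic full-text indexes. Here is a minimal Python sketch of that arithmetic; the function name `index_gain` is ours and purely illustrative, not part of any mARC API.

```python
def index_gain(classic_ratio: float, marc_ratio: float = 0.5) -> float:
    """Return how many times smaller the mARC-based index is.

    Both arguments are index/data ratios (index size divided by the
    size of the initial data); 0.5 is the ~50% figure quoted in the text.
    """
    if marc_ratio <= 0:
        raise ValueError("marc_ratio must be positive")
    return classic_ratio / marc_ratio


# Classic full-text indexes are quoted at 100% to 300% of the data size.
low_gain = index_gain(1.0)   # lower bound of the classic range
high_gain = index_gain(3.0)  # upper bound of the classic range
print(low_gain, high_gain)
```

The two bounds of the classic range give the two bounds of the gain, so the figure depends entirely on the 50% ratio measured for this prototype.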
A less “keyword search” and more contextual indexation strategy (like Similar Article) would easily shrink the footprint of the static indexation information by an order of magnitude (x10), using mARC™’s dynamic resolution of relationships at runtime.
This application does not allow a direct measurement of mARC™’s speed. Nevertheless, as it is based on its usage, the gain in speed compared to a technology at the cutting edge of linear memory can only be attributed to the use of mARC™.
We used a list of the 100 most popular search requests of 2010 and 2011 on Wikipedia in French and English.
A second part of the test uses article titles as new queries, and copies and pastes a portion of the text of an article, in order to follow the trend in request size observed in recent years.
Each query is made four times: the first to measure the no-cache response time, the other three to evaluate the mean cached response time.
The real recall rate was also measured.
However, it is unnecessary to repeat the search with the omitted results included: the recall rate generally does not vary, or varies by a few units at most.
In the case of Syncytiotrophoblaste, the indicated recall rate is always the real one, and all documents can be accessed.
We made measurements in 4 cases: Fr, En, Popular Requests, Long Requests.
Average query times were extrapolated using the Pareto rule that generally applies to cache / no-cache logic, namely: first query (no cache) × 20% + average of the next three queries (cached) × 80%.
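This weighting can be written out as a small Python sketch. The function name `extrapolated_query_time` and the sample timings are illustrative assumptions, not measurements from the study.

```python
from statistics import mean
from typing import Sequence


def extrapolated_query_time(timings_ms: Sequence[float]) -> float:
    """Apply the 20% / 80% cache weighting to the four runs of one query.

    timings_ms[0] is the first (no-cache) run; the remaining three
    runs are assumed to be served from the cache.
    """
    if len(timings_ms) != 4:
        raise ValueError("expected one no-cache run and three cached runs")
    nocache = timings_ms[0]
    cached = timings_ms[1:]
    return 0.2 * nocache + 0.8 * mean(cached)


# Hypothetical timings in milliseconds, for illustration only.
sample = [100.0, 10.0, 10.0, 10.0]
print(extrapolated_query_time(sample))
```

Averaging the three cached runs before weighting keeps a single slow cached run from dominating the result, while the 20% share still reflects the cost of the first, uncached access.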
For the tests of the most popular queries, the average «Best Class Search» response times, restricted to the fr.wikipedia.org and en.wikipedia.org domains, are 119 and 132 ms respectively.
The same queries, extended to the whole Web (not published here to avoid overloading the article), show an average response time of about 320 ms, of the same order of magnitude as the 250 ms claimed by «Best Class Search».
These results support the consistency of our assumptions: «Best Class Search» focuses on and optimizes specific search areas, like Wikipedia, and furthermore each server cluster participating in the resolution of a query is under very little stress, as indicated by the stability of the «Best Class Search» response time. This comparison is plausible, in our view.
Conclusion about Speed:
Reading the results leads to the following conclusions:
- The search application that implements a mARC™ software memory is at least one order of magnitude faster (factor 10),
- mARC™ implicitly increases data caching by a factor of 2 compared to the most sophisticated solutions currently deployed. We believe that caching mechanisms based on contextual predictions could achieve an improvement of one order of magnitude (factor 10) in that field.
- mARC™ gives our search application dynamic similarity search across all the contexts of a document, with an average response time of about 52 ms, which is 3 to 5 times quicker than a simple keyword query on a traditional search engine, and with a result of unquestionable relevance.
- With Syncytiotrophoblaste, once the first result page is accessed, all results are cached. As a result, the average response time per page, with 20 results per page, is about 5 ms. With «Best Class Search», loading another page is equivalent to a non-cached query, which takes 70 to 300 ms each time.
Knowing that the average search session generates about 2.5 pages, one can easily extrapolate the average response time of a search engine optimized with mARC™ to less than 5 ms, which corresponds to a ratio of more than 25 compared to a procedural search engine like «Best Class Search».
An optimized integration, based on a commercial version of mARC™ and coupled with an industrialized search engine development, could be expected to gain not just one order of magnitude (x10) but something closer to two orders of magnitude (x100).
Results: Easy Programming
You will find below the PHP code used within the Syncytiotrophoblaste application to run the Similar Article query.
It consists of four elementary calls to mARC™’s API, initialization code and results rendering.
The complexity of detecting and selecting contexts is transparently delegated to mARC™, which handles it within a few milliseconds.
public function connexearticles($rowid)
{
    // echo " similar article ";
    $this->s->Execute($this->session, 'CONTEXTS.CLEAR'); // clear the session's contexts
    // ...
}
Technically, the different characteristics of mARC™ memory are indirectly demonstrated here.
In other words, in the case of a text type signal, are these contexts directly understandable by a human being?
Another way of looking at it is to ask whether a search engine using a mARC™ brings some relevance, in the intuitive sense of the term, compared to a conventional system.
Discussion about evaluation of relevancy
Relevance, although subjective (not modelled), is very real. It is simply not possible to make a physical measurement of relevance. The sole valid method would be to provide this comparison platform to a large enough set of Wikipedia users searching in parallel on «Best Class Search» and Syncytiotrophoblaste, and then to evaluate their respective satisfaction.
Please contact us to discuss this further if you are interested, as we have truncated the second part of our study.
In terms of relevance, automatic selection of the most interesting articles, and exploration of documents in a database, it is clear that the mARC™-based application offers better results than all current procedural keyword-oriented engines.
All rights reserved.
W4 works with mARC.
mARC is a set of disruptive technologies that help you automatically build "contexts".
You can find mARC's 2-minute presentation here or view it directly below:
The W4 project started from the observation that many companies and users wanted to bring to market a new breed of applications able to handle multiple sources of information or large amounts of data at lower cost.
It provides consulting services and promotes a technology based on a set of breakthroughs in the way digital information can be found or analyzed quickly and with maximized relevancy.
Working with W4 gives you access to unique methods and lets you accelerate cloud computing, bring new uses to mobile devices, implement new computer chip designs and remove some social networking pains. The technology promoted by W4 is “unique” and respects patent, author and commercial rights.
The technology is based, though not exclusively, on an Artificial Intelligence (AI) engine whose paradigm is very different from standard statistical and computational approaches. The core concept is to mimic how a human brain handles signals for automatic recognition of text. A useful set of applications is made accessible to most developers: “Big Data” (indexation, routing, user profiling…), major improvements to existing search systems, disruptive discovery associated with new business models that “act on data”, and intelligent browsing.
The technology enhances the learning of all sorts of text-based information, since a set of prerequisites and monitoring efforts is removed. The neuronal engine learns ex nihilo from the texts submitted to it alone, without any other resources.
The engine is disruptive in that it goes beyond theories such as « multilayer perceptrons », the Hopfield model, Kohonen self-organizing maps, Bayesian statistics, Markov chains and conditional random fields. It directly simulates Wernicke's area of the brain, a small region where recognition, association and similarities occur. The engine is built, though not exclusively, on an entirely autonomous associative memory whose growth and internal topology depend on the incoming signal. This topology also depends on the learned experience, which behaves like an internal stimulus that in turn modifies the structure and topology of the associative memory network. Hence, there is no need to create a statistical or mathematical model to process the data. We named this automatic creation of knowledge in the memory, based on the internal coherency and correlation of the signal, mARC (Memory Association by Reinforcements of Contexts).
The engine learns language structures with semantic links, either explicit or implicit, through analyzed documents thanks to its intrinsic association capabilities. The direct consequence of this approach is a maximized relevancy to users’ queries. This allows powerful contextual applications.
In order to show the unique value of the promoted technology, W4 makes a set of tools available.
These tools show how the technology automatically and incrementally creates, without any external resource (like ontologies, stop lists, specific dictionaries, RDF or OWL vocabulary, manual tagging of large heterogeneous corpora…), semantic associations that make it possible to categorize a corpus by context, beyond any keyword approach. The engine can auto-correct and auto-evaluate itself. It is able to extract a context from a semantic unit. In other words, it handles polysemy (different usages) and synonymy.
Whenever you search in a web application, whether Google/Bing or an internal company one, you expect, from a user perspective, to get:
- at least one response,
- a quick response, subjectively below one second,
- a powerful response, in other words a « relevant » one.
You might have forgotten that an energy cost is associated with that search. To simplify, computers need electric power to answer the query. The larger the volume to be analyzed and the better the expected relevancy, the higher the cost. Here are two indicators:
- on average, the energy required to provide a query response is equivalent to one hour of use of an energy-saving bulb,
- considering just a company like Google, its annual energy consumption is estimated at between 1900 MW (accounting) and 2500 MW (including its own production), which is more than a nuclear plant.
Important changes in progress
Why are you interested?
We have just spoken about « search », which merely means sending a query and getting results. More and more query suggestions are provided « on the fly » in web-enabled applications to “act on data” (discovery). Keep yourself regularly informed.
Our objective is to deliver a relevant response at a competitive cost within a second. Our value proposition includes either consulting or the delivery of secure, innovative and robust solutions.
On this web site we will present methods and technologies illustrating our expertise and business. Among others, we will publish posts on:
- green software development,
- Big Data indexation savings,
- an «associative memory system with enhanced context» …
You will regularly find French-language publications on www.webpuissance4.com.