1. How would you describe High-Browse in your own words?
High-Browse is a legal research platform that brings three aspects of EU law under one umbrella, finds connections between them, and displays those connections in the simplest way possible.
So far, we have brought the jurisprudence, legislation and treaties of EU law together. This has let us find the connections and citations between them, which sheds light on individual documents in a way that was not possible before. For example, we can now rank judgments not only by how many times they are cited by other jurisprudence, legislation and sometimes even treaties, but also by weighing each citation according to the importance of the document that cites them.
Thus, a lawyer preparing for a case or studying jurisprudence will find in High-Browse not just documents, but documents enriched with information generated by analysing a broad spectrum of EU law data.
2. Which problem were you trying to solve when developing High-Browse?
When entering the EU legal world, two tech-related problems can be challenging to navigate.
The first is the lack of a good search engine. The second is the lack of a simple system that shows all the interconnected information related to a particular document in one view. In High-Browse, we are trying to solve both of them.
EU law is a vast body of information and, in terms of data, mostly unstructured. Through EUR-Lex, the EU has done a great job of giving the data some structure. But structuring that amount of data is such a tremendous task that the design and quality of the data vary not only between courts, such as the CJEU and the General Court in the case of jurisprudence, but also between periods. For example, the data structure today, which is getting progressively better, is completely different from that of, say, before the early 2000s.
Another constraint is that the data is not all in one place. For example, EUR-Lex lacks General Court data, which is only available in Curia.
As a result, finding exactly what a researcher wants is not straightforward. It is often time-consuming and labour-intensive. Most of the time that a researcher should spend analysing data is instead spent on finding and structuring it.
At High-Browse, we are trying to solve the first problem, finding relevant data, with an innovative search engine. We call it Themis. It produces search results with the help of a deep learning algorithm specially trained on EU jurisprudence and legislation. The relevant results are then ranked by a citation-based model so that the most likely matches appear at the top. The first results are very accurate, and the more the algorithm learns the ways of EU law, the better it gets at predicting what a researcher wants when they enter a word.
We solve the second problem, the lack of related information in one place, with High-Cite. High-Cite is a web page where users can view a case-law or legislation document along with all its citations and relations.
For a judgment, for example, a user can view all the other judgments and opinions that have cited it, as well as all the legislation and treaties that have cited it. They can also view the jurisprudence that the judgment itself cites. And if the judgment cites a piece of legislation or a treaty, users can view the individual articles that it cites. In this way, users don’t have to hunt for citations: they can read them in one place, at both the document level and the article level. This saves them the time of first finding a specific document and then locating the relevant article inside it.
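To make the document-level and article-level links described above more concrete, here is a minimal sketch of how such citations could be stored and queried. It is only an illustration under our own assumptions: the `Citation` class, the helper functions and the document identifiers are hypothetical and do not describe High-Cite's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """One link from a citing document to a cited document (hypothetical model)."""
    source_id: str                        # document that makes the citation
    target_id: str                        # document that is cited
    target_article: Optional[str] = None  # cited article, when known


def cited_by(citations, doc_id):
    """Return the sorted ids of all documents that cite doc_id."""
    return sorted({c.source_id for c in citations if c.target_id == doc_id})


def articles_cited(citations, doc_id):
    """Return sorted (cited document, article) pairs cited by doc_id at article level."""
    return sorted(
        (c.target_id, c.target_article)
        for c in set(citations)
        if c.source_id == doc_id and c.target_article is not None
    )


# Made-up identifiers, for illustration only.
links = [
    Citation("judgment-A", "regulation-X", "5"),
    Citation("judgment-A", "treaty-T", "101"),
    Citation("judgment-B", "judgment-A"),
    Citation("opinion-C", "judgment-A"),
]
```

With links of this shape, one lookup answers "who cites this judgment?" and another answers "which articles does this judgment rely on?", which are the two views described above.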
3. What was the biggest challenge on your journey so far?
The first challenge was that it is tough to get meaningful information out of documents programmatically, be it jurisprudence, legislation or treaties. By that I mainly mean writing a program that reads a document so that we can structure the information found in it, link it to other documents and so create High-Cite.
Let’s take the example of a judgment that cites the treaties and legislation on which it is based. Generally, these citations are at the article level, so we wanted to extract both the legislation and the particular articles of that legislation mentioned in the document. The challenge is that none of this is very structured. Judgments are written by judges from different chambers of the Court of Justice or the General Court, so the wording of a citation is never the same. The wording becomes even more diverse as we go back in time: in the late 2000s, citations were written differently from how they are written now. This made retrieving information from those judgments very challenging.
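As a rough illustration of why this is hard, here is a minimal sketch of pattern-based extraction of article-level citations. The regular expression and the function name are our own assumptions for this example and not the method High-Browse uses. The pattern covers only a few modern phrasings and is deliberately naive.

```python
import re

# Hypothetical pattern: matches phrases such as "Article 101 TFEU" or
# "Articles 5 and 7 of Regulation (EC) No 1/2003". Anything phrased
# differently is silently missed.
CITATION_RE = re.compile(
    r"Articles?\s+(?P<articles>\d+[a-z]?(?:\s*(?:,|and)\s*\d+[a-z]?)*)"
    r"\s+(?:of\s+(?:the\s+)?)?"
    r"(?P<instrument>TFEU|TEU|EC Treaty|"
    r"(?:Regulation|Directive|Decision)\s+(?:\((?:EC|EU|EEC)\)\s+)?"
    r"(?:No\s+)?\d+/\d+(?:/[A-Z]+)?)"
)


def extract_article_citations(text):
    """Return (instrument, article) pairs found in text, in order of appearance."""
    pairs = []
    for match in CITATION_RE.finditer(text):
        # Split "5 and 7" into the individual article numbers.
        for article in re.findall(r"\d+[a-z]?", match.group("articles")):
            pairs.append((match.group("instrument"), article))
    return pairs


modern = ("The applicant relies on Article 101 TFEU and on "
          "Articles 5 and 7 of Regulation (EC) No 1/2003.")
older = "The agreement was held to be contrary to Article 85 of the Treaty."
```

The modern sentence yields three pairs, while the older phrasing "Article 85 of the Treaty" yields none, because the instrument is only implied by context. Every such variation needs its own handling, which is why fixed patterns alone do not go far.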
The second challenge we faced was designing a search engine that returns information relevant to a searched word or phrase. Search engines have been around for a while now, and they use a million different ways of tracking and predicting what a user wants when they type a word or a phrase. These are based on deep learning algorithms that first study user behaviour and then predict it.
The biggest problem in EU law is how to train an algorithm to understand what a user wants when they type a particular word or phrase. And then, once you somehow have search results of case law and legislation, how do you rank them so that the user sees the most informative case law or legislation at the top? Google, by comparison, ranks web pages by how users use them or by the content those pages publish. In EU law, no such signal is available for ranking search results, which meant that we had to develop our own model or base it on the available research on the subject.
When we did our research, we found quite a bit of material on ranking but not a lot on how to create and train an algorithm that does predictive searches. There are research papers that make suggestions and rule out particular possibilities. Still, you don’t get a direct analysis of how to find a specific word through an algorithm, or how to teach an algorithm law and rank its search results.
Therefore, without proper directions, the only way forward is trial and error. Frankly, we are still doing trial and error, as it takes time to understand patterns and, every day, more and more documents come out with texts different from the previous ones. But there is light at the end of the tunnel: the results we are getting are excellent and getting better with every passing week.
4. How do you see High-Browse changing the legal tech landscape in the coming years?
Firstly, by providing the most relevant information quickly and comprehensively, we would like to free up research time for legal researchers, lawyers or academic researchers. As a result, they will have more time to focus on the analysis of law instead of investing valuable time in searching for it. This, I believe, will contribute to improving legal research both quantitatively and qualitatively.
Secondly, by collecting and connecting legal information from different data sources, we will make a vast amount of enriched data available to researchers. By enriched data, I mean data that we collect from various documents, such as their citations to other documents. For example, we can make data available that shows empirically how, in a specific situation, Southern European courts cite European case law and legislation compared with courts in Northern Europe in the same situation. This will open up new avenues of research in EU law.
5. What makes High-Browse unique?
Firstly, our search engine, which we call Themis. It is a deep learning algorithm that has been specifically trained to make word predictions in the EU law context. What does that mean? If anyone types the keyword ‘Cost’ into Google, for example, Google will return results relating to cost as a financial term. This is because its algorithm is trained on more general vocabulary. It then ranks those results predominantly based on user behaviour.
However, if someone types the keyword ‘Cost’ into High-Browse, they will get case-law results that include cases with parties named, for example, Flaminio Costa. These results are then ranked based on how often and for how long they have been cited, which indicates their importance in jurisprudence. To make the results as relevant as possible to the user’s needs, all results are classified under categories such as ‘Agriculture and Fisheries’, ‘Competition’, ‘Immigration’ etc. A user can even choose the categories they are interested in and will then only get results from those categories.
The second unique feature of High-Browse is our citator module, High-Cite. It took us a long time to research and work out a method to retrieve information from the unstructured texts of case law, legal acts and treaties and then connect them. In High-Cite, a user can find all the documents that cite a given case and all the documents that the case itself cites. Moreover, we have connected legal acts and treaties with case law and with one another, and all of these connections go down to the article level. Therefore, while reading a case, a user can find the article of the legislation cited in it and read that article on the same page.
Our citation module, High-Cite, is so thorough that we can even assess the importance of a citation by the kind of document that makes it and the situation that required the document to be cited.
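A simple way to picture weighting a citation by the kind of document that makes it is the sketch below. The weights, function names and document identifiers are assumptions made up for this illustration. They are not the values or the model used in High-Cite.

```python
from collections import defaultdict

# Assumed weights per type of citing document; illustrative values only.
TYPE_WEIGHTS = {"treaty": 3.0, "legislation": 2.0, "judgment": 1.0, "opinion": 0.5}


def weighted_citation_scores(citations, doc_types):
    """Sum, for each cited document, the weights of the documents citing it.

    citations: iterable of (citing_id, cited_id) pairs
    doc_types: mapping from document id to its type
    """
    scores = defaultdict(float)
    for citing_id, cited_id in citations:
        # Unknown document types fall back to a neutral weight of 1.0.
        scores[cited_id] += TYPE_WEIGHTS.get(doc_types.get(citing_id), 1.0)
    return dict(scores)


def rank(citations, doc_types):
    """Return cited document ids, highest score first (ties broken by id)."""
    scores = weighted_citation_scores(citations, doc_types)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up identifiers, for illustration only.
doc_types = {"J1": "judgment", "J2": "judgment", "L1": "legislation", "T1": "treaty"}
citations = [("J2", "J1"), ("L1", "J1"), ("J1", "J2"), ("T1", "J2")]
```

Here both judgments are cited twice, but J2 ranks above J1 because one of its citations comes from a treaty. A plain citation count could not make that distinction.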
6. Where do you see the company in three years?
First, we would like to build upon our user feedback. We have made all these new features, and we would like our users to use them to the max and tell us what they want us to include and improve. Based on that feedback, we would then like to roll out many more functions.
We would also like to include national laws in High-Browse and connect them with European law, giving our users an even more comprehensive view of the legal connections.