Data lakes

A house by the lake: leveraging game-changing data collection for clinical trials

Data scientists and IT engineers built Stratus to centralise data and improve the speed and effectiveness of clinical trials.

Whatever industry you're in, data is becoming ever more crucial. In life sciences, there is data from trials, data from patients and data from clinics. These billions or even trillions of data points pose technical as well as human challenges, because collecting and sifting through this much data is difficult, even for a large company.

One significant challenge is navigating the plethora of systems in use. Imagine for a second that data (both raw data and metadata) is distributed across many locations with different levels of accessibility. This data is the lifeblood of clinical trials. But imagine as well that the current approach leads to lost efficiency, less accurate decision-making and difficulty in conducting exploratory data studies, stifling innovative ideas. So a group of skilled data engineers, quality assurance professionals and developers who know this problem well sets out to centralise the data.

And to do this, they begin by building a data lake house.

Fancy a swim in a data lake?

A lake house sits next to a data lake. That is the metaphor Henrik Lynge, Business Development Partner in Digital, Data & IT at Novo Nordisk, uses to explain what Stratus is.

A data lake is a centralised repository that lets you store all your structured and unstructured data, regardless of scale, effectively creating a vast pool of raw data.

The lake house, in turn, is a data management approach that combines the capabilities of a data lake and a data warehouse. It enables business intelligence and machine learning by consolidating data into a form that supports fast, scalable analytics. In short, the lake house serves as a refined source of clean, organised data, making analytics more efficient and supporting more accurate data-driven decisions, which are crucial for innovation and progress in the clinical space.

Why?

Because we want to give everyone clean water. Well, sort of.

Water, water everywhere and not a drop to drink

Continuing the lake and water metaphor, imagine that water is carried from the lake to a city by individual trucks, to be purified there. This method is flexible, but it does not scale well: imagine trying to supply a whole city with only two trucks. It is costly and causes delays. That is the usual state, and the usual challenge, of data infrastructure.

Novo Nordisk, with its many data-collecting systems spread across the organisation, faced a similar challenge. The company's existing approach was akin to using individual trucks for water delivery: scattered, inefficient and costly. And with data being vital for efficient processes, patient recruitment, budget scaling and crucial decision-making, there was room for considerable improvement.

The smooth, uncomplicated transfer of good data from clinical systems to the rest of the organisation is pivotal. It is the key to building smarter solutions and to running clinical trials on a global scale, and at speed.

“Instead of everyone purifying their own water, you bring all the water to a central place where it's purified. Then pipes are used to distribute the purified water to houses, saving individuals from the need to purify their own water or desalinate it. The goal is to have a more efficient and centralised system for water purification and supply.”
Henrik Lynge, Business Development Partner in Development at Novo Nordisk.

Bridging the data divide

Combining the efforts of experienced data scientists and software engineers from different areas of Novo Nordisk, the company founded an initiative called Project Stratus.

In the water metaphor, Stratus is a centralised pipe system: a conduit that gives data scientists a unified entry point into the labyrinth of trial data. It modernises the water infrastructure, taking the trucks out altogether and replacing them with faster, more reliable and more cost-effective methods. It acts as a central purifier and organiser that provides clean, consistent data, improving efficiency, reducing costs and ensuring timely access to vital resources.

Technically, it starts with Stratus

Metaphors aside, what made Stratus possible? It was built mostly on cloud services from AWS, specifically AWS Lambda for serverless computing, with orchestration layers written in Python. The team also built a lake house architecture on Databricks, using PostgreSQL for database management. Data engineers, QA specialists, cloud architects and software developers relied on CI/CD and DevOps practices to consolidate data silos and build reusable data "products", with the goal of rapidly creating solutions that provide unified definitions and business insights. The team also uses open-source tools such as OpenStudyBuilder to share metadata.
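To make the "central purifier" idea a little more concrete, here is a minimal Python sketch of the kind of cleaning step a serverless function might run before data lands in a lake house. It is not Stratus code. The entry point follows the usual AWS Lambda convention of `handler(event, context)`, but the event layout (a `records` list), the field names (`subject_id`, `site_id`, `visit_date`) and the cleaning rules are hypothetical, chosen only for illustration.

```python
"""Illustrative sketch only, not Stratus code: a Lambda-style cleaning step.

The event layout and field names below are hypothetical assumptions.
"""
from datetime import date
from typing import Any, Dict, Optional

# Hypothetical schema: the fields every "clean" record must carry.
REQUIRED_FIELDS = ("subject_id", "site_id", "visit_date")


def purify(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Normalise one raw record; return None if it cannot be cleaned."""
    # Lower-case and strip keys so "Subject_ID " and "subject_id" match.
    record = {str(k).strip().lower(): v for k, v in raw.items()}
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None  # incomplete record: reject rather than guess
    try:
        # Accept only ISO 8601 dates (YYYY-MM-DD) and store them that way.
        visit = date.fromisoformat(str(record["visit_date"]).strip())
    except ValueError:
        return None
    return {
        "subject_id": str(record["subject_id"]).strip().upper(),
        "site_id": str(record["site_id"]).strip().upper(),
        "visit_date": visit.isoformat(),
    }


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """AWS Lambda-style entry point; assumes event["records"] holds raw records."""
    clean, rejected = [], 0
    for raw in event.get("records", []):
        result = purify(raw)
        if result is None:
            rejected += 1
        else:
            clean.append(result)
    return {"clean": clean, "rejected": rejected}


if __name__ == "__main__":
    sample = {"records": [
        {"Subject_ID": " s-001 ", "site_id": "dk01", "visit_date": "2023-05-04"},
        {"subject_id": "S-002", "site_id": "", "visit_date": "2023-05-05"},
        {"subject_id": "S-003", "site_id": "US02", "visit_date": "not a date"},
    ]}
    print(handler(sample))
```

Here the first sample record is normalised and kept, while the other two are rejected: one has an empty site and the other has an unparseable date. Simply counting rejected records is one possible design choice; another option would be to quarantine them for review so that no data is silently lost.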

And what happens with Stratus now that the pipes are laid? Stratus combines a state-of-the-art data platform with a highly skilled team of data engineers who design, build and maintain the pipes on behalf of the city. Eventually, the aim is for the city to run efficiently on its own through a standardised, unified data platform.


How do we drink it?

Stratus doesn't stop at collecting and processing data. The project is also building software integrations, in the form of robust APIs on top of Novo Nordisk's internal data lake infrastructure. This opens the way to intelligent web and mobile applications that could interact seamlessly with clinical trial data, speeding up research and potentially getting drugs to market even faster.

“It’s vital that people who want to work in life sciences understand the important work they're doing. Anybody can make a buck in the banking sector. So, if we don’t make it clear that what we are doing is benefitting millions of people with new treatments every day – what’s the point?”
Henrik Lynge, Business Development Partner in Development at Novo Nordisk.

Data is power

The power of data, and how it can be used to improve medical research, should not be underestimated. As Stratus continues to evolve, its potential to set new standards through innovation grows every day. And there is room in the team for even more bright minds.

And sometimes, helping people means spending time at a lake house.
