Issue #18: Architectures for LLM Applications and Warehouse-Native Apps
Plus Business Rules Over Diagrams, High-Performance Iceberg Lakehouse, Lesser-Known dbt Features and Dataform vs dbt
As mentioned last week, I’ve been very busy this week, so this issue is just article links and is going out a day early. They are very good articles, though!
Next week brings the Databricks and Snowflake summits and, hopefully, a lot of writing for my guide.
Emerging Architectures for LLM Applications
Andreessen Horowitz, a venture capital firm that invests in many tech start-ups, has interviewed many experts in the field and written about how you might integrate Large Language Models (LLMs) like GPT into your own applications.
While you may not need every component of the reference architecture the article describes, it does show that adding GPT to your website is not straightforward.
If you’re interested in adding LLMs to an existing application, I’d also recommend reading Honeycomb’s article “All the Hard Stuff Nobody Talks About When Building Products with LLMs”.
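One step that LLM application architectures commonly include is retrieving relevant context (usually from a vector database) and placing it in the prompt sent to the model. The sketch below shows only that retrieval-and-prompt-construction step, under loud assumptions: a toy bag-of-words `embed` function stands in for a real embedding model, the hypothetical `ToyVectorStore` class stands in for a vector database, and no LLM is actually called.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of lowercase words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self, documents: list[str]):
        # Compute an "embedding" for every document once, at load time.
        self.entries = [(doc, embed(doc)) for doc in documents]

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank documents by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    """Retrieve relevant context and wrap it in the prompt that would go to the LLM."""
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    store = ToyVectorStore([
        "Refunds are processed within 5 business days.",
        "Our office is closed on public holidays.",
        "Refund requests need an order number.",
    ])
    print(build_prompt("How long do refunds take?", store, k=1))
```

In a real application the embedding model, the vector database and the final model call are separate services, which is a large part of why adding an LLM is more involved than it first looks.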
High Performance Data Warehousing: Business Rules Over Graphic Diagrams
I like reading controversial posts that make me think differently about my profession, and this post by Robert Harmon, Solution Architect at Firebolt (a cloud data warehouse product), fits that bill. He argues that when data modelling you should focus on Business Rules rather than graphic diagrams such as Entity Relationship (ER) diagrams.
While it’s nice to have a diagram showing at a high level how a database is structured and how its tables join together, Robert is right that non-data professionals are more likely to understand well-written Business Rules, which use language they already speak, than diagrams designed for data professionals.
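To make the idea concrete, here is a hypothetical sketch (not taken from Robert’s post) in which each business rule is kept as a plain-language sentence that a business user can review, paired with a check that tests data against it. The `Order` type, the `BusinessRule` type and both rules are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: Optional[int]
    total: float


@dataclass(frozen=True)
class BusinessRule:
    # The wording a business user reads and signs off on.
    statement: str
    # Given (orders, known customer ids), returns ids of orders that break the rule.
    find_violations: Callable[[list, set], list]


# Hypothetical rules: written first as sentences, then as checks.
RULES = [
    BusinessRule(
        "Every order is placed by exactly one known customer.",
        lambda orders, customers: [o.order_id for o in orders if o.customer_id not in customers],
    ),
    BusinessRule(
        "An order total is never negative.",
        lambda orders, customers: [o.order_id for o in orders if o.total < 0],
    ),
]


def check_rules(orders: list, customers: set) -> dict:
    """Map each rule's plain-language statement to the orders that violate it."""
    return {rule.statement: rule.find_violations(orders, customers) for rule in RULES}
```

The sentences carry the same information as a relationship line and a constraint on an ER diagram, but in wording a non-data colleague can confirm or correct.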
dbt Features You Never Knew About
Analytics Engineer Madison Schott has written about a nice collection of less well-known dbt features: exposures, hooks, customising test names and test severity.
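Of these features, test severity is perhaps the least obvious. In dbt it is set in a test’s YAML config through `severity` (`error` or `warn`) plus `warn_if` and `error_if` conditions on the number of failing rows. As I read dbt’s documentation, with `severity: error` dbt checks `error_if` first and then `warn_if`, while `severity: warn` skips `error_if` entirely, and both conditions default to `!=0`. The Python below only models that decision logic to make it concrete; it is not dbt code, and the function names are hypothetical.

```python
import operator
import re

# Comparison operators allowed in the simplified conditions this sketch accepts.
_OPS = {
    ">=": operator.ge, "<=": operator.le, "!=": operator.ne,
    "==": operator.eq, ">": operator.gt, "<": operator.lt,
}


def _condition_met(condition: str, failures: int) -> bool:
    """Evaluate a condition such as '>10' or '!= 0' against a failure count."""
    match = re.fullmatch(r"\s*(>=|<=|!=|==|>|<)\s*(\d+)\s*", condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    op, threshold = match.groups()
    return _OPS[op](failures, int(threshold))


def resolve_test_status(failures: int, severity: str = "error",
                        warn_if: str = "!=0", error_if: str = "!=0") -> str:
    """Model of how a test's failure count maps to PASS, WARN or ERROR."""
    if severity not in ("error", "warn"):
        raise ValueError(f"unknown severity: {severity!r}")
    # With severity 'error', the error condition is checked first.
    if severity == "error" and _condition_met(error_if, failures):
        return "ERROR"
    # Otherwise (or with severity 'warn'), only the warn condition matters.
    if _condition_met(warn_if, failures):
        return "WARN"
    return "PASS"
```

For example, `warn_if=">0", error_if=">10"` lets a handful of bad rows through as a warning while still failing the run once the problem grows.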
Warehouse-native Apps Explained
“Warehouse-native Apps” may sound like a meaningless buzzword, but they are already in use by Software as a Service (SaaS) companies and could be the main way to use SaaS products or give data access to third parties in the near future.
Arpit Choudhury explains what they are and walks through use cases for them.
Also, Databricks has just announced a very similar feature called Lakehouse Apps, so I expect to hear more on this topic.
How Bilibili Builds OLAP Data Lakehouse with Apache Iceberg
Bilibili is one of the biggest video-sharing websites in China and processes a lot of data: approximately 75 TB a day. How do you build a high-performance Lakehouse on that much data?
Rui Li, one of Bilibili’s software engineers, shares some insights, including how they index data and the custom software they have built to suggest future indexing based on past queries.
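The idea of suggesting indexing from past queries can be illustrated with a deliberately simple heuristic. This is a hypothetical sketch, not Bilibili’s software: it assumes the filter columns of each past query have already been extracted from a query log, counts how many queries filter on each column, and suggests the most frequently filtered columns as candidates for sorting or indexing.

```python
from collections import Counter


def suggest_index_columns(query_log: list, top_k: int = 2) -> list:
    """Suggest columns to sort or index on, given the filter columns of past queries.

    query_log: one list of filtered column names per past query (assumed to be
    pre-extracted; parsing real SQL is out of scope for this sketch).
    """
    counts = Counter()
    for filtered_columns in query_log:
        counts.update(set(filtered_columns))  # count each column once per query
    # Most frequently filtered first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [column for column, _ in ranked[:top_k]]
```

A production system would weigh far more than filter frequency, such as column cardinality, query cost and data layout, but the feedback loop from past queries to future layout is the same shape.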
Google Dataform vs dbt
Sivaprasad Mandapati, Data Engineer and Architect, compares Dataform to dbt and also shows how to use Dataform.
Google owns Dataform, a product that is very similar to dbt, is open source and can be used with a variety of databases. Dataform is also free on Google Cloud Platform (GCP), so if you are interested in BigQuery on GCP, I’d recommend taking a look at Dataform too.
Sponsored by The Oakland Group, a full-service data consultancy.