Lecture 4: Data modelling in MongoDB

Learning objectives

By the end of this lecture, students should understand:

  • Why data modelling is important

  • The key principle of data modelling in MongoDB

  • Techniques to develop a data model: workload estimate, identify relationships, schema patterns

  • The difference between embedding and referencing and when to use each type

Slides

Note: a PDF version of the slides is available for download.

Supplemental materials

Embed vs reference guideline

| Guideline name | Question | Embed | Reference |
|---|---|---|---|
| Simplicity | Would keeping the pieces of information together lead to a simpler data model and code? | Yes | No |
| Go Together | Do the pieces of information have a “has-a,” “contains,” or similar relationship? | Yes | No |
| Query Atomicity | Does the application query the pieces of information together? | Yes | No |
| Update Complexity | Are the pieces of information updated together? | Yes | No |
| Archival | Should the pieces of information be archived at the same time? | Yes | No |
| Cardinality | Is there a high cardinality (current or growing) in the child side of the relationship? | No | Yes |
| Data Duplication | Would data duplication be too complicated to manage and undesired? | No | Yes |
| Document Size | Would the combined size of the pieces of information take too much memory or transfer bandwidth for the application? | No | Yes |
| Document Growth | Would the embedded piece grow without bound? | No | Yes |
| Workload | Are the pieces of information written at different times in a write-heavy workload? | No | Yes |
| Individuality | For the children side of the relationship, can the pieces exist by themselves without a parent? | No | Yes |
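To make the two options in the guideline concrete, here is a minimal Python sketch that uses plain dictionaries in place of MongoDB documents, so it runs without a database. The customer, address and order data, the field names, and the helpers `orders_for` and `tally` are all hypothetical and introduced only for illustration. Counting answers with equal weight is also an assumption of this sketch: the guideline does not say how to weigh the questions, and a single answer (for example, unbounded growth) may reasonably decide the matter on its own.

```python
# Hypothetical example data, not taken from the lecture.

# Embedding: the child data (addresses) lives inside the parent document.
customer_embedded = {
    "_id": 1,
    "name": "Ada",
    "addresses": [
        {"type": "home", "city": "Vancouver"},
        {"type": "work", "city": "Burnaby"},
    ],
}

# Referencing: children (orders) are separate documents that store the parent's _id.
customer_referenced = {"_id": 1, "name": "Ada"}
orders = [
    {"_id": 101, "customer_id": 1, "total": 25.0},
    {"_id": 102, "customer_id": 1, "total": 40.0},
    {"_id": 103, "customer_id": 2, "total": 12.5},
]


def orders_for(customer, all_orders):
    """Resolve the reference in application code, as a second query would."""
    return [o for o in all_orders if o["customer_id"] == customer["_id"]]


# Which option a "Yes" answer favours, copied from the guideline table.
YES_FAVOURS = {
    "Simplicity": "embed",
    "Go Together": "embed",
    "Query Atomicity": "embed",
    "Update Complexity": "embed",
    "Archival": "embed",
    "Cardinality": "reference",
    "Data Duplication": "reference",
    "Document Size": "reference",
    "Document Growth": "reference",
    "Workload": "reference",
    "Individuality": "reference",
}


def tally(answers):
    """Count how many answered guidelines favour each option.

    `answers` maps a guideline name to True ("Yes") or False ("No").
    Equal weighting is a simplification made for this sketch.
    """
    counts = {"embed": 0, "reference": 0}
    for guideline, is_yes in answers.items():
        yes_side = YES_FAVOURS[guideline]
        no_side = "reference" if yes_side == "embed" else "embed"
        counts[yes_side if is_yes else no_side] += 1
    return counts


# A few addresses, always read with the customer: the answers lean towards embedding.
address_answers = {
    "Go Together": True,
    "Query Atomicity": True,
    "Cardinality": False,
    "Document Growth": False,
}

# Orders keep accumulating and are often queried alone: the answers lean towards referencing.
order_answers = {
    "Go Together": True,
    "Query Atomicity": False,
    "Cardinality": True,
    "Document Growth": True,
    "Individuality": False,
}

print(tally(address_answers))
print(tally(order_answers))
print(orders_for(customer_referenced, orders))
```

With the embedded shape, one read of the customer document returns the addresses as well. With the referenced shape, the application needs a second lookup (here `orders_for`), but the customer document stays small however many orders accumulate.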