data-modeling

Marine plankton brighten clouds over Southern Ocean

New research using NASA satellite data and ocean biology models suggests tiny organisms in vast stretches of the Southern Ocean play a significant role in generating brighter clouds overhead. Brighter clouds reflect more sunlight back into space, affecting the amount of solar energy that reaches Earth’s surface, which in turn has implications for global climate. The results were published July 17 in the journal Science Advances.

The study shows that plankton, the tiny drifting organisms in the sea, produce airborne gases and organic matter to seed cloud droplets, which lead to brighter clouds that reflect more sunlight.

“The clouds over the Southern Ocean reflect significantly more sunlight in the summertime than they would without these huge plankton blooms,” said co-lead author Daniel McCoy, a University of Washington doctoral student in atmospheric sciences. “In the summer, we get about double the concentration of cloud droplets as we would if it were a biologically dead ocean.”

Caption: Satellites use chlorophyll’s green color to detect biological activity in the oceans. The lighter-green swirls are a massive December 2010 plankton bloom following ocean currents off Patagonia, at the southern tip of South America. Credits: NASA’s Earth Observatory

nytimes.com
Data-Crunching Program Guides Santa Cruz Police Before a Crime - NYTimes.com

In July, Santa Cruz began testing the prediction method for property crimes like car and home burglaries and car thefts. So far, said Zach Friend, the police department’s crime analyst, the program has helped officers pre-empt several crimes and has led to five arrests.

Based on models for predicting aftershocks from earthquakes, it generates projections about which areas and windows of time are at highest risk for future crimes by analyzing and detecting patterns in years of past crime data. The projections are recalibrated daily, as new crimes occur and updated data is fed into the program.

The notion of predictive policing is attracting increasing attention from law enforcement agencies around the country as departments struggle to fight crime at a time when budgets are being slashed.

Modeling a Simple Social App Using SQL and Redis

Felix Lin sent me a link to the slides he presented at the NoSQL Taiwan meetup. There are 105 of them!

The deck covers:

  • how to build a simple social site using SQL
  • what are the performance issues with SQL
  • how to use the data structures in Redis for getting the same features
  • how to solve the performance issues in SQL by using Redis

Check them out after the break:

Walkthrough: MongoDB Data Modeling

Last week’s post about MongoDB Map/Reduce was pretty well received, so it seems like there is a need for some more discussion of the details involved in real-world MongoDB deployments. I thought we’d try and do a couple more posts and walk through some more details about how we’re using MongoDB at Fiesta.

Flexibility

One of the most touted features of MongoDB is its flexibility. I personally have emphasized flexibility in countless talks introducing MongoDB to technical audiences. Flexibility, however, is a double-edged sword; more flexibility means more choices to face when deciding how to model data (this reminds me of the Zen of Python: “There should be one - and preferably only one - obvious way to do it”). Nevertheless, I like the flexibility that MongoDB provides; it’s just important to review some best practices before settling on a data model.

The Problem

In this post we’ll take a look at how we’ve modeled mailing lists and the people that belong to them. Here are the requirements:

  • Each person can have one or more email addresses.
  • Each person can belong to any number of mailing lists.
  • Every person who belongs to a mailing list can choose what name they want to use for the list.

These requirements have obviously been simplified somewhat, but they are enough to express the core mechanics that power Fiesta.

0-Embed

Let’s examine how our data model looks if we never embed anything - we’ll call this a 0-embed strategy.

We have People, who have a name and password:

{
  _id: PERSON_ID,
  name: "Mike Dirolf",
  pw: "Some Hashed Password"
}

We have a separate collection of Addresses, where each address maintains a reference to a single Person:

{
  _id: ADDRESS_ID,
  person: PERSON_ID,
  address: "mike@corp.fiesta.cc"
}

We have Groups, each of which is basically just an ID (IRL there is some more group-specific metadata that would be in here as well, but we’re going to ignore it to focus on the relationships):

{
  _id: GROUP_ID
}

Lastly, we have Memberships, which associate a Person with a Group. Each Membership includes the list name that the Person is using for the Group, and a reference to the Address that they want to receive mail at for that Group:

{
  _id: MEMBERSHIP_ID,
  person: PERSON_ID,
  group: GROUP_ID,
  address: ADDRESS_ID,
  group_name: "family"
}

This data model is easy to design, simple to reason about, and easy to maintain. We are basically modeling the data as we would in an RDBMS, though; we aren’t leveraging MongoDB’s document-oriented approach. For example, let’s walk through how we would get the other member addresses of a group, given a single incoming address and group name (this is a very common query for Fiesta):

  1. Query the Addresses collection to get the ID of the relevant Person.
  2. Query the Memberships collection with the Person ID from step 1 and the group name to get the Group ID.
  3. Query the Memberships collection again to get all of the Memberships with the Group ID from step 2.
  4. Query the Addresses collection to get the Address to use for each of the Memberships from step 3.

Things get a bit complicated :).
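To make the four round trips concrete, here is a minimal sketch in Python that uses plain in-memory lists as stand-ins for the collections. The sample IDs, the extra people and the function name other_member_addresses are all hypothetical; with a real MongoDB driver each step would be a separate query.

```python
# Hypothetical in-memory stand-ins for the 0-embed collections.
addresses = [
    {"_id": "a1", "person": "p1", "address": "mike@corp.fiesta.cc"},
    {"_id": "a2", "person": "p2", "address": "dan@example.com"},
    {"_id": "a3", "person": "p3", "address": "amy@example.com"},
]
memberships = [
    {"_id": "m1", "person": "p1", "group": "g1", "address": "a1", "group_name": "family"},
    {"_id": "m2", "person": "p2", "group": "g1", "address": "a2", "group_name": "relatives"},
    {"_id": "m3", "person": "p3", "group": "g2", "address": "a3", "group_name": "family"},
]


def other_member_addresses(incoming_address, group_name):
    # Step 1: Addresses -> ID of the relevant Person.
    sender = next(a for a in addresses if a["address"] == incoming_address)
    person_id = sender["person"]
    # Step 2: Memberships (Person ID + group name) -> Group ID.
    own = next(m for m in memberships
               if m["person"] == person_id and m["group_name"] == group_name)
    group_id = own["group"]
    # Step 3: Memberships again -> every Membership in that Group.
    group_members = [m for m in memberships if m["group"] == group_id]
    # Step 4: Addresses again -> the address string for each other Membership.
    by_id = {a["_id"]: a["address"] for a in addresses}
    return [by_id[m["address"]] for m in group_members if m["person"] != person_id]
```

Each numbered comment corresponds to one of the four queries listed above, which is why this model costs four lookups for a single incoming message.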

Embed Everything

The strategy that a lot of newcomers use when modeling their data is what we’ll call the embed everything strategy. To use this strategy for Fiesta, we’d take all of a Group’s Memberships and embed them directly within the Group document. We’d also embed Addresses and Person metadata directly within each Membership:

{
  _id: GROUP_ID,
  memberships: [{
    address: "mike@corp.fiesta.cc",
    name: "Mike Dirolf",
    pw: "Some Hashed Password",
    person_addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
    group_name: "family"
  }, ...]
}

The theory behind the embed everything strategy is that by keeping all of the related data in one place we can make common queries a lot simpler. With this strategy, the query we performed above is trivial (remember, the query is “given an address and group name, what are the other member addresses of the group”):

  1. Query the Groups collection for a group containing a membership where the address is in person_addresses and the group_name matches.
  2. Iterate over the resulting document to get the other membership addresses.

That’s about as easy as it gets. But what if we wanted to change a Person’s name or password? We’d have to change it in every single embedded membership. Same goes for adding a new person_address or removing an existing one. This highlights the characteristics of the embed everything model: it can be great for doing a single specific query (because we’re basically pre-joining), but can be a nightmare for long-term maintainability. I’d highly recommend against this approach in general.
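The trade-off can be sketched in a few lines of Python. This is only an illustration under assumptions: the groups list is an in-memory stand-in for the Groups collection, the pw field is left out for brevity, and the function names and sample data are hypothetical.

```python
# Hypothetical in-memory stand-in for the Groups collection (embed everything).
groups = [
    {"_id": "g1", "memberships": [
        {"address": "mike@corp.fiesta.cc", "name": "Mike Dirolf",
         "person_addresses": ["mike@corp.fiesta.cc", "mike@dirolf.com"],
         "group_name": "family"},
        {"address": "dan@example.com", "name": "Dan",
         "person_addresses": ["dan@example.com"], "group_name": "relatives"},
    ]},
    {"_id": "g2", "memberships": [
        {"address": "mike@dirolf.com", "name": "Mike Dirolf",
         "person_addresses": ["mike@corp.fiesta.cc", "mike@dirolf.com"],
         "group_name": "work"},
    ]},
]


def is_sender(membership, incoming_address, group_name):
    return (incoming_address in membership["person_addresses"]
            and membership["group_name"] == group_name)


def other_member_addresses(incoming_address, group_name):
    # Step 1: find the group holding a matching embedded membership.
    for group in groups:
        if any(is_sender(m, incoming_address, group_name) for m in group["memberships"]):
            # Step 2: read the other addresses straight from the same document.
            return [m["address"] for m in group["memberships"]
                    if not is_sender(m, incoming_address, group_name)]
    return []


def rename_person(any_address, new_name):
    # The maintenance cost: every embedded copy of the person must be updated.
    touched = 0
    for group in groups:
        for m in group["memberships"]:
            if any_address in m["person_addresses"]:
                m["name"] = new_name
                touched += 1
    return touched
```

The lookup reads one document, but rename_person has to visit every group the person belongs to, and the number of writes grows with the number of memberships.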

Embed Trivial Cases

The approach we’ve taken at Fiesta, and the approach I most often recommend, is to start by thinking about the 0-embed model. Once you’ve got that model figured out, you can pick off easy cases where embedding just makes sense. A lot of the time these cases tend to be one-to-many relationships.

For example, our Addresses each belong to a single user (and are also referenced by Memberships). Addresses are also not likely to change very often. Let’s embed them as an array to save some queries and keep our data model in sync with our mental model of a Person.

Memberships are each associated with a single Person and a single Group, so we could imagine embedding them in either the Person model or the Group model. In cases like this, it’s important to think about both data access patterns and the magnitude of relationships. We expect People to have at most 1000s of group Memberships, and Groups to have at most 1000s of Memberships as well, so the magnitude doesn’t tell us much. Our access pattern, however, does - when we display the Fiesta dashboard we need to have access to all of a Person’s Memberships. To make that query easy, let’s embed Memberships within the Person model. This also has the advantage of keeping a Person’s addresses all within the Person model (since they are referenced both at the top-level and within Memberships). If an address needs to be removed or changed, we can do it all in one place.

Here’s how things look now (this is the Person model - the only other model is Group, which is identical to the 0-embed case):

{
  _id: PERSON_ID,
  name: "Mike Dirolf",
  pw: "Some Hashed Password",
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...]
}

The query we’ve been discussing now looks like this:

  1. Query for a Person with the matching address and an embedded Membership with the right group_name.
  2. Use the Group ID in the embedded Membership from step 1 to query for other People with Memberships in that Group - get the addresses directly from their embedded Memberships.
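As a rough sketch of those two steps in Python, again with an in-memory list standing in for the People collection (the extra people, the IDs and the function name are hypothetical):

```python
# Hypothetical in-memory stand-in for the People collection.
people = [
    {"_id": "p1", "name": "Mike Dirolf",
     "addresses": ["mike@corp.fiesta.cc", "mike@dirolf.com"],
     "memberships": [
         {"address": "mike@corp.fiesta.cc", "group_name": "family", "group": "g1"}]},
    {"_id": "p2", "name": "Dan", "addresses": ["dan@example.com"],
     "memberships": [
         {"address": "dan@example.com", "group_name": "relatives", "group": "g1"}]},
    {"_id": "p3", "name": "Amy", "addresses": ["amy@example.com"],
     "memberships": [
         {"address": "amy@example.com", "group_name": "family", "group": "g2"}]},
]


def other_member_addresses(incoming_address, group_name):
    # Step 1: find the Person who owns the address and has a Membership
    # with the right group_name; remember that Membership's Group ID.
    sender, group_id = None, None
    for person in people:
        if incoming_address in person["addresses"]:
            for m in person["memberships"]:
                if m["group_name"] == group_name:
                    sender, group_id = person, m["group"]
    if sender is None:
        return []
    # Step 2: find other People with a Membership in that Group and read
    # the delivery address directly from the embedded Membership.
    return [m["address"]
            for person in people if person["_id"] != sender["_id"]
            for m in person["memberships"] if m["group"] == group_id]
```

Two lookups are enough here, and changing a name or password touches only a single Person document.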

It’s still almost as simple as in the embed everything case, but our data model is a lot cleaner and easier to maintain. Hopefully this walkthrough has been helpful - if you have any questions let us know!

Mike

juhonkoti.net
Example how to model your data into nosql with cassandra

This may be a nifty idea if you want to create a community within your organization’s realm. Call it a PRIVATIZED version of FB.

———————————————————————-

“We have built a facebook style “messenger” into our web site which uses cassandra as storage backend. I’m describing the data schema to serve as a simple example how cassandra (and nosql in general) can be used in practice….”

——————————————————————-

Read the rest here:  http://www.juhonkoti.net/2010/09/25/example-how-to-model-your-data-into-nosql-with-cassandra

MongoDB, Data Modeling, and Adoption

Micheal Shallop describes in this post how he “built and re-built” a geospatial table, replacing several tables in MySQL with MongoDB:

The mongo geospatial repository will be replacing several tables in the legacy mySQL system – as you may know, mongodb comes with full geospatial support so executing queries against a collection (table) built in this manner is shocking in terms of its response speeds — especially when you compare those speeds to the traditional mySQL algorithms for extracting geo-points based on distance ranges for lat/lon coordinates. The tl;dr for this paragraph is: no more hideous trigonometric mySQL queries!

But what actually caught my attention was this paragraph:

What I learned in this exercise was that the key to architecting a mongo collection requires you to re-think how data is stored.  Mongo stores data as a collection of documents.  The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.

This made me realize that MongoDB adoption is benefiting hugely from the fact that its data model and querying are the closest to those of relational databases; neither requires a radical mind shift from developers who have touched a database at least once. It is like knowing one programming language and learning a second that follows almost the same paradigms.

The same cannot be said about key-value stores, multi-dimensional maps, MapReduce algorithms, or graph databases. Any of these would require one to set aside pretty much everything learned in the relational model and completely remodel the world. It’s a tougher job, but when used right, the payoff is worth it.

Original title and link: MongoDB, Data Modeling, and Adoption (NoSQL database©myNoSQL)

dataversity.net
NoSQL Shapes Data Modeling - DATAVERSITY
By Jelani Harper. At an Enterprise Data World 2015 Conference session entitled “NoSQL Influence to Enterprise Data Modeling,” eBay’s Senior Data Architect Donovan Hsieh addressed some of the differences between non-relational (NoSQL) and relational Data Modeling practices.

The best text I have read in recent months.

blog.8thlight.com
NO DB - the Center of Your Application Is Not the Database

Uncle Bob:

The center of your application is not the database. Nor is it one or more of the frameworks you may be using. The center of your application are the use cases of your application. […] If you get the database involved early, then it will warp your design. It’ll fight to gain control of the center, and once there it will hold onto the center like a scruffy terrier. You have to work hard to keep the database out of the center of your systems. You have to continuously say “No” to the temptation to get the database working early.

qz.com
Why coding is not the new literacy

Coding requires us to break our systems down into actions that the computer understands, which represents a fundamental disconnect in intent. Most programs are not trying to specify how things are distributed across cores or how objects should be laid out in memory. We are not trying to model how a computer does something. Instead, we are modeling human interaction, the weather, or spacecraft. From that angle, it’s like trying to paint using a welder’s torch. We are employing a set of tools designed to model how computers work, but we’re representing systems that are nothing like them.

Even in the case where we are talking specifically about how machines should behave, our tools aren’t really designed with the notion of modeling in mind. Our editors and debuggers, for example, make it difficult to pick out pieces at different depths of abstraction. Instead, we have to look at the system laid out in its entirety and try to make sense of where all the screws came from. Most mainstream languages also make exploratory creation difficult. Exploring a system as we’re building it gives us a greater intuition for both what we have and what we need. This is why languages that were designed with exploration in mind (LISP, Smalltalk, etc.) seem magical and have cult followings. But even these suffer from forcing us to model every material with a single tool. Despite having different tools for various physical materials, in programming we try to build nearly everything with just one: the general purpose programming language.

On the surface, it seems desirable to have “one tool to rule them all,” but the reality is that we end up trying to hammer metal with a chef’s knife. Excel, by contrast, constrains us to the single material that it was intentionally designed to work with. Through that constraint we gain a tool with a very intuitive and powerful interface for working with grids. The problem of course is that Excel is terrible for doing anything else, but that doesn’t mean we should try to generalize a chef’s knife into a hammer. Instead, we should use the right tools for the job and look for a glue that allows us to bring different materials together.

Fiesta at the NY MongoDB User Group

Last night we had the chance to speak at the NY MongoDB User Group (great event - check it out!) about how we’re using MongoDB at Fiesta. A lot of the talk was focused on giving real-world examples of the concepts I used to discuss when giving “Intro to MongoDB” talks. The bulk of those examples were about how we approach data modeling. Here are the slides from the talk:

Big thanks to 10gen and Buddy Media for having us. We really enjoyed speaking and getting to listen to some other talks about how people are putting MongoDB to use. Also, we got to do some white-boarding after the talk:

(image via Francesca Krihely)

bradley-holt.com
CouchDB and DDD

Bradley Holt:

I’ve found CouchDB to be a great fit for domain-driven design (DDD). Specifically, CouchDB fits very well with the building block patterns and practices found within DDD. Two of these building blocks include Entities and Value Objects. Entities are objects defined by a thread of continuity and identity. A Value Object is an object that describes some characteristic or attribute but carries no concept of identity. Value objects should be treated as immutable.

Aggregates are groupings of associated Entities and Value Objects. Within an Aggregate, one member is designated as the Aggregate Root. External references are limited to only the Aggregate Root. Aggregates should follow transaction, distribution, and concurrency boundaries. Guess what else is defined by transaction, distribution, and concurrency boundaries? That’s right, JSON documents in CouchDB.

The way I read this, the impedance mismatch between the object model and the document-based model is lower than what we’ve seen in the object-relational world.

The Pressure of Printing

Human use of maps has a very long history. Until modern times, the map was a handwritten document. Maps were drawn on paper or on other materials such as sheepskin, showing roads, settlements, natural features and so on, and our ancestors navigated the real world by using them.

As maps developed, people learned to use a wide variety of creative ways to express the real world. Cartography has also accumulated many methods for describing and classifying elements, identifying features, and representing the Earth’s surface along with its resources and goods.

With the spread of computers and geographic information systems (GIS) technology, the map is no longer only an old-fashioned print; maps can now be displayed and explored interactively on the computer.

GIS further strengthens the interaction between humans and maps. In a GIS, you can easily understand the information expressed on the map, and you can also run queries and analysis by location.

Mapping has a variety of ways to express the unique features of the real world. Maps can tell you what exists at a certain position: point to any location on the map and you can learn the name of the place or object there, along with other relevant information. Maps can also tell you where you are. If you feed real-time Global Positioning System (GPS) data into a map, you can see where you are, how fast you are traveling, and where your journey is headed.

Maps reveal spatial distributions, relationships and trends that cannot be seen in other ways. Comparing current demographic maps of a city with maps from the past can support public decision-making. Epidemiologists can trace the outbreak of a rare disease by relating cases to surrounding environmental factors.

Maps can integrate data from different sources into the same geographic reference system. Municipal street maps can be combined with plans of suburban structures; agricultural scientists can merge satellite image maps with farm maps to increase crop yields. Map data can be merged or superimposed to analyze spatial problems. A regional government can find suitable waste disposal sites by combining multiple layers of data.

Maps can be used to determine the best route between two places. With a map, parcel courier companies can find the most efficient delivery path, and public transportation planners can design the best bus routes.

Maps can be used to model the future. Public utility companies can model what effect adding new equipment will have, and decide whether to invest based on that effect. National planners can also simulate serious accidents, such as leaks of toxic substances, and plan the appropriate response.

The growth of the GIS field has broadened people’s view of the map. Compared with the purely static object of the past, the map has now become a dynamic way of presenting geographic information. The map is a method of graphical representation. To convey information well, a map must have rich visual appeal. In addition, a lot of experience from graphic design, such as layout, composition, color balance, and symbols, is applied to map production.

Maps can be understood as the medium between “geographic data” and “human understanding of that information”. Maps draw on the human capacity for spatial pattern perception to provide visual information about geographic objects and their locations.
A map is an abstraction of geographic information. Different users define the map differently, and a map made for a specific purpose presents its content in a particular way. Maps simplify the complexity of the underlying data. You can also annotate the map, for example by showing names, categories, types, labels and other information.
The purpose of data modeling is to create a design that supports maps rich in both information and aesthetic qualities.
Understanding the way maps express information is the prerequisite for creating a suitable data model.
In short, maps are essential to us in our daily lives.
For more information about Maps Downloader, you can visit: http:\www.arceyessoft.com

Data Modeling for Document Databases: An Auction and Bids System

Staying with data modeling, but moving to the world of document databases, Ayende has two great posts about modeling an auction system: part 1 and part 2. They are great not only because it’s not the Human-has-Bird-and-Cat-and-Dogs example, but also because he looks at different sets of requirements and offers different solutions.

That is one model for an Auction site, but another one would be a much stronger scenario, where you can’t just accept any Bid. It might be a system where you are charged per bid, so accepting a known invalid bid is not allowed (if you were outbid in the meantime). How would we build such a system? We can still use the previous design, and just defer the actual billing for a later stage, but let us assume that this is a strong constraint on the system.

daniellang.net
6 Ways to Handle Relations in RavenDB and Document Databases

Daniel Lang presents 6 solutions for dealing with relations in RavenDB:

If you’re coming from the sql world, chances are you will be confused by the lack of relations in document databases. However, if you’re running RavenDB you’ve got plenty of options to address this trade-off. I personally cannot think of any situation where I’d wish back SQLServer because of this (there could be other reasons).

Two not recommended:

  • go to the database twice
  • include one document inside the other

Two RavenDB specific solutions:

  • implement a read trigger to do server-side joins
  • implement a custom responder

Two recommended solutions:

  • use the .Include<T>() method
  • denormalize your references

Couple of comments:

  • the difference between “include one document inside the other” and “denormalize your references” is very subtle—the latter suggests including only the information needed for the presentation layer.
  • I think one should consider both “include one document inside the other” and “denormalize your references” and choose one of them depending on the chances of the embedded documents being updated often vs the chances of having the presentation layer changing often
  • apart from RavenDB, the other document databases seem to offer only two options: “go to the database twice” and “denormalize your references”
  • once Redis releases its version with embedded server-side Lua scripting, that could be used as a form of stored procedure
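To illustrate the first comment, here is a small Python sketch of the difference between the two approaches. The customer and order documents, their field names and the helper functions are hypothetical and are not taken from Daniel Lang’s post or from the RavenDB API.

```python
# Hypothetical source document.
customer = {
    "_id": "customers/1",
    "name": "Ada",
    "email": "ada@example.com",
    "billing_address": "1 Main St",
    "phone": "555-0100",
}


def embed(doc):
    # "Include one document inside the other": copy the whole document.
    return dict(doc)


def denormalized_reference(doc, fields):
    # "Denormalize your references": keep the id plus only the fields
    # that the presentation layer needs.
    return {"id": doc["_id"], **{f: doc[f] for f in fields}}


order_embedded = {"_id": "orders/1", "customer": embed(customer)}
order_denormalized = {
    "_id": "orders/1",
    "customer": denormalized_reference(customer, ["name"]),
}
```

The embedded copy goes stale whenever any customer field changes, while the denormalized reference only goes stale when the copied display fields change; that is the trade-off between update frequency and presentation changes described in the second comment.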

All I did was post a diagram on LinkedIn - I didn't realise that I was going to learn so much

I’ve been trying to get my head around business and data modelling. Two acronyms came up … UML, and BPMN.

I understood that they were both Very Important. Certain sources promoted BPMN, while others maintained that UML was actually better… Unfortunately I couldn’t work out why.

An Answer…

I started hunting for an answer. On the website of BCS (The Chartered Institute for IT) there was an article…
