© Paul Michaels 2022
P. Michaels, Software Architecture by Example, https://doi.org/10.1007/978-1-4842-7990-8_4

4. The Social Media Problem

Paul Michaels, Derbyshire, UK

Social media has been around for longer than you might think. While still at school, and long before anyone had heard the term "Facebook" used in connection with a website, I used to run a Bulletin Board System (or BBS). People would dial in to our house phone number and access features of this system, which included such things as a "wall" – a term that is now familiar to anyone using Facebook, and with a very similar idea. There was also a messaging network, known as FidoNet (still running to this day), that allowed communication with other BBSs.

While the BBS world was fun to be a part of, it was not a particularly scalable system, nor did it need to be; at my house, there was a single phone line, meaning that only one person could access the board at a time (assuming no one was trying to use the phone to actually make a call – a limitation that frequently irked other members of my family). However, in recent years, social media has become much more popular.

Note

At the time of writing, Facebook has almost 3 billion active users, Twitter around 350 million, and LinkedIn around 250 million.

In this chapter, we’re going to explore methods of dealing with this kind of extreme scalability, along with the cost of such architectures.

Background

We have been asked by an existing company, Get Moving, to create a social media platform. Get Moving is an international leisure company, with a series of gyms spanning the globe; they also have a successful suite of apps that allow people to follow exercise classes remotely. They would like a social media platform that allows users to interact with other members of the brand (whether that be members of the gym or just people that use the app).

Get Moving has 300 locations worldwide, with close to 3M members in total across the globe. They also have a further 2M people using their apps. The marketing director feels that the launch of this social media platform will double those numbers, and he expects some users to use the service on a daily basis.

Requirements

Having spoken to the managing director of Get Moving , you establish that the software should have the following features:
  • Users should be able to post details of their workout along with a comment.

  • Other users should be able to reply or comment on these status updates.

  • There are no geographical restrictions on who can view the posts.

  • The system should be massively scalable (initially, they expect around 20,000 users, but as they roll this out to the gym users, they expect that to quickly reach around 1M active users).

  • While they expect the updates to appear on the timeline of both the user that has posted the update and the user that has commented, this does not have to be instantaneous. Further, the order of comments is unimportant.

After further conversations, you come up with a wireframe example of what this might look like.

Note

Those readers familiar with privacy laws may notice an issue with these requirements. Since this chapter is not concerned with those specific issues (as important as they may be), I’m going to ignore them, as doing so better serves the illustration; however, should you be faced with such a situation, you should ensure that your system does adhere to whatever rules are in place in the locale of your target audience.

Options

This set of requirements is, in fact, quite unusual. When you speak to a client about their requirements for a system, they will typically not mention non-functional requirements, such as scalability; they simply assume that whatever functionality they ask for will be scalable, reliable, usable, extensible, secure, accessible, and all the other things that, as a software architect, you need to explicitly consider. They assume this because most modern systems are almost all of those things, out of the box. Taking ASP.NET as an example (since our example code is in .Net), you can create an ASP.NET website that will, with zero lines of code, be secure, usable, extensible, and accessible; depending on how you deal with the hosting, it can also be very scalable.

As the person designing the system, it then falls to you to identify if these requirements involve sacrifices elsewhere. The devil is in the details. What exactly do we mean by usable? What does reliable mean? Does it mean that the system should have zero downtime? Exactly what, about the system, would need to be extensible: Do the users of the system require the ability to extend the system, or do we just mean that a change to the functionality is quick and easy?

In truth, when we go to extremes with any of these non-functional requirements, we often sacrifice others. Let's take making the system both usable and accessible as an example: should we need to make the system accessible to people with very poor eyesight, we may increase the size of controls and text, place fewer items on a screen, and perhaps incorporate some form of voice control; however, if you then gave that system to a person with perfect eyesight, they may not describe it as usable.

In our case, the client has realized that their requirement around user numbers is out of the ordinary and therefore requires special mention. Our job at this stage is to highlight to them what, if anything, we would need to sacrifice in order to provide that scalability.

In previous chapters, we’ve approached the problem by first exploring what the system would look like without automation.

Manual Process

Admittedly, this scenario probably lends itself less to a manual process than any that we have discussed so far in this book; however, let’s complete the thought experiment and see where it leads.

The requirement here, in its simplest form, is that users can post details of their workout, and other users can comment on that – the other points are merely constraints and concessions. If we start here, we may envisage a notice board in each gym where people would perhaps fill out a card such as shown in Figure 4-1.
Figure 4-1

Workout form

Here, we’re asking the user to populate the length of the workout, calories burned, comment, etc. We may then imagine that once this is posted, other users would add Post-it notes to the card. This seems to work well to an extent, but we have a problem with making this scalable, even within the same gym: for example, after a certain number of posts, the board would be full; further, the comments themselves would obstruct the posts.

Let’s imagine a slightly different scenario then: a printed sheet of the workouts for that week. After a member has completed their training, they would still fill out a card, but this time, they would hand the card to a member of staff in the gym. The cards would be piled up, and once every day, someone would type them up and print them out. We could follow a similar principle for comments: each time a member wished to comment, they would fill out a card and hand it to a member of staff, who would include the comments in the printout for the following day.

If we wanted to make this work across all the sites, then we would, perhaps, have the staff send the cards to a central location, where someone would type out the posts and comments and send the sheets to every gym.

We did say initially that this would not lend itself to a manual process, but I suspect this mental picture will help when thinking about the design of the system.

CQRS

We’ve already discussed the concept of CQRS when we were looking at event sourcing in Chapter 2. However, in this case, we’re examining CQRS in its own right. Let’s first remind ourselves of what CQRS is, and then we can explore whether it will help us with our current problem.

CQRS stands for Command Query Responsibility Segregation. According to a 2011 post by Martin Fowler, the pattern was brought into the public domain by Greg Young.

The idea behind this pattern is, as the name suggests, that you segregate the parts of your application that read data from those that write it. This provides some huge benefits, but it is far from free; we’ll explore both sides here.

Benefits

Let’s imagine that we decided to use a typical transactional database for our social media design. Figure 4-2 illustrates a familiar pattern.
Figure 4-2

Typical DB access architecture

The data store in this case is very likely ACID compliant (see Chapter 3). What this means is that each time you write data to a table, the application (and the user) must wait while the data is written, indexes are updated, constraints are checked, and so on. This process is lightning fast, and so it can manage a high throughput. At the same time, you’re reading information from those same tables and very possibly updating records too. Let’s have a look at what our relational data structure might look like for our social media solution.

Figure 4-3 shows a potential data structure (in fact, it doesn’t really matter what the data structure is for the purposes of this discussion).
Figure 4-3

DB table structure

As posts are being written to the database, comments may be being added. As we’ve already said, modern databases are incredibly good at performing these types of tasks quickly, but when your user base gets to very large numbers, that might not be quickly enough. If that is the case, then we need to ask what we can do about it – and what is important, and what is not.

Since we’re listing the benefits of CQRS, let’s think about how it could help us. One way would be to create a separate data store and write to that instead. To be clear, this need not be a separate database (although it can be); for example, we could introduce a system such as that shown in Figure 4-4.
Figure 4-4

DB structure using CQRS

You would then have an offline process that simply took the elements of the queue tables and wrote them to the Post and Comment tables. Although this does provide some benefit, more typically, the Command side writes to a different store.
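Sticking with the single-database variant for a moment, that offline drain process could be as simple as the following sketch. To be clear, this is an illustration only: the PostQueue table and its columns are hypothetical names matching Figure 4-4, and I’m using Dapper for data access, as the chapter’s later examples do. The OUTPUT clause captures the deleted queue rows into a table variable, so the delete and the copy happen in one transaction.
using System.Data.SqlClient;
using System.Threading.Tasks;
using Dapper;

public static class QueueDrainer
{
    // Drains the (hypothetical) PostQueue table into the read-side Post
    // table in a single atomic batch.
    public static async Task DrainPostQueue(SqlConnection connection)
    {
        const string sql =
            "SET XACT_ABORT ON; "
            + "BEGIN TRAN; "
            + "DECLARE @moved table(Text nvarchar(max), WorkoutDate datetime); "
            + "DELETE FROM PostQueue "
            + "OUTPUT DELETED.Text, DELETED.WorkoutDate INTO @moved; "
            + "INSERT INTO Post (Text, WorkoutDate) "
            + "SELECT Text, WorkoutDate FROM @moved; "
            + "COMMIT;";
        await connection.ExecuteAsync(sql);
    }
}
A process such as this could simply run on a timer; because the queue table has no indexes to speak of and no readers, the writes into it remain cheap.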

Note

I am very deliberately using the term “data store,” rather than database, the reason being that there are many viable places to store such data, and they are not all typical databases.

The principle here is simply that you read from a read-only location. The location that you write to is obviously not write-only – otherwise, you would never be able to extract the information – but effectively you write to a buffer location. You may decide to combine this approach with event sourcing; alternatively, you could simply have your write location be a message bus.

As with everything relating to software, this is a trade-off.

Drawbacks

Now that we’ve seen the benefits of using CQRS, let’s talk about the price that you pay. The first and most obvious drawback here is, in fact, the same as the benefit. That is, the benefit is that you have separated read and write functionality, and so you have decoupled perhaps the biggest constraint on scalability; the drawback is that you have separated read and write functionality, and so your data is no longer strongly consistent – that is, when you read data, you are not guaranteed to have read the latest version of the data.

Consistency Models

Data consistency refers to the state of the data across the system. For example, if I make a change to the data, a system that is said to be strongly consistent would show the same data state to all the users of that system; additionally, the user that originally made the change should see that change reflected.

Figure 4-5 illustrates what a strongly consistent system would look like. At a given point in time, all three users shown in Figure 4-5 would need to see the same information when looking for Field1.
Figure 4-5

Strongly consistent

This kind of model is typical when dealing with a system that has a single data store. It is bread and butter for most RDBMSs – a user updates a field, and all other users see that change. However, let’s imagine the scenario in Figure 4-6.
Figure 4-6

Not strongly consistent

Looking at Figure 4-6, it’s difficult (although not impossible – see the “Distributed Transactions” section in Chapter 3) to imagine a situation where that setup would be strongly consistent.

Note

There are dozens of consistency models. In this section, I’m going to group them into three categories and cover a broad explanation of that category; however, in doing so, we will miss some of the nuance of the individual models.

Strong or Strict Consistency

As we’ve already alluded to, strongly consistent data is data that is reflected across an entire system in such a way that as the data is written, it is visible to all other parts of the system in the order in which it was written.

In this model, we pay a performance price in order to see the same view of the data across the system.

Sequential or Causal Consistency

In causal consistency, we no longer expect to see the data as it is written across the system; however, we do expect to see the data in the order in which it was written across the system. For example, if I write three pieces of data, [a, b, c], I would expect to see them written in that order.

In this model, we get better performance than with strong consistency, but we need to maintain the order; this means that an additional payload must travel with the data itself, giving a dependency graph of the data. This increases the amount of data being passed around and, in order to maintain the order, can result in blocking calls.

Weak or Eventual Consistency

The eventual consistency model allows a situation where I can write some data, knowing that it will be available for reading at some stage in the future, but in no guaranteed order.

This is by far the fastest model, but we pay a high price: we can end up in a situation where we are either unable to see data that has been written or able to see it in a state where it is not final; for example, imagine the following customer data:
  • Customer Record: Customer Id, Customer Address Id

  • Customer Address Record: Address

In eventual consistency, we can have a situation where either the address exists without the customer or the customer exists without the address. In fact, the only guarantee that we have is that when we have finished processing (i.e., eventually), everything will be created correctly.
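As a brief illustration of what this means for the code that reads such a store, consider the following sketch; the Customer, Address, and repository types here are hypothetical, purely for illustration. The reader has to tolerate, and retry around, partially visible state.
using System;
using System.Threading.Tasks;

// Hypothetical types and repository, for illustration only.
public record Customer(Guid Id, Guid AddressId);
public record Address(Guid Id, string Text);

public interface ICustomerRepository
{
    Task<Customer> FindCustomer(Guid id);
    Task<Address> FindAddress(Guid id);
}

public class CustomerReader
{
    private readonly ICustomerRepository _repository;

    public CustomerReader(ICustomerRepository repository) =>
        _repository = repository;

    // The customer may become visible before its address (or vice versa),
    // so we back off and re-read until both records exist.
    public async Task<(Customer, Address)> ReadCustomerWithAddress(Guid customerId)
    {
        while (true)
        {
            var customer = await _repository.FindCustomer(customerId);
            var address = customer == null
                ? null
                : await _repository.FindAddress(customer.AddressId);

            if (customer != null && address != null)
                return (customer, address);

            await Task.Delay(100); // back off before re-reading
        }
    }
}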

For our purposes, eventual consistency seems like an excellent option – if a post is made on the site, it doesn’t really matter if that post is not immediately visible, nor does it matter if the comments don’t appear at the instant they are posted.

Target Architecture

Our target architecture is going to prefer performance and scalability over immediately consistent data. Let’s have a look at our target architecture diagram, and then we can discuss what this looks like in the real world.
Figure 4-7

Target architecture

Looking at the diagram shown in Figure 4-7, we can see that we have a number of clients that both read and write data in the system; however, following a write (or a command, in CQRS parlance), the data is unceremoniously dumped into a NoSQL database. As stated earlier in this chapter, it is not necessary to have a NoSQL database here per se, nor do you need a separate database: CQRS can work within a single database.

Once the data is written to the NoSQL store, a service is responsible for reading what’s in that store and writing that to a relational database.

Note

It’s no accident that you’ll be feeling an overwhelming sense of déjà vu at this stage. Much of what we’re describing is very similar to the event sourcing that we discussed in Chapter 2. However, while the implementation details may be similar, the design principle is different; that is, in this chapter, we are separating the read and write access for the purpose of performance, whereas with event sourcing, we separate them out of necessity (i.e., you cannot – or at least should not – read from an event stream directly).

You may decide to add a message broker queue as a shock absorber for the service that picks up the data and converts it to relational data. This allows for adjustments to the flow if you find that it takes longer to write the data than to read it; however, in our case, the NoSQL database itself can act as a queue.

Finally, once we have the data in relational form, it can be queried by our client again.

Examples

In our example, we will use a local instance of MongoDB and a local instance of SQL Server. In fact, the architectural principle here requires neither of these specific databases, and so it should translate to other technologies; however, while the SQL syntax will broadly work across most relational databases, the NoSQL syntax is specific to the Mongo API (or anything that emulates it, such as Cosmos DB).

Note

I will continue on the assumption that an instance of SQL Server and an instance of MongoDB are installed locally on your machine. A full explanation of these databases falls beyond the scope of this chapter (and book); however, MongoDB can be found here: www.mongodb.com/, and SQL Server here: www.microsoft.com/en-gb/sql-server/sql-server-downloads.

While we do not need to create a schema for Mongo, we do need to create one for SQL Server.

Schema Creation

Listing 4-1 creates a schema in SQL Server that will work for the purposes of this example.
CREATE TABLE [dbo].[Post](
        Text nvarchar(max) NULL,
        WorkoutDate datetime NULL,
        Id uniqueidentifier DEFAULT NEWSEQUENTIALID(),
        PRIMARY KEY (Id)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [dbo].[Comment](
        [Text] nvarchar(max) NULL,
        PostId uniqueidentifier,
        Id uniqueidentifier DEFAULT NEWSEQUENTIALID(),
        PRIMARY KEY (Id),
        FOREIGN KEY (PostId) REFERENCES dbo.Post(Id)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Listing 4-1

Schema Creation

Note

The use of a sequential GUID has some advantages over using a standard integer to index the table; however, as far as I’m aware, NEWSEQUENTIALID() is specific to SQL Server. Should you decide to use a different RDBMS, just substitute an integer (identity) column.

Looking at our target architecture, there are only two conceptual parts to it (excluding the databases); these are the Client and the Process Data Service.

Updating the Database

When deciding how we should update the database, there are some architectural decisions to consider.

Update a Local Version of the Database Directly

This approach uses the Mongo instance as a local cache – the instance of Mongo can be installed directly on the client machine, and the client can write to it directly.

As with any architectural decision, security should be considered. The security model here would be that the machine that hosts the website also hosts the MongoDB installation, meaning it would be protected by a firewall; however, at some stage, an application will need to transfer that information to the relational store that we are reading from. This can lead to decisions being made around servers and cloud providers that make this transfer easier, cheaper, and safer.

This scenario works well in the case of the client being a website; the server side of the website may write directly to the MongoDB. The advantage here is that you get a much improved write speed; however, you couple your client directly to your database technology.

Note

The archetypal example of why you should abstract access to a database is that you may wish to swap out your database provider. While this is a generally good practice (e.g., for testing), it’s very unlikely that you will ever really swap out the database provider.

The approach of having a local instance of Mongo breaks down where, for example, your client is a desktop application: installing an instance of Mongo in every location where your application is installed would not be feasible; for mobile applications, it would be not merely impractical but impossible.

Note

While installing Mongo on every client is not feasible, there are other database and caching technologies that would make this possible. In Chapter 6, we discuss just such a scenario.

In our specific case, it seems to make more sense to call a web service to update the database.

Call a Web Service to Update the Database

Calling a web service is an excellent way to provide an abstraction layer on top of your data access, and it also means that you can deal with situations where your client and data are not co-located.

Note

The term “client,” in this context, refers to any process or set of processes that consume a service. In our case, the client may be a website; however, the term “client” is overloaded, in that the portion of a website that operates on the user’s machine is also called the “client.”

Assuming you secure your web service using some form of token-based authentication, you’re less likely to be faced with the same security issues as when you update the MongoDB directly.
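As a sketch of what that might look like in ASP.NET Core, token-based authentication can be registered alongside the other services in Startup.cs. To be clear, this is not part of the chapter’s sample code: it requires the Microsoft.AspNetCore.Authentication.JwtBearer package, and the Authority and Audience values below are placeholders for whatever identity provider you use.
        public void ConfigureServices(IServiceCollection services)
        {
            services.AddControllers();
            // Sketch only: the Authority and Audience values are placeholders.
            services
                .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
                .AddJwtBearer(options =>
                {
                    options.Authority = "https://your-identity-provider";
                    options.Audience = "social-media-api";
                });
        }
You would then add app.UseAuthentication() and app.UseAuthorization() to the pipeline in Startup.Configure, and decorate the controllers with [Authorize].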

Our use case requires that users be able to update this from mobile devices, and so we have little choice but to wrap our update in a web service.

Checking the Data

In the following sections, we are going to write to, and read from, the data. In order to validate that this is happening correctly, we’ll need to see the data. There are a number of options; however, for SQL Server, I would recommend downloading SQL Server Management Studio (SSMS) , and for Mongo, MongoDB Compass.
SELECT p.*,
        (SELECT COUNT(1)
        FROM Comment c
        WHERE c.PostId = p.Id) as "Comment Count"
FROM Post p
Listing 4-2

Check Posts and Comments

Listing 4-2 gives a SQL script that will display the posts, along with the count of comments next to them. This will be useful later on.

As with previous chapters, all the code can be found in the following GitHub repo:

https://github.com/Apress/software-architecture-by-example

Web Service

Since we need a web service, our first step is to create one to wrap the database update calls.

Appendix A covers the finer details of creating an API in .Net – so here we’ll just list the changes made once the default API project has been created.
Figure 4-8

Update service

As you can see from Figure 4-8, we have two controllers in this project; let’s see the PostController.cs first.
    [ApiController]
    [Route("[controller]")]
    public class PostController : ControllerBase
    {
        private readonly IMongoDBWrapper _mongoDBWrapper;
        public PostController(IMongoDBWrapper mongoDBWrapper)
        {
            _mongoDBWrapper = mongoDBWrapper;
        }
        [HttpPost]
        public async Task<string> Create() =>
            await _mongoDBWrapper.CreatePost(DateTime.Now, $"test post {DateTime.Now}");
    }
Listing 4-3

PostController.cs

There’s very little to this class, as you can see. However, we are utilizing a wrapper class (MongoDBWrapper), which provides the method CreatePost. We’ll come to that helper project soon, but first, let’s see the code in Listing 4-4 for the comment update.
    [ApiController]
    [Route("[controller]")]
    public class CommentController : ControllerBase
    {
        private readonly IMongoDBWrapper _mongoDBWrapper;
        public CommentController(IMongoDBWrapper mongoDBWrapper)
        {
            _mongoDBWrapper = mongoDBWrapper;
        }
        [HttpPost]
        public async Task Create(string postId) =>
            await _mongoDBWrapper.CreateComment($"test comment {DateTime.Now}", postId);
    }
Listing 4-4

CommentController.cs

Before we discuss the wrapper itself, let’s see how it is registered in the ASP.NET DI container (Listing 4-5).
        public void ConfigureServices(IServiceCollection services)
        {
            services.AddControllers();
            services.AddSwaggerGen(c =>
            {
                c.SwaggerDoc("v1", new OpenApiInfo { Title = "SocialMedia.UpdateService", Version = "v1" });
            });
            services.AddSingleton<IMongoDBWrapper, MongoDBWrapper>();
        }
Listing 4-5

Startup.cs

Let’s now discuss how we might access the MongoDB to update it – spoiler: this will essentially be filling in the methods that we’ve used before.

Accessing MongoDB

In order to access the MongoDB, we’re going to create a wrapper project.

Note

A wrapper or proxy project or class is simply a way to insulate yourself from a third party. The idea is that your code calls the wrapper, and the wrapper, in turn, calls the third party. This gives you a number of advantages: should the third party make a breaking change to some method, you have only a single place to change that interaction; further, you can control the abstraction within your own project for the purpose of testing; and finally, you have a single place to log and debug issues with the third-party interaction.

Figure 4-9

MongoDBWrapper project

As you will see from Figure 4-9, we have a very simple project, consisting of an interface and an implementation class.

Note

As you’ll see from the code, the location of the MongoDB and the database name are hard-coded – this is done partly to simplify the example, but also for readability. Such values should not be stored in the code.
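The interface itself isn’t listed here; based on the calls made by the controllers and (as we’ll see shortly) the data service, a sketch of IMongoDBWrapper would look something like this:
    public interface IMongoDBWrapper
    {
        // Write side: store a new post or comment, returning the new ID
        Task<string> CreatePost(DateTime workoutDate, string comment);
        Task<string> CreateComment(string comment, string postId);

        // Read-and-remove side, used later by the process data service
        Task<Post> GetNextPost();
        Task<Comment> GetNextComment(string postId);
    }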

Listing 4-6 shows the CreateComment and CreatePost methods.
    public class MongoDBWrapper : IMongoDBWrapper
    {
        readonly IMongoDatabase _db;
        public MongoDBWrapper()
        {
            var dbClient = new MongoClient("mongodb://localhost:27017");
            _db = dbClient.GetDatabase("SocialMedia");
        }
        public async Task<string> CreateComment(string comment, string postId)
        {
            var newComment = new Comment()
            {
                PostId = postId,
                Text = comment
            };
            var collection = _db.GetCollection<Comment>("Comments");
            await collection.InsertOneAsync(newComment);
            return newComment.Id.ToString();
        }
        public async Task<string> CreatePost(DateTime workoutDate, string comment)
        {
            var newPost = new Post()
            {
                WorkoutDate = workoutDate,
                Text = comment
            };
            var collection = _db.GetCollection<Post>("Posts");
            await collection.InsertOneAsync(newPost);
            return newPost.Id.ToString();
        }
Listing 4-6

MongoDBWrapper.cs

I’ve listed these two methods together in Listing 4-6, as they essentially do the same thing: first, they instantiate an object (e.g., Comment or Post) with appropriate values; then we get a typed collection from Mongo – that is, we ask the DB for a collection of Posts or Comments; and finally, once we have one, we add to that collection and return the new ID.

We also need methods to return the next Post and the next Comment. Listing 4-7 shows these methods.
        public async Task<Post> GetNextPost()
        {
            var collection = _db.GetCollection<Post>("Posts");
            var result = await collection.FindOneAndDeleteAsync(a => true);
            return result;
        }
        public async Task<Comment> GetNextComment(string postId)
        {
            var collection = _db.GetCollection<Comment>("Comments");
            var result = await collection.FindOneAndDeleteAsync(a => a.PostId == postId);
            return result;
        }
Listing 4-7

MongoDBWrapper.cs

In Listing 4-7, we can see that we first retrieve the collection, as before, using the GetCollection method; then, instead of inserting, we call FindOneAndDeleteAsync. This takes the first record that it finds matching the criteria provided, returns that record, and then deletes it (so that we don’t re-read the record).

Note

There are, essentially, two ways that you can process data in this way, and they both have advantages and disadvantages. The way that we’re using here reads and deletes the record and then writes that record to another location; the downside here is that if the system crashes between the two processes, the record may be lost. The alternative is to read the record, then write that to the destination system, and then delete it from the source; this has the advantage that no data is lost, but the downside is that it is slower – in some cases, much slower.
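Were we to adopt the second approach, the wrapper might split GetNextPost into a separate read and delete – a sketch follows, assuming Post.Id is a Mongo ObjectId (which matches the ToString() calls we saw earlier):
        // Read without removing; the caller writes the record to SQL Server
        // first and only then calls DeletePost. A crash in between cannot
        // lose data, although it can produce duplicates, which the caller
        // must then handle.
        public async Task<Post> PeekNextPost()
        {
            var collection = _db.GetCollection<Post>("Posts");
            return await collection.Find(_ => true).FirstOrDefaultAsync();
        }

        public async Task DeletePost(ObjectId id)
        {
            var collection = _db.GetCollection<Post>("Posts");
            await collection.DeleteOneAsync(p => p.Id == id);
        }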

The next thing that we’ll need to do is to create our client application.

The Client

The client application, for this example, will simply be a console application. We’re basically going to call the web services – as we discussed earlier, given the required deployment model, we have little choice on this.

Listing 4-8 shows the menu of the application.
        static async Task Main(string[] args)
        {
            while (true)
            {
                Console.WriteLine("Choose action");
                Console.WriteLine("1: Create Post");
                Console.WriteLine("2: Comment on Post");
                Console.WriteLine("3: Create Post and Comment");
                Console.WriteLine("4: Small Bulk Test");
                Console.WriteLine("5: Large Bulk Test");
                Console.WriteLine("0: Exit");
                var result = Console.ReadKey();
                switch (result.Key)
                {
                    case ConsoleKey.D1:
                        await CreatePost();
                        break;
                    case ConsoleKey.D2:
                        await CreateComment();
                        break;
                    case ConsoleKey.D3:
                        var postId = await CreateSinglePost();
                        await CreateComment(postId);
                        break;
                    case ConsoleKey.D4:
                        for (int i = 1; i <= 100; i++)
                        {
                            await CreatePost();
                            await CreateComment();
                        }
                        break;
                    case ConsoleKey.D5:
                        for (int i = 1; i <= 100; i++)
                        {
                            Console.WriteLine($"Processing batch {i}");
                            for (int j = 1; j <= 100; j++)
                            {
                                await CreatePost();
                                await CreateComment();
                            }
                            await Task.Delay(20);
                        }
                        break;
                    case ConsoleKey.D0:
                        return;
                }
            }
        }
Listing 4-8

Program.cs

There’s quite a lot of code here, but it’s all very simple: we display the menu and then respond to the keypress. Options 1 and 2 create a single Post or Comment; option 3 creates a Post and then a Comment for that Post; options 4 and 5 are for the purpose of testing load, and they create many posts and comments.

In Listing 4-9, we can see the CreateSinglePost method.
        private static async Task<string> CreateSinglePost()
        {
            var httpClient = HttpClientFactory.Create();
            var httpContent = new StringContent("");
            var result = await httpClient.PostAsync("https://localhost:44388/Post", httpContent);
            Debug.Assert(result.IsSuccessStatusCode);
            // Read the response body (the new post's ID); calling ToString()
            // on the content object would return its type name instead
            return await result.Content.ReadAsStringAsync();
        }
Listing 4-9

Program.cs

The first thing we do here is use the HttpClientFactory to create an HttpClient object. The HttpClient object allows us to make HTTP calls to services outside of the application.

Note

The HttpClientFactory provides several advantages over instantiating an HttpClient directly; however, the most important is that it helps to prevent socket exhaustion: the issue being that, as an HttpClient is disposed of, it takes some additional time for the underlying socket to be released; using a factory means that the underlying handler – and therefore the socket – can be reused.

Following this, we call PostAsync to invoke the HttpPost method of the controller.
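Incidentally, in an application with a DI container, the more recent IHttpClientFactory (from the Microsoft.Extensions.Http package) achieves the same goal by pooling message handlers. A sketch of its use – not part of the chapter’s sample code – follows:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

internal static class FactoryExample
{
    internal static async Task Run()
    {
        var services = new ServiceCollection();
        services.AddHttpClient(); // registers IHttpClientFactory

        using var provider = services.BuildServiceProvider();
        var factory = provider.GetRequiredService<IHttpClientFactory>();

        // Clients are cheap to create; the pooled handlers underneath
        // (and their sockets) are reused across clients
        var client = factory.CreateClient();
        var response = await client.PostAsync(
            "https://localhost:44388/Post", new StringContent(""));
        Console.WriteLine(response.StatusCode);
    }
}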

Note

You may notice a Debug.Assert() call. This is a mechanism to ensure that, during development, certain things are true about your code; where they are not, the code will break (essentially, this functions as a conditional breakpoint).

The CreateComment method is virtually identical, as you will see from Listing 4-10.
private static async Task CreateComment(string postId = null)
{
    if (postId == null) postId = _posts.GetRandom();
    var httpClient = HttpClientFactory.Create();
    var httpContent = new StringContent(postId);
    var result = await httpClient.PostAsync("https://localhost:44388/Comment", httpContent);
    Debug.Assert(result.IsSuccessStatusCode);
}
Listing 4-10

Program.cs

From the code in Listing 4-10, we can see that the only real difference is that the CreateComment method accepts a parameter, which it passes to the /comment endpoint. Also, CreateComment does not return a value, whereas CreateSinglePost does; you’ll notice that we’re accepting this parameter as optional and, where it isn’t supplied, we pick a random Post via a GetRandom extension method (see the sketch that follows).
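The GetRandom extension method isn’t shown in the chapter; a minimal sketch of such a method (hypothetical, and assuming the list is non-empty) might be:
using System;
using System.Collections.Generic;

public static class ListExtensions
{
    private static readonly Random _random = new Random();

    // Picks a random element; for test purposes only - this will
    // throw if the list is empty
    public static string GetRandom(this List<string> list) =>
        list[_random.Next(list.Count)];
}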

Note

The idea behind picking a random post is for testing purposes only. Clearly, if this were a real system, each comment would be attached to whichever post it was made about.

The last method to cover is CreatePost, which we can see along with a class-level static variable in Listing 4-11.
        static readonly List<string> _posts = new List<string>();
        private static async Task CreatePost()
        {
            var createPostResult = await CreateSinglePost();
            _posts.Add(createPostResult);
        }
Listing 4-11

Program.cs

Listing 4-11 shows that when a post is created, we add that post to an in-memory list of posts.

The final piece of the puzzle is the offline process that reads from the MongoDB and writes that information to SQL Server.

Process Data Service

The ProcessDataService is a very simple console application.
Figure 4-10

ProcessDataService

As Figure 4-10 illustrates, the ProcessDataService has a single code file. In fact, it consists of only three methods; Listing 4-12 shows the Main method.
        static async Task Main(string[] args)
        {
            // NB: the connection string is hard-coded here for readability only
            using var connection = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=SocialMedia;Integrated Security=True;");
            connection.Open();
            await ReadPosts(connection, new MongoDBWrapper());
        }
Listing 4-12

Program.cs

As we can see, all we’re doing here is establishing a SQL Server connection, creating an instance of the MongoDBWrapper, and passing both into a separate method: ReadPosts.

Listing 4-13 shows the ReadPosts method.
private static async Task ReadPosts(SqlConnection connection, MongoDBWrapper wrapper)
{
    while (true)
    {
        // Read from Mongo
        var nextPost = await wrapper.GetNextPost();
        if (nextPost == null) break;
        Console.WriteLine($"ReadPosts: Read {nextPost.Id}");
        using var transaction = connection.BeginTransaction();
        // Write To Sql Server
        string sql = "DECLARE @newRecord table(newId uniqueidentifier); "
                             + "INSERT INTO Post "
                             + "(Text, WorkoutDate) "
                             + "OUTPUT INSERTED.Id INTO @newRecord "
                             + "VALUES "
                             + "(@text, @workoutDate) "
                             + "SELECT CONVERT(nvarchar(50), newId) FROM @newRecord";
        var result = await connection.QueryAsync<string>(sql,
                    new { text = nextPost.Text, workoutDate = nextPost.WorkoutDate },
                    transaction);
        // Get all comments for post
        await ReadComments(transaction, connection,
                    result.Single(), nextPost.Id.ToString(), wrapper);
        transaction.Commit();
    }
    Console.WriteLine("ReadPosts: End");
}
Listing 4-13

Program.cs

In Listing 4-13, we can see that the ReadPosts method is a loop that exits only when GetNextPost returns null – that is, when there are no further posts available. Following this, a transaction is established (for more discussion on transactions, please see Chapter 3). We then insert the post into the SQL table and subsequently call ReadComments (see Listing 4-14), before finally committing the transaction.

Note

It’s worth noting that, as we stated earlier, should the transaction fail and the code crash, the post that was read from the Mongo database would be lost; in our situation, we’ve decided that we can live with such data loss; however, many systems cannot tolerate data loss of any kind.

        private static async Task ReadComments(SqlTransaction transaction, SqlConnection connection,
            string postId, string filterPostId, IMongoDBWrapper wrapper)
        {
            while (true)
            {
                // Read from Mongo
                var nextComment = await wrapper.GetNextComment(filterPostId);
                if (nextComment == null) break;
                // Write To Sql Server
                string sql = "INSERT INTO Comment "
                            + "(Text, PostId) "
                            + "VALUES "
                            + "(@text, @postId)";
                var result = await connection.ExecuteAsync(sql,
                    new { text = nextComment.Text, postId = postId },
                    transaction);
            }
        }
Listing 4-14

Program.cs

The ReadComments method in Listing 4-14 is very similar to the ReadPosts method that we saw earlier: we read the comments from the Mongo database and insert them into the relational database, using the same transaction, meaning that the post and its comments will either be written together or not at all.

Note

As we’ve said, transactions are covered in more detail in Chapter 3; however, transactions – or at least transactions that span several operations – are not always an entirely good thing. It may be that, if there were a system error, having the Post without the Comment would still have value; it would certainly improve throughput to dispense with this transaction.

Figure 4-11 shows the data that has been transferred from the MongoDB across to the relational database.
Figure 4-11

SQL Server data

Summary

We have focused heavily in this chapter on scalability, and we’ve paid a hefty price for it. The kind of scale that we’re talking about here is far beyond what an average system will need to deal with – in many cases, a CQRS system such as this would be overkill and simply add complexity. However, used at the right time, it can help you to scale your system, especially when immediate consistency is not a restricting factor.

In this chapter, we’ve also covered using a NoSQL database as a kind of dump. This is another view on using CQRS: you can allow the customer to leave the point of interaction (sale, purchase, or whatever) and keep just the raw transaction to process later; you can then do whatever you choose with that data at a later time. This particular facet is concerned more with performance (or at least perceived performance) than scalability – although it relates to both.

If we return to our, by now familiar, trick of imagining a sale (for example) in a real-world scenario, writing a transaction to a relational database would be akin to getting out a general ledger in front of the customer, entering their transaction in it, and reconciling the VAT and other accounts. Obviously, computers do these things quickly, so you may be able to do such things while they wait; but if you find that the checkout process is taking seconds, rather than milliseconds, then maybe it’s time to reconsider exactly what you’re asking the customer to wait for.
