Unlocking the Power of Concurrency and Parallelism in Modern .NET Applications
What is Concurrency?
Concurrency, a vital concept in
modern programming, refers to a program's ability to make progress on multiple
independent operations during overlapping periods of time. Importantly, the
order in which these tasks are executed doesn't affect the final result, as
long as they remain independent. They can start, run, and complete in an
interleaved manner without the need to run at the exact same moment.
It's crucial to note that
concurrency and parallelism aren't synonymous, though concurrency serves as a
means to achieve parallelism. Parallelism entails the execution of two or more
tasks at the same instant, or in other words, simultaneously.
Even a single-core CPU, which
can't run multiple tasks simultaneously, can offer a form of virtual
parallelism by rapidly switching between different tasks. This creates the
illusion of parallel processing. In contrast, a multi-core CPU can genuinely
achieve parallelism when the program is written to use the multiple cores
effectively.
Concurrency and parallelism are
powerful techniques that empower developers to create scalable and efficient
applications, making the most of today's hardware and resources. By employing
appropriate tools and techniques, it's possible to greatly enhance the
responsiveness and performance of these applications.
Process and Thread
Now that we've gained an
understanding of what concurrency and parallelism mean, let's delve into the
realm of concurrent programming, where processes and threads are the key
players responsible for carrying out tasks.
In the context of programming,
each application starts with one or more processes, and each process has at
least one thread. Any running instance of a program is essentially a process.
Processes operate in isolation from one another, each with its own dedicated
memory space. Within a process, multiple threads can exist, all of which share
the same memory space and can access memory locations used by other threads in
the same process. Threads are the fundamental units to which the operating
system allocates processor time, so each thread effectively acts as a virtual
processor.
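To make the shared memory space concrete, here is a minimal sketch in which several threads of one process update the same variable. The class and method names (SharedMemoryDemo, CountWithThreads) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading;

public static class SharedMemoryDemo
{
    // One variable, visible to every thread in this process.
    private static int counter;

    public static int CountWithThreads(int threadCount, int incrementsPerThread)
    {
        counter = 0;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < incrementsPerThread; j++)
                {
                    // Atomic increment; a plain counter++ could lose updates
                    // when two threads write at the same time.
                    Interlocked.Increment(ref counter);
                }
            });
            threads[i].Start();
        }

        // Wait for every thread to finish before reading the result.
        foreach (Thread t in threads)
        {
            t.Join();
        }

        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithThreads(4, 100000)); // prints 400000
    }
}
```

Because all four threads write to the same memory location, the updates have to be synchronized. Interlocked.Increment is one way to do that; a lock would work as well.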
Asynchronous Programming
The concept of asynchronous
programming often comes up when considering concurrency in an application.
Almost every modern programming language supports asynchronous programming.
Async programming is a way of writing code so that long-running functions
continue to execute while control is quickly returned to the caller. When a
task is I/O-bound or CPU-bound, we can invoke it asynchronously and perform
independent operations until its result is required. This improves the
responsiveness of the application and lets it make better use of its threads.
To make a method asynchronous,
we mark it with the 'async' keyword. This lets us use another keyword, 'await',
inside the method to wait for time-consuming calls without blocking the thread.
Let's consider a simple example:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq; // JObject comes from the Newtonsoft.Json package

public class NasaService
{
    // A single shared HttpClient is reused for all requests.
    private static readonly HttpClient client = new HttpClient();
    const string API_KEY = "DEMO"; // Replace with your API key

    public async Task<string> GetApodAsync()
    {
        string apiUrl = $"https://api.nasa.gov/planetary/apod?api_key={API_KEY}";
        try
        {
            // Start the request without waiting for it to finish.
            Task<string> getResult = client.GetStringAsync(apiUrl);

            // <<Perform independent tasks if any>>

            string json = await getResult; // Async call
            JObject apod = JObject.Parse(json);
            return $"title: {apod["title"]}\n" +
                   $"hdurl: {apod["hdurl"]}\n" +
                   $"explanation: {apod["explanation"]}";
        }
        catch (Exception ex)
        {
            return $"Error: {ex.Message}";
        }
    }
}
In the above example, when
execution reaches the await keyword and the request has not yet completed,
control returns to the caller. The thread is then free to handle other tasks or
requests. Once the awaited operation completes, execution resumes and the
remaining code is processed. How execution continues and which thread picks it
up may vary depending on the type of application (WinForms, Console, Web,
etc.).
Does the application use a separate
thread to execute the awaited task? The simple answer is no. While there
are conditions where async/await can involve a new thread, a true async
operation does not necessarily spin up a new thread. In this case,
GetStringAsync sends an HTTP GET request to an external resource, which makes
it an I/O-bound operation. As soon as execution hits the await, the current
thread is released back to the pool (or synchronization context), where it can
be used by another task or request. When the result is available, a
notification is issued so that a thread can continue the remaining execution.
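One way to observe this is to print the managed thread ID before and after an await. The sketch below assumes a console application (no synchronization context) and uses Task.Delay as a stand-in for an I/O-bound call; the names AwaitThreadDemo and ObserveThreadsAsync are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitThreadDemo
{
    // Returns the thread IDs seen before and after the await.
    public static async Task<(int before, int after)> ObserveThreadsAsync()
    {
        int before = Environment.CurrentManagedThreadId;

        // Stand-in for an I/O-bound call; no thread is blocked while waiting.
        await Task.Delay(200);

        int after = Environment.CurrentManagedThreadId;
        return (before, after);
    }

    public static async Task Main()
    {
        var (before, after) = await ObserveThreadsAsync();
        Console.WriteLine($"Before await: thread {before}");
        Console.WriteLine($"After await:  thread {after}");
    }
}
```

In a console application the two IDs will often differ, because the continuation is picked up by whichever thread pool thread is available. In a UI application the synchronization context would normally bring the continuation back to the UI thread.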
At a deeper level, the I/O
operation is offloaded to the OS, and IOCP (I/O Completion Ports) is used for
network I/O on Windows (other operating systems offer mechanisms such as epoll
and kqueue). When the operation completes, the OS notifies .NET via IOCP, which
is essentially a part of the thread pool. An I/O thread from the pool is used
only briefly to signal the completion. Hence, an async operation completes
without the need to create additional threads.
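The practical consequence is that many pending operations do not need many threads. The following sketch starts a large number of waits and awaits them together. Note the assumption: Task.Delay is timer-based and does not go through IOCP, so it is only a stand-in here for real network calls. ManyWaitsDemo and RunAsync are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ManyWaitsDemo
{
    // Starts `count` simulated operations and awaits them all together.
    public static async Task<TimeSpan> RunAsync(int count, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();

        // Every operation is started up front; none of them holds a thread
        // while it is pending.
        Task[] pending = Enumerable.Range(0, count)
                                   .Select(_ => Task.Delay(delayMs))
                                   .ToArray();

        await Task.WhenAll(pending);
        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan elapsed = await RunAsync(1000, 500);
        Console.WriteLine(
            $"1000 waits of 500 ms finished in {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

If each wait blocked its own thread one after another, the total would be around 500 seconds. Because the waits overlap, the elapsed time should stay close to the length of a single wait, though the exact figure depends on the machine.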
Parallelism with TPL
Asynchronous programming is about
non-blocking execution and making optimal use of the current thread.
But there are scenarios where the work itself is computationally heavy, and we
need several operations executing in parallel rather than merely waiting for
results. This is where TPL (Task Parallel Library) comes into play, allowing us
to use all the available processors efficiently via multithreading. The Task
Parallel Library (TPL) is a set of public types and APIs in .NET that makes
parallel programming easier. It provides a higher-level abstraction for
multithreading and parallelism, allowing developers to efficiently manage
parallel tasks with minimal effort.
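As a small illustration of that abstraction, the sketch below sums squares with Parallel.For, using the overload that keeps a per-thread subtotal so that threads only synchronize once at the end. The names ParallelSumDemo and SumOfSquares are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSumDemo
{
    // Sums i * i for i in [0, n) across the available cores.
    public static long SumOfSquares(int n)
    {
        long total = 0;

        Parallel.For<long>(0, n,
            () => 0L,                                      // per-thread subtotal starts at 0
            (i, state, local) => local + (long)i * i,      // accumulate without locking
            local => Interlocked.Add(ref total, local));   // merge each subtotal once

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000)); // prints 332833500
    }
}
```

TPL decides how to split the range and how many threads to use; the code only describes what each iteration does and how partial results are combined.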
Let's consider a scenario where
we are processing a large volume of documents. The processing steps are
Spelling Check, Plagiarism Detection, Formatting and Summary Generation. Each
step is a computationally expensive operation but independent of the others. In
an efficient application, instead of processing each document sequentially, we
can parallelize the process and speed up the overall workflow.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class Program
{
    public static void Main(string[] args)
    {
        List<string> documents =
            new List<string> { "Doc1", "Doc2", "Doc3", "Doc4", "Doc5" };

        Console.WriteLine("Document Processing Started...");

        // Process the documents in parallel; returns once all are done.
        Parallel.ForEach(documents, document =>
        {
            ProcessDocument(document);
        });

        Console.WriteLine("\nAll documents processed successfully!");
    }

    static void ProcessDocument(string document)
    {
        // Run the four independent steps for this document in parallel.
        Parallel.Invoke(
            () => CheckSpelling(document),
            () => DetectPlagiarism(document),
            () => AdjustFormatting(document),
            () => GenerateSummary(document)
        );
        Console.WriteLine($"{document} processing completed.\n");
    }

    // Each step sleeps to simulate expensive work, then reports completion.
    static void CheckSpelling(string doc)
    {
        Thread.Sleep(1000);
        Console.WriteLine($"{doc}: Spelling & Grammar Checked.");
    }

    static void DetectPlagiarism(string doc)
    {
        Thread.Sleep(1500);
        Console.WriteLine($"{doc}: Plagiarism Detection Completed.");
    }

    static void AdjustFormatting(string doc)
    {
        Thread.Sleep(1200);
        Console.WriteLine($"{doc}: Formatting Adjusted.");
    }

    static void GenerateSummary(string doc)
    {
        Thread.Sleep(800);
        Console.WriteLine($"{doc}: Summary Generated.");
    }
}
In the above example,
Parallel.ForEach processes multiple documents concurrently. Within each
document, Parallel.Invoke runs the four processing steps in parallel, so the
whole batch finishes faster than it would sequentially.
At first glance, this approach
might seem complex because so many operations run at once. However, TPL
schedules the work on thread pool threads and abstracts away the complexities
of work partitioning, thread scheduling, state management, and other low-level
details, so the developer doesn't have to manage them directly.
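When the default scheduling is too aggressive for a given workload, the degree of parallelism can be capped with ParallelOptions. The sketch below limits a loop to a fixed number of concurrent iterations and records the highest concurrency it observed. ThrottleDemo and PeakConcurrency are hypothetical names, and Thread.Sleep stands in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Runs `items` iterations with at most `maxDegree` running at once and
    // returns the highest number of simultaneous iterations observed.
    public static int PeakConcurrency(int items, int maxDegree)
    {
        var gate = new object();
        int current = 0;
        int peak = 0;

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For(0, items, options, _ =>
        {
            lock (gate)
            {
                current++;
                if (current > peak) peak = current;
            }

            Thread.Sleep(50); // simulated work

            lock (gate)
            {
                current--;
            }
        });

        return peak;
    }

    public static void Main()
    {
        int peak = PeakConcurrency(16, 2);
        Console.WriteLine($"Peak concurrent iterations: {peak}");
    }
}
```

The observed peak never exceeds the configured limit, which is useful when the parallel work competes with other parts of the application for CPU time.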
Final Thoughts
There are certain things to
consider before using async/await or TPL. As a starting point, we can ask these
questions.
- Is there any I/O-bound operation? Yes => async/await
- Is there any CPU-bound operation?
- Are you concerned about the responsiveness of the application? Yes => async/await
- Is the task appropriate for parallelism (e.g. multiple independent tasks)? Yes => TPL
The important thing is to analyze
how the code actually executes and determine whether there is a real need for
asynchronous or parallel execution. Sometimes the overhead of these techniques
outweighs the gain, because a lot is going on under the hood.
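A quick measurement is often the easiest way to check. The sketch below times the same cheap operation run sequentially and with Parallel.For. OverheadDemo and its methods are hypothetical names, and the timings will differ from machine to machine and between the first and later runs.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class OverheadDemo
{
    public static int[] DoubleSequential(int[] input)
    {
        var output = new int[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = input[i] * 2;
        }
        return output;
    }

    public static int[] DoubleParallel(int[] input)
    {
        var output = new int[input.Length];
        // Each iteration writes a distinct index, so no locking is needed.
        Parallel.For(0, input.Length, i => output[i] = input[i] * 2);
        return output;
    }

    public static void Main()
    {
        int[] data = new int[1000];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = i;
        }

        var stopwatch = Stopwatch.StartNew();
        DoubleSequential(data);
        Console.WriteLine($"Sequential: {stopwatch.Elapsed.TotalMilliseconds:F3} ms");

        stopwatch.Restart();
        DoubleParallel(data);
        Console.WriteLine($"Parallel:   {stopwatch.Elapsed.TotalMilliseconds:F3} ms");
    }
}
```

For work this small, the cost of partitioning and scheduling can easily exceed the cost of the work itself, so the parallel version is not guaranteed to win. The picture usually changes as each iteration becomes more expensive.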