A Beginner’s Guide to Databases

Databases

By Gus Lopez

All around the globe, people are constantly tweeting, Googling, booking airline tickets, and banking online, among hundreds of other everyday internet activities. Each of these actions creates pieces of data — and all of these have to live somewhere. That’s where databases — put simply, a collection of data — come in.

Let’s look at LinkedIn as an example of how databases are used. When you first sign up for an account, you create a username and password. These are typically stored in some sort of database — usually one that’s encrypted in order to protect users’ privacy.

Once you’ve created an account, you can start updating your profile, sharing links to articles, and commenting on connections’ posts.

Here’s what happens when you interact with LinkedIn.

These links and comments eventually end up in a database. The main idea is that anyone with the proper permissions can then manage (search, see, comment, share, or like) these elements. All of these actions are usually performed by a piece of software that manages the database.

How Does a Database Work?

Let’s start by focusing on the first part of the word “database”: data. “Data” refers to some unstructured collection of known information.

For example, take a LinkedIn user named Joe whose email address is joe@someemail.com. Right now, we know two things about him: his name and his email address. These are two pieces of data.

Next, we need to organize related pieces of data. This is usually done through a structured format, such as a table. A table is composed of columns (also known as fields) and rows (also referred to as records).

Below, we see that our Joe data are now organized in a table called “Person”. Here, we have a record of Joe’s information: His name is in one field, his email is in another field, and we assign Joe a number (in a third field) for easy reference.

Person    
person_number first_name email
1 Joe joe@someemail.com

As you might expect, in any database there can be many tables — one per related data collection. Simplifying our LinkedIn example, we might have a “Person” table, an “Education” table, and a “Comment” table as we collect more data points about an user and their activities.

Now, these tables can (optionally) be linked together to form some sort of relationship between them. For example, Joe may have listed the schools he attended, which could be represented by a relationship between the “Person” and “Education” tables. Thanks to this relationship, we know which schools in the “Education” table are Joe’s.

Usually, this step is when pieces of structured and related data are translated into information.

Any organization can have multiple databases — one for sales information, one for payroll information, and so on. To maintain these, they often turn to a type of software known as a database management system, or DBMS. There are many types of DBMS to choose from, including Oracle, Microsoft SQL Server, MySQL, and Postgres.

The database itself is housed in a piece of hardware — a physical machine that either resides on a company’s premises or is rented offsite through providers like Amazon Web Services, Google Cloud Platform, or Microsoft Azure Solution.

Last but not least, the data contained in the database needs to be accessible through some sort of admin tool or programming language. Analysts typically use a set of digital tools — including Microsoft Excel, IBM Cognos Analytics, pgAdmin, the R language, and Tableau — to examine this data for patterns and trends.

Data analysts can then use these patterns and trends to make informed decisions.

For example, if you’re a data analyst at a large company, you may be tasked with helping management determine a price for a new product. One approach you could take is looking at how much the product costs to produce — how much of people’s time and effort, as well as machinery, is needed to make and maintain the product. Let’s say you do that by analyzing the data sets of payroll and procurement and come up with a cost of $30. Then you’ll look at how much customers are willing to pay, and perhaps another data set can inform you that similar companies have charged up to $50 for a similar product.

But you can also see that the price might have a seasonal trend, meaning people buy more of this product in, say, December, than during the rest of the year. A data analyst could use any of the above-mentioned data tools to visualize these three data sets — production cost, competitors’ costs, and seasonal purchasing trends — to recommend to that the best price for the new product is $40.

When and Why Are Databases Used?

There are different types of databases and management solutions for different types of problems. Here are just a few reasons why you may need a database and what solutions you may choose in each situation:

  • Storing, processing, and searching large amounts of information: If you’re working for a company like Facebook that manages half a million comments every minute, a database could be used as a place to source reporting/analytics or run machine learning algorithms. The solution may be some sort of distributed data storage and processing framework, like Apache Hadoop or Spark.
  • Building a mobile app: If you’re creating an app, you’ll want to choose a database that is small and efficient for mobile devices, like SQLite or Couchbase.
  • Working at a startup or medium-sized business: If you’re on a tight budget or want a database that’s widely documented and used, then look to open-source database management systems like MySQL, PostgreSQL, MongoDB, or MariaDB.

Databases at General Assembly

At General Assembly, we empower students with the data tools and techniques needed to use data to answer business questions. In our full- and part-time data courses, students use databases to perform data analysis using real-world data.

In our part-time on-campus Data Analytics course or online Data Analysis program, students learn the fundamentals of data analysis and leverage data tools like Excel, PostgreSQL, pgAdmin, SQL, and Tableau. In our part-time Data Science course, students discover different types of databases, learn how to pull data from them, and more, and in the career-changing Data Science Immersive, they gather, store, and organize data with SQL, Git, and UNIX, while learning the skills to launch a career in data.

Meet Our Expert

Gus Lopez is a tech lead with more than 20 years of experience in delivering IT projects around the world, including to many Fortune 500 companies. He teaches the part-time Data Analytics course at General Assembly’s Melbourne campus. Achievements include constantly delivering technically challenging back-end, front-end, and data science projects as well as managing multidisciplinary teams. Gus has a master’s degree in computer science from RICE University, an MBA from Melbourne Business School, and a Ph.D. with summa cum laude distinction in data science from Universidad Central de Venezuela. He is passionate about analyzing and searching for insights from data to improve processes and create competitive advantage for organizations.

“Staying ahead of the curve is one reason I’m passionate about teaching and applying data analytics to every project I work on. Companies that constantly use data for decision-making are pulling ahead of those that aren’t.”

Gus Lopez, Data Analytics Instructor, General Assembly Melbourne