Let’s start by focusing on the first part of the word “database”: data. “Data” refers to some unstructured collection of known information.
For example, take a LinkedIn user named Joe whose email address is email@example.com. Right now, we know two things about him: his name and his email address. These are two pieces of data.
Next, we need to organize related pieces of data. This is usually done through a structured format, such as a table. A table is composed of columns (also known as fields) and rows (also referred to as records).
Below, we see that our Joe data are now organized in a table called “Person”. Here, we have a record of Joe’s information: His name is in one field, his email is in another field, and we assign Joe a number (in a third field) for easy reference.
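To make the table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names (id, name, email) are assumptions chosen for illustration; the description above only says there is a field for Joe's name, one for his email, and one for his reference number.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Column names are assumptions: one field per piece of data,
# plus a number for easy reference.
conn.execute(
    "CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)

# One record (row) holding Joe's two pieces of data and his number.
conn.execute(
    "INSERT INTO Person (id, name, email) VALUES (?, ?, ?)",
    (1, "Joe", "email@example.com"),
)

joe = conn.execute("SELECT id, name, email FROM Person").fetchall()
print(joe)  # [(1, 'Joe', 'email@example.com')]
```

Each tuple returned by the query is one record, and each position in the tuple corresponds to one field.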
As you might expect, any database can contain many tables, one per collection of related data. Simplifying our LinkedIn example, we might have a “Person” table, an “Education” table, and a “Comment” table as we collect more data points about a user and their activities.
Now, these tables can (optionally) be linked together to form some sort of relationship between them. For example, Joe may have listed the schools he attended, which could be represented by a relationship between the “Person” and “Education” tables. Thanks to this relationship, we know which schools in the “Education” table are Joe’s.
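One common way to express such a relationship is a foreign key: each “Education” row stores the number of the person it belongs to. The sketch below assumes that approach. The person_id and school columns, the second user “Ann”, and all school names are hypothetical placeholders, not details from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
# person_id links each Education row back to one Person row.
conn.execute(
    """
    CREATE TABLE Education (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES Person(id),
        school TEXT
    )
    """
)

conn.executemany(
    "INSERT INTO Person (id, name, email) VALUES (?, ?, ?)",
    [(1, "Joe", "email@example.com"), (2, "Ann", "ann@example.com")],
)
# Hypothetical schools, used only to show the link.
conn.executemany(
    "INSERT INTO Education (person_id, school) VALUES (?, ?)",
    [
        (1, "Example University"),
        (1, "Sample College"),
        (2, "Placeholder Institute"),
    ],
)

# Follow the relationship to find only Joe's schools.
rows = conn.execute(
    """
    SELECT Education.school
    FROM Education
    JOIN Person ON Person.id = Education.person_id
    WHERE Person.name = ?
    ORDER BY Education.school
    """,
    ("Joe",),
)
joes_schools = [row[0] for row in rows]
print(joes_schools)  # ['Example University', 'Sample College']
```

The JOIN matches rows from the two tables on the shared number, which is how we can tell Joe's schools apart from everyone else's.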
This is usually the step at which structured, related pieces of data become information.
Any organization can have multiple databases: one for sales information, one for payroll information, and so on. To maintain these, organizations often turn to a type of software known as a database management system, or DBMS. There are many DBMS products to choose from, including Oracle, Microsoft SQL Server, MySQL, and PostgreSQL (often called Postgres).
The database itself is housed on hardware: a physical machine that either resides on a company’s premises or is rented offsite through providers like Amazon Web Services, Google Cloud Platform, or Microsoft Azure.
Last but not least, the data contained in the database needs to be accessible through some sort of admin tool or programming language. Analysts typically use a set of digital tools — including Microsoft Excel, IBM Cognos Analytics, pgAdmin, the R language, and Tableau — to examine this data for patterns and trends.
Data analysts can then use these patterns and trends to make informed decisions.
For example, if you’re a data analyst at a large company, you may be tasked with helping management determine a price for a new product. One approach you could take is looking at how much the product costs to produce — how much of people’s time and effort, as well as machinery, is needed to make and maintain the product. Let’s say you do that by analyzing the data sets of payroll and procurement and come up with a cost of $30. Then you’ll look at how much customers are willing to pay, and perhaps another data set can inform you that similar companies have charged up to $50 for a similar product.
But you can also see that demand might follow a seasonal trend, meaning people buy more of this product in, say, December, than during the rest of the year. A data analyst could use any of the above-mentioned data tools to visualize these three data sets (production cost, competitors’ prices, and seasonal purchasing trends) and recommend that the best price for the new product is $40.
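The arithmetic behind the pricing example can be checked with a few lines of Python. This is only a sketch: the three dollar figures come from the example, but the example does not say how $40 was derived, so the midpoint and margin calculations below are one plausible reading, not the analyst's actual method. The seasonal trend is left out because no numbers are given for it.

```python
# Figures taken from the pricing example.
production_cost = 30.0     # from the payroll and procurement data sets
competitor_ceiling = 50.0  # the most that similar companies have charged
recommended_price = 40.0   # the analyst's recommendation

# Assumption: compare the recommendation with the midpoint of the range.
midpoint = (production_cost + competitor_ceiling) / 2

# Profit per unit, and that profit as a share of the selling price.
margin = recommended_price - production_cost
margin_pct = margin / recommended_price * 100

# Room left below the highest competitor price.
headroom = competitor_ceiling - recommended_price

print(f"Midpoint of cost and ceiling: ${midpoint:.2f}")   # $40.00
print(f"Margin per unit: ${margin:.2f} ({margin_pct:.1f}%)")  # $10.00 (25.0%)
print(f"Headroom below competitors: ${headroom:.2f}")     # $10.00
```

Under these assumptions, a $40 price sits exactly halfway between the production cost and the highest competitor price, leaving $10 of profit per unit.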