AI and data-driven tools are reshaping industries at breakneck speed, unlocking new ways to generate content, analyze trends, and reach audiences like never before. But as we ride this wave of innovation, we’re also facing some big ethical questions—like how to keep these systems fair, transparent, and valuable for everyone.
In this discussion, Singapore-based AI ethics expert Germayne Ng, Head of Data Science Maps at GoTo Group, and London-based James Phoenix, Co-Author and Lead Developer at O’Reilly, explore the trends reshaping AI and data governance today—and the ethical guardrails necessary to secure fairness and trust in these systems.
Watch the recording here. Or read on to learn:
- How sample and measurement biases can compromise fairness in data systems
- Key differences in data privacy and ethics regulations across the UK, EU, and Southeast Asia
- How increased regulation and algorithmic transparency will redefine industry standards
Fairness in data systems: The role of sample biases and measurement biases
Germayne: When it comes to biases in systems, especially AI-driven ones, you can attribute them to two main sources: sample bias and measurement bias. I’ll explain each one:
- Sample bias: This means the data you’re using may not completely or accurately represent the population. For example, a dataset used to develop a map system may underrepresent certain vehicle types or overrepresent popular, high-traffic districts in a city. To fix this, you have to, first, identify the bias and, second, take the necessary precautions to ensure there is enough representation for each category. The good thing is that, in many cases, we can overcome some of these biases through sheer data volume; in other words, with enough data, things even out.
- Measurement bias: This is inherent noise or error in the data itself. To use the map example again, some of the GPS data you’re using may be inaccurate or outdated. The keys to addressing measurement bias are proper error handling (e.g., removing outliers), proper validation and testing to ensure performance is assessed at a granular level, and continuous, periodic updates of the underlying data. That last one is crucial in a world that is dynamic and ever-changing: a high-performing system needs up-to-date data, and robust feedback loops (for example, customer feedback in the case of the map) are a key part of maintaining any model. (A minimal sketch of both checks follows this list.)
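To make these two failure modes concrete, here is a minimal Python sketch (using pandas) of the kind of checks Germayne describes: comparing a sample’s category shares against a reference distribution, reweighting to compensate, and filtering out noisy GPS readings. The dataset, column names, reference shares, and thresholds are all hypothetical illustrations, not a production recipe.

```python
import pandas as pd

# Hypothetical trip-level dataset for a mapping system; the column names
# ("vehicle_type", "gps_error_m") are illustrative assumptions.
trips = pd.read_csv("trips.csv")

# --- Sample bias check: compare each category's share in the sample
# against a known reference distribution (e.g., vehicle registration data).
reference_share = {"car": 0.55, "motorbike": 0.35, "truck": 0.10}  # assumed
sample_share = trips["vehicle_type"].value_counts(normalize=True)
for vehicle, expected in reference_share.items():
    observed = sample_share.get(vehicle, 0.0)
    if abs(observed - expected) > 0.05:  # arbitrary 5-point tolerance
        print(f"Possible sample bias: {vehicle} is {observed:.0%} "
              f"of the sample vs. {expected:.0%} expected")

# One common correction: weight each row inversely to how over- or
# under-represented its category is, so downstream metrics even out.
trips["weight"] = trips["vehicle_type"].map(
    lambda v: reference_share.get(v, 0.0) / sample_share.get(v, 1.0)
)

# --- Measurement bias handling: drop GPS points whose reported error
# is an extreme outlier (here, beyond 3 standard deviations).
err = trips["gps_error_m"]
clean = trips[(err - err.mean()).abs() <= 3 * err.std()]
print(f"Removed {len(trips) - len(clean)} noisy GPS readings")
```

In a real system, checks like these would run on every data refresh rather than once, which is what makes the continuous updates and feedback loops Germayne mentions practical to maintain.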
Balancing innovation and ethics: Ensuring responsible data use
James: It’s important that, beyond showing people innovative techniques such as developing a marketing simulation tool, you set up guardrails to make sure those techniques don’t get abused. And, alongside discussions on how to leverage these techniques more effectively and reliably, we also need a conversation about doing it thoughtfully and responsibly. Because, whether they’re on Google, X, or TikTok, people care about value and quality. And if you can achieve that with automation, then you’re golden. So, one of the most important things I discuss with my students is how we can leverage all this data to create more value for the end user.
Using marketing as an example, there are two areas where things have gone a bit too far, and people are trying to find ways to restore balance:
- AI-generated content: A good example of a gray ethics area is people using AI-generated content to spam Google’s index. In theory, that sounds great—you create content quickly and easily and rank higher on the search results page. But, in practice, a lot of this content provides little value and fills up the web with empty information.
- Web scraping: Another example in marketing is companies scraping publicly available information, such as LinkedIn profiles, for email data. And this isn’t something that’s done in secret; it’s done with the help of well-known, reputable tools like Crunchbase and hunter.io. So the question is: since the information is publicly available, where exactly does this become unethical? From my point of view, if you’re not contacting these people for a valid reason and creating some value for them, then that’s a black-hat tactic.
Germayne: Having access to large-scale datasets is a true luxury. It enables us to build intelligent systems powered by machine learning algorithms, recommendation engines, and so on. But with that comes a responsibility: to have proper guardrails in place that limit access to that data, especially when it contains personally identifiable information.
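As one illustration of what such a guardrail can look like in code, here is a minimal Python sketch that pseudonymizes PII columns before a dataset is shared more widely. The column list and salting scheme are assumptions for illustration; a real deployment would pair this with access controls, key management, and a fuller PII inventory.

```python
import hashlib

import pandas as pd

# Assumed PII column list; a real inventory would be broader.
PII_COLUMNS = ["name", "email", "phone"]

def pseudonymize(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Return a copy safe to share more broadly: PII columns are replaced
    with salted hashes, so rows stay joinable but identities are hidden."""
    safe = df.copy()
    for col in PII_COLUMNS:
        if col in safe.columns:
            safe[col] = safe[col].astype(str).map(
                lambda v: hashlib.sha256((salt + v).encode()).hexdigest()[:12]
            )
    return safe

users = pd.DataFrame({
    "name": ["Ana Lim"], "email": ["ana@example.com"], "trips": [42]
})
print(pseudonymize(users, salt="rotate-me-regularly"))
```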
Data privacy regulations across borders: Key differences between the UK, EU, and Southeast Asia
Germayne: Part of the challenge of operating across Southeast Asia is that regulations vary from one region to another, so organizations may encounter stark differences in the maturity and consistency of regulations across the region. In addition to making compliance more difficult, working with a patchwork of regulations also complicates data storage and usage. For example, you may have to store data differently depending on the sources and regions it came from, and that can add a lot of complexity when building intelligent systems or complex machine learning systems across regions.
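To illustrate the kind of complexity Germayne describes, here is a minimal Python sketch of region-aware storage routing, where each record is written to a location dictated by its origin. The region codes and bucket names are hypothetical placeholders; actual residency rules come from legal counsel, not code.

```python
# Hypothetical residency rules mapping a record's origin to a storage
# location; the region codes and bucket names are illustrative only.
RESIDENCY_RULES = {
    "SG": "s3://data-sg-central",    # must stay in-country
    "ID": "s3://data-id-jakarta",    # must stay in-country
    "EU": "s3://data-eu-frankfurt",  # keep within the EU
}

def storage_target(record_region: str) -> str:
    """Pick a storage location based on where the data originated;
    fail closed if no rule exists rather than defaulting anywhere."""
    try:
        return RESIDENCY_RULES[record_region]
    except KeyError:
        raise ValueError(f"No residency rule for region {record_region!r}")

print(storage_target("SG"))  # s3://data-sg-central
```

Failing closed on an unknown region is the design choice that matters here: with a patchwork of regulations, silently defaulting to a single store is exactly how compliance gaps appear.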
Another difference organizations and individuals working in Singapore, for example, will encounter is that our approach to governance is less prescriptive than in other parts of the world, such as the EU. There’s definitely a bigger focus on promoting the innovation and economic development of AI technologies—although there is some overlap with the European Union in key principles such as transparency, human centricity, harm reduction, and explainability (ensuring that people can understand how these systems work and how their decisions affect them).
James: In the UK and the European Union, where policies are based on GDPR, regulation is definitely more strict and centered around individual rights and consent. For example, Meta was recently unable to release its Llama model in the European Union because it’s not AI Act-compliant. Singapore’s model is much more innovation-friendly, laying out a framework that organizations can use as guidelines when developing and applying AI systems rather than strict regulations.
Germayne: Right now, organizations and governments are walking a very fine line between regulation and innovation. And I think the best way to achieve a balance is to have regulatory bodies work together with organizations. Otherwise, you get cases similar to what James mentioned, where companies like Meta and Apple delay or entirely forgo launching in the EU because they don’t want to deal with an uncertain or overbearing regulatory landscape.
The future of data ethics: Stricter regulation and greater transparency
James: I see two major trends. The first is increased AI regulation and auditing. While there’s a degree of safety already baked into LLMs (for example, ChatGPT will not generate explicit images or content), we should expect those safeguards to become even stricter as governments continue to shape AI regulation. Expect a bit of a cat-and-mouse game as regulators try to figure out where regulations should be applied and where they’re not necessary. The second and more exciting trend is increased algorithmic transparency. As more companies open-source their algorithms, people will get a better look into their inner workings: what they’re doing, how they’re doing it, and what results they’re generating.
Germayne: I think that AI’s fast pace of development will drive regulators to revisit questions such as what is ethical and what constitutes privacy rights. Organizations, for their part, will need to develop proper data governance frameworks and dedicated data governance teams, especially as they collect more data footprints and touchpoints from consumers. Slack is a good example of this: it recently introduced a policy notifying users that their data would be used to train its LLM unless they opted out. There must be transparency, and some kind of statement notifying users of where and how their data is being used, to make sure people don’t lose trust in these systems.
Make it real
At General Assembly, we deliver the goods to keep you ahead of the curve. Our students are thinkers and doers who don’t wait for some imagined future, but build their skills (AI, data, and more) to contribute to the future they want and need. What we offer is a charge-up from the inside out—so change never stops you in your tracks.
Explore our course catalog and move forward with real skills.