We're looking for a junior crawler engineer who wants to develop their Python programming skills while helping us collect and organize data about political office holders, international sanctions, export controls, and other forms of government enforcement linked to economic misconduct.
OpenSanctions helps hold people and companies accountable for their political and economic actions. We build a database that tracks a wide range of entities in the public interest: sanctioned companies, politicians, fraudsters and criminals. Originally built to support anti-corruption journalists, OpenSanctions has also become a powerful tool used for customer screening, anti-money laundering compliance and in-depth investigative research.
We take pride in providing a high-quality dataset to the public and to our subscribers. With an open-source data pipeline and a public search tool available to everybody, we bring a commitment to transparency, accessibility and openness to the industry.
You will help us expand our coverage of people in political office and other positions of influence, sanctioned entities, and entities linked to other sources of risk. To do so, you will develop crawler programs that retrieve and structure data from international government websites. You will also be asked to fix and improve existing crawlers whenever they break, and whenever we decide to refine our data model.
This role is ideal for early-career candidates who want to develop their programming skills from basic to intermediate. Writing crawler code (especially with the assistance of automated co-pilots) is a great way to understand coding best practices, web technology and data modeling. We expect candidates to have a keen interest in political systems, international affairs or the investigation of financial crime.
An ideal candidate has a basic understanding of some of the following technical workflows:
We expect all candidates to meet these criteria:
We build crawlers using tooling in our own framework, zavod, which is carefully designed for the nature of our data pipeline and our scaling needs. Each crawler can focus on what is unique about the data source it is designed to integrate into our database.
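To give a feel for the kind of work involved, here is a minimal sketch of what a crawler does conceptually: fetch a page, extract structured records, and emit entities into a pipeline. This is not the zavod API — it uses only the Python standard library, and the sample HTML, field names, and `crawl` function are illustrative assumptions.

```python
# Illustrative only: a toy "crawler" that parses an HTML table of
# entities into structured records. Real crawlers at OpenSanctions
# use the zavod framework; this sketch just shows the general shape.
from html.parser import HTMLParser

# Hypothetical sample of the kind of table a government site might publish.
SAMPLE = """
<table>
  <tr><td>Jane Doe</td><td>Example Republic</td></tr>
  <tr><td>Acme Trading Ltd</td><td>Freedonia</td></tr>
</table>
"""


class RowParser(HTMLParser):
    """Collects each <tr> as a list of its <td> cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def crawl(html):
    """Turn raw HTML into structured entity records (a hypothetical shape)."""
    parser = RowParser()
    parser.feed(html)
    return [
        {"schema": "LegalEntity", "name": name, "country": country}
        for name, country in parser.rows
    ]
```

In practice the framework handles fetching, caching, entity schemas and deduplication, so each crawler concentrates on the source-specific extraction logic like the parsing step above.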
We maintain a backlog of data-crawling tasks. These comprise either writing a new crawler for a data source we don't yet track, or fixing an existing crawler that has broken or become incorrect.
If you’re interested in this position, please send your CV and/or a link to your GitHub/GitLab profile to jobs@opensanctions.org.
We value diversity of all kinds in our team, and encourage applications from people of all ethnicities, ages, gender identities, and sexual orientations. Even if you feel like you don’t have all the ‘nice-to-haves’, please do apply.