
13 October, 2020

Getting Data Science/Engineering Projects to Jell with In-House IT


Article By

Johannes Tynes, Lead Data Scientist, Flowtale


Introduction

While we at Flowtale deliver data science and data engineering solutions to clients across very different industries, they all have one thing in common: an in-house IT function with which our work often intersects.

So how do you as a client avoid us “getting in the hair” of IT without impeding the projects you’ve commissioned us to deliver? 

Below we walk through some operating rules, based on our experience, which you may want to consider to help minimize friction. These rules should be considered by whoever is responsible for implementing a project and for presenting it to IT at an early stage. They are likely to be relevant in the planning phase of any such project, whether or not it involves the assistance of a service provider such as Flowtale.

We believe that following these operating rules, and making that transparent to IT, makes it more likely both that your project will succeed and that IT will view it favorably and embrace it from the outset. Please read on if you are curious about what they are and how they may help you create synergies between data science/engineering and IT.

Suggested Operating Rules in Relation to IT

Make no policy exceptions 

IT policies are vital to ensure the security, integrity, and availability of your critical systems. If any exceptions are made, explicitly or implicitly, this endangers the very outcomes the projects are supposed to achieve.

Survey what already exists 

By making sure existing projects, modules, and infrastructure are surveyed before new data science/engineering projects start, design patterns can be identified:

  • which have been found acceptable in the past (precedents) 
  • with which there is already some in-house familiarity 

Bias towards supported tools 

Unless there are good reasons not to, it makes sense to use tools (programming languages, version control systems, cloud, infrastructure) with which you already have some experience internally. This means that efforts can be focused on understanding the project deliverables, and maximizing return on investment, rather than on coping with new tools and workflows. 

Document everything, in more than one way 

If everything relevant (methodology, assumptions, results, settings, business context) is documented, this helps with understanding, buy-in, and maintenance / further development. Documentation should ideally be: 

  • both in-line (within code, templates) and in summary form 
  • targeted at both expert and non-expert audiences within your company 

By ensuring you get documentation, summaries, and FAQs that are written for an audience without an extensive background in data science/engineering, you can in particular ensure the project: 

  • is more widely understood and more easily discussed 
  • achieves greater rates of adoption 
  • can serve as inspiration for further initiatives 
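As a sketch of what combining in-line and summary documentation can look like, consider this hypothetical Python function (the function name, figures, and forecasting approach are purely illustrative, not from any real project):

```python
def forecast_monthly_demand(history, months_ahead=3):
    """Forecast demand for the coming months.

    Summary for non-experts: this takes past monthly sales figures
    and projects them forward by repeating the average of the most
    recent months.

    Technical notes: a naive mean-over-last-3-months baseline; a real
    project would document the chosen model, its assumptions, and its
    settings here.

    Parameters
    ----------
    history : list of float
        Observed monthly demand, oldest value first.
    months_ahead : int
        Number of future months to forecast.
    """
    # Average the last three observations (or all of them, if fewer).
    window = history[-3:] if len(history) >= 3 else history
    baseline = sum(window) / len(window)
    return [baseline] * months_ahead
```

The docstring serves both audiences at once: the plain-language summary can be lifted directly into FAQ-style documents, while the technical notes and parameter descriptions support maintenance.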

Syncing with / hosting on internal resources 

If all code, files and project management objects are synced with or hosted on internal tools, then: 

  • the whole project is more transparent 
  • workflows and actions can be tracked 
  • no important data is lost 
  • further integration needs down the line are reduced 

Explain the internal skill set 

By explaining which skills are present in the current IT/digital team, it is possible to position projects closer to those (through choice of tools and methodology) and therefore make them easier to understand, maintain and develop further. We have often found that simple questionnaires can help clarify what the existing skill set looks like. 

Make processes transparent 

Typically workflows will be based on software development and scientific process elements: 

  • clearly defined tasks 
  • high frequency of standup meetings 
  • everything, always committed to version control (git) 
  • sprints 
  • benchmarking 
  • (controlled) experimentation 

Making this fully transparent to you can help: 

  • alleviate unfounded fears about the direction which projects are taking 
  • enable quick feedback when a change of approach is needed (“fail early”) 

Modular deliverables 

If you request that deliverables are made in modular, decoupled ways, this reduces your integration risk and allows projects to be deployed in different configurations. This is particularly relevant in an environment with a changing IT landscape and varied or unpredictable use cases, and it extends the lifetime of the project’s value. 
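One common way to achieve this decoupling is to have the core pipeline depend only on a small interface, so a component can be swapped without touching the rest. A minimal Python sketch (all names here are hypothetical):

```python
from abc import ABC, abstractmethod


class ResultStore(ABC):
    """Interface the pipeline depends on; storage backends are swappable."""

    @abstractmethod
    def save(self, name: str, payload: dict) -> None: ...


class InMemoryStore(ResultStore):
    """One possible backend; others (files, cloud, internal systems)
    could implement the same interface without pipeline changes."""

    def __init__(self):
        self.data = {}

    def save(self, name, payload):
        self.data[name] = payload


def run_pipeline(store: ResultStore):
    # ... compute results, then hand them to whichever store was injected
    store.save("daily_report", {"rows": 42})


store = InMemoryStore()
run_pipeline(store)
```

Because `run_pipeline` never names a concrete backend, the same deliverable can be deployed against different infrastructure configurations.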

Continuous integration and deployment 

Require that solutions are delivered with CI/CD in mind and, where possible, leverage CI/CD tools already used internally. Consistent and extensive use of CI/CD makes your projects more stable and reduces the need for interventions by external consultants. 
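As a minimal sketch of what such a requirement can translate to, assuming a GitHub Actions setup (the workflow name, branch, and test command are illustrative and would depend on your internal tooling):

```yaml
# Hypothetical CI pipeline: run the test suite on every push,
# so regressions are caught before anything is deployed.
name: ci
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest
```

If your organization already runs a different CI system (Jenkins, GitLab CI, Azure DevOps), the equivalent pipeline should be expressed in that tool instead.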

Staff/skill up in parallel 

If your data science or data engineering project is used as an opportunity to add skills, experience or even staff within IT this can reduce maintenance risk and make the process of integration smoother. 

Sandbox / development environments 

By encouraging the use of sandbox or development environments until projects are near production-ready, you get the benefits of: 

  • not having to wait for production environment infrastructure to be approved and created 
  • fewer security and interference concerns in early stages of development 

Infrastructure-as-Code 

If all infrastructure (created, for example, in the cloud) is defined by explicit recipes (templates), it is easier to: 

  • understand what infrastructure creation steps have been taken 
  • further develop it 
  • track changes over time, because the templates are version controlled 
  • change settings and technical parameters, as these are split out into configuration files 
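As an illustration, here is what such a recipe can look like in Terraform (a hypothetical fragment; the bucket, variable, and tag names are invented for this example):

```hcl
# Hypothetical template: a cloud storage bucket defined as code.
# The name is split out into a variable so it can live in a
# version-controlled configuration file rather than in the template.
variable "project_bucket_name" {
  type = string
}

resource "aws_s3_bucket" "project_data" {
  bucket = var.project_bucket_name

  tags = {
    Project   = "ds-pipeline"
    ManagedBy = "terraform"
  }
}
```

Every change to this template goes through version control, so the history of the infrastructure is as reviewable as the history of the code.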

Make Your Next Project a Success: Speak with Us at Flowtale Today

We at Flowtale take non-technical project aspects such as collaboration with in-house IT very seriously. From delivering challenging data science and engineering projects across diverse industries we have learned valuable lessons, which can help you reduce the risk of running into IT collaboration problems.

Please do not hesitate to contact us to discuss how you can make your next data science or data engineering project a success together with us.