Friday, 17 September 2021

Beyond common BIM data

The last time I wrote a blog, I was experimenting with the Pandas framework in Python while exploring the use of (open) data. However, the example I was working on at the time had little to do with my work as a structural engineer. So I thought it high time to look for a topic where I could experiment with data analysis within my own field of work. The only question: which data is suitable to get started with?

(image source: https://www.stitchdata.com/blog/best-practices-for-data-modeling/)

In the past year, through my appointment at TU Delft, I have been able to guide a number of students with their graduation projects. An increasingly popular subject in these theses in recent years is the aspect of sustainability, a subject with many, many facets. Students have done and are doing research on flexibility and adaptivity of buildings and structures, demountability, reuse of components from existing buildings, etc. And, of course, the impact that the use of raw materials has on the environment.

Because the latter was relatively unknown territory for me, I thought it would be good to dive deeper into the subject, and in particular into the area where a structural engineer can make a difference: material use. Moreover, the subject turned out to be very well suited to combining with (BIM) data analysis. That’s how my next project ended up being a proof of concept for automating a CO2 footprint calculation for my projects!

The Base: Python, Pandas, and Revit schedules

For this exercise I chose to use exported Revit schedules for the required building information, and Python with the Pandas framework for processing and editing this data. Of course, it’s also possible to do this in Revit itself, by hooking up to the Revit API, or by using visual programming in Dynamo for example.

Screenshot of a Floor schedule, as exported from Revit to Excel

However, I believe that the power of schedules is widely underestimated. "Big BIM", "Common Data Environments" and "Digital Twins" are fashionable concepts, often used to tease data-oriented articles in the industry, but practice more often than not is not that sophisticated, for a variety of reasons. I ask you: which engineering firm always enriches its BIM models with (3D) reinforcement, the governing loads on elements, or the environmental impact of each separate element?

Data exported to Excel, however, is the complete opposite. A cost consultant can use it as a basis for the calculation of building costs, a sustainability consultant can allocate the volumes of materials for certain BREEAM calculations, and a structural engineer can use the extracted geometry to automate the calculation of the critical steel temperature of steel elements in a project. And all this can be done without having to use specific (pricey!) software or having experience with the software in which the project is modelled.

The above argument also holds, in my opinion, for the implementation of Python: it is free and can therefore be used by everyone! In addition, it is in itself a very powerful tool, and for a programming language relatively easy to learn. It is not for nothing that the use of Python is increasing significantly at companies as well as educational institutions like TU Delft...
 

Back to the environmental impact...

In order to be able to perform a good calculation, the information from the Revit model (geometry, quantities, materialization) needs to be combined with information on the environmental impact of parts and materials. Unfortunately, the database of the Dutch foundation “Stichting Nationale Milieudatabase” turned out not to be openly available (missed opportunity!), so I chose to feed the necessary information for this proof of concept from a separate Excel file.

Currently, in my proof of concept, the added parameters are specific weight and CO2 footprint of the (generic!) materials used in Revit, and the global reinforcement quantities for concrete structures, as often determined in the initial phases of the design process. Of course, this information can be further expanded on, for example with intended compositions of concrete mixtures, type of fire-resistant cladding for steel structures, etc.

Additional tables, to be combined with the Revit schedules

Python and Pandas at work

I won't go into the actual Python code this time. A link to the proof of concept script can be found at the bottom of this blog as usual. This time I want to reflect on the power of Python and Pandas in relation to data processing.

Pandas, as it turns out, supports data processing principles that have been applied for decades in database languages and software such as MySQL, MS SQL and Microsoft Access. It is, for example, possible to combine tables with each other based on so-called (unique) “IDs” or "keys".

(image source: https://www.dofactory.com/sql/join)

In this proof of concept, two different combinations have been made to arrive at the intended output:

  • The material data (specific weight, CO2 footprint) is linked on the basis of one parameter: (Structural) Material;
  • The data relating to the concrete structures in the model is linked on the basis of two parameters: Family (element type) and (Structural) Material. For example,
    • a distinction can be made between cast in-situ walls and floors, which usually are not reinforced the same way, nor have the same concrete mixture composition, and therefore do not have the same contribution to the environmental impact;
    • the same goes for prefabricated concrete, which usually has a larger CO2 footprint due to the higher concrete strength and/or the need for a faster hardening process (in the factory).

The concept of these table merges for this proof of concept is sketched in the image below:
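In pandas terms, those two merges boil down to something like this. This is just a sketch: the file names, column names and the final footprint formula are illustrative assumptions, not the exact ones from my project:

    import pandas as pd

    # The Revit schedule export plus the two supplementary tables
    # (file and sheet names assumed for this sketch)
    elements = pd.read_excel("floor_schedule.xlsx")
    materials = pd.read_excel("extra_tables.xlsx", sheet_name="Materials")
    concrete = pd.read_excel("extra_tables.xlsx", sheet_name="Reinforcement")

    # Merge 1: material data, keyed on a single column
    df = elements.merge(materials, on="Material", how="left")

    # Merge 2: concrete data, keyed on the combination Family + Material
    df = df.merge(concrete, on=["Family", "Material"], how="left")

    # Footprint per element: volume -> mass -> CO2, then export back to Excel
    df["CO2"] = df["Volume"] * df["SpecificWeight"] * df["CO2PerKg"]
    df.to_excel("co2_footprint.xlsx", index=False)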


The result is a table with the elements as exported from Revit, supplemented with the data for (determining) the CO2 footprint per element:

Screenshot of the Floor schedule, stripped and enriched with the CO2 footprint data

This data can then easily be exported to Excel again using Pandas, after which it can be used for further calculations or, for example, for visualizations in Power BI:

Data visualisation in Microsoft's PowerBI

This way you can see that with little effort you can still significantly enrich the data from a BIM model. And the possibilities are very extensive: similar workflows can also – and with the same ease – be set up for e.g. cost calculations, BREEAM (WST) calculations, and so on. And all this without having to be a (BIM software) data expert to fill and manage the BIM model!

Have a look at the script via the Gist link below. And of course all your comments or questions are welcome!
https://gist.github.com/marcoschuurman/cea6fff469430e94f0ec42ebc3c1fc61


Tuesday, 1 September 2020

Working with Open Data

A short post this time around, with loads of graphs and a very current and relevant topic, but one that has absolutely nothing to do with Structural Engineering.

I have been thinking about doing something with open data for a while now, and got triggered again by all the discussions about covid.


Online data

It didn't even require a lot of research, as it happens. Searching for "covid open data" got me a link to a (Dutch) government website with an overview of links to publicly available data sources concerning covid data. From there it was just a few clicks to the European Centre for Disease Prevention and Control (ECDC), which keeps a day-to-day overview of covid-related data.

Sidenote: during this search I stumbled on the website Our World in Data. They have interesting data stories on all kinds of topics, of which https://ourworldindata.org/coronavirus and https://ourworldindata.org/coronavirus-testing are just two - covid related - examples with some very cool infographics! Don't expect me to produce similar kinds of graphs in this blog (yet)! ;-)

Retrieving the data

For the Python scripting I'm using Jupyterlab again, just like in my previous (Structural Engineering related) blog. Retrieving the data itself requires very little effort: just importing some packages and reading a URL - from an online CSV data file straight into a Pandas dataframe:
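In essence it boils down to a few lines like these (a sketch; the URL is the ECDC download link as it was published at the time of writing):

    import pandas as pd

    # Read the daily ECDC CSV straight from the web into a Pandas dataframe
    url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"
    df = pd.read_csv(url)
    df.head()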


As you can see, it's easy as pie to get to this data. Of course, it will need some work to be presentable :-)

Converting the data

The data types of the different columns are not usable yet, and some columns are not required (in my case). The dateRep column is converted to an actual Date/Time data type, some columns are dropped (using a list, since that'll keep the script flexible) and others are renamed for ease of use (who came up with the original column names?):
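In code, that conversion step might look something like this (the original column names are the ones the ECDC used at the time; the new names and the assumed date format are my own choices):

    # Convert the report date to an actual Date/Time data type
    df["dateRep"] = pd.to_datetime(df["dateRep"], format="%d/%m/%Y")

    # Drop the columns we don't need, via a list to keep the script flexible
    drop_cols = ["day", "month", "year", "geoId", "countryterritoryCode", "continentExp"]
    df = df.drop(columns=drop_cols)

    # Rename the unwieldy originals for ease of use
    df = df.rename(columns={
        "dateRep": "date",
        "countriesAndTerritories": "country",
        "Cumulative_number_for_14_days_of_COVID-19_cases_per_100000": "cum_cases_14d",
    })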



As you can see in the bottom part of the screenshot above, this results in a nice little table with data to work with. Lastly, to be able to display results per country, I've grouped it by country:
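Grouping is a one-liner in Pandas; individual countries can then be pulled out of the grouped object:

    # Group the cleaned-up dataframe by country
    grouped = df.groupby("country")
    grouped.get_group("Netherlands").head()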



Plotting the data

Finally we get to plot the data. I wrote the plotting code in such a way that I can easily show my own selection of data. This is done by means of the "data_plot", "c_plot" and "emphasize" parameters in the code below:
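The function is roughly along these lines (a reconstruction, not the exact code from the screenshot; it assumes the column names from the earlier cleanup step):

    import matplotlib.pyplot as plt

    def plot_covid(df, data_plot, c_plot, emphasize):
        # Plot one data column for a selection of countries; countries in
        # 'emphasize' are drawn with a bolder line
        fig, ax = plt.subplots(figsize=(10, 6))
        for country, data in df.groupby("country"):
            if country not in c_plot:
                continue
            data = data.sort_values("date")
            width = 3.0 if country in emphasize else 1.0
            ax.plot(data["date"], data[data_plot], linewidth=width, label=country)
        ax.set_ylabel(data_plot)
        ax.legend()
        plt.show()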


Now I can, for instance, plot the 14 day cumulative number of COVID cases for the Netherlands, Belgium, France and Spain, with an emphasis (bold line) for Netherlands and Belgium:
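With the sketched function above, that call would look like:

    # 14-day cumulative cases for four countries, NL and BE in bold
    plot_covid(df, "cum_cases_14d",
               c_plot=["Netherlands", "Belgium", "France", "Spain"],
               emphasize=["Netherlands", "Belgium"])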


Or we could calculate our own 7-day and 14-day rolling average of daily confirmed cases for some of the countries that were hit hardest by the current pandemic (ignoring the last few data steps; I still need to study the rolling functions in Pandas a bit more, I think):
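A sketch of how those rolling averages can be computed; one likely culprit for the odd tail, by the way, is that the ECDC file is sorted newest-first, so sorting by date before applying .rolling() matters:

    # Sort ascending by date per country, then compute the rolling means
    df = df.sort_values(["country", "date"])
    df["cases_7d_avg"] = (df.groupby("country")["cases"]
                            .transform(lambda s: s.rolling(7).mean()))
    df["cases_14d_avg"] = (df.groupby("country")["cases"]
                             .transform(lambda s: s.rolling(14).mean()))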



As you can see, the combination of Python (in Jupyterlab) and open data is amazing! Almost real-time (open) data at your disposal with just a few lines of code.

Not sure how to use it in my daily structural engineering work, but we'll find a good use for it! :-) If you have good suggestions, feel free to post them below.

FYI, some other interesting open data links I came across:

Link to my Github gist for this script:

Monday, 1 June 2020

Python for number crunching

We've all been there: building an extensive 3D FEM model for a project, tens of thousands of beams/columns, everything seems to work nicely... and then you need to start crunching the numbers to see, for instance, if all the concrete piers and lintels can be reinforced. Or even more labour-intensive: for all those 1D elements the reinforcement needs to actually be designed and detailed. Where to start, right?

The result

This is what we'd like to get: all sections of all 1D elements of a specific section type, checked against a given reinforcement layout in the section:


Now, how to get there?

Python & Pandas

I've got one such project at hand, and I was thinking about how to make this as easy as possible for myself. While playing with Python I had come across the Pandas framework, but I hadn't really used it yet, other than to easily lay out tables. The proof of the pudding is in the eating, so I decided to use this weekend to try it out some more.

I know SCIA is working on an API for their FEM package SCIA Engineer, but for this case I went the old-fashioned way: I exported the 1D element information to Excel. The result: approximately 3,400 1D elements, with a total of about 25,500 section force combinations:


As a sidenote: to make things easy for yourself, choose to have separate tables exported to separate worksheets ;-) Otherwise you'll have all data in one sheet, which will make life much more difficult.

For the implementation of Python and Pandas, I'm still using Jupyterlab. Pandas is included as standard in Jupyterlab's implementation in Anaconda, so I just have to put the Excel file in a separate folder within the Jupyterlab notebook folder, and we can get started on the number crunching!

Reading and preparing

Importing the Excel file into Python with Pandas is actually quite easy:
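Something along these lines does the trick (the file name is made up for this sketch; the sheet names are the Dutch ones from my export):

    import pandas as pd

    # Open the SCIA Engineer export; ExcelFile lists the worksheets
    # without parsing them yet
    xls = pd.ExcelFile("data/1d_elements.xlsx")
    print(xls.sheet_names)  # ['Staven', 'Interne 1D-krachten']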


As you can see, Pandas already distinguishes between the two worksheets in the Excel document.

Next we parse both sheets to their respective dataframes:
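A sketch of that parsing step (the exact number of header lines to skip is an assumption on my part):

    # The 1D element sheet is already lean; dropna() is just a safety net
    df_elements = xls.parse("Staven").dropna()

    # The forces sheet is polluted at the top: skip the header lines, then
    # drop the rows that end up with NaN values after parsing
    df_forces = xls.parse("Interne 1D-krachten", skiprows=3).dropna()

    df_forces.head(5)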


The "Staven" (1D element) sheet doesn't require a lot of additional processing, since the exported data is already very lean; the .dropna() command is only there just in case. The "Interne 1D-krachten" (Internal 1D forces) sheet has some polution of the table: the first few lines are skipped, and rows with non-matching values (resulting in NaN values on their rows) are removed using the .dropna() command. The resulting tables already look quite clean and usable! (The .head(5) command is used to show just the first 5 elements of the dataframes)

Last but not least, the results of both tables are combined and grouped based on the section ("Doorsnede"). For those of you not familiar with SQL-esque database systems: using the how='outer' join type when combining these dataframes ensures that every row with a result for a particular 1D element gets the right name, section, etc. assigned:
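In code, roughly (assuming both sheets share the "Doorsnede" column as the merge key):

    # Combine both tables; how='outer' keeps every row from both dataframes
    df = df_forces.merge(df_elements, on="Doorsnede", how="outer")

    # Columns come in as strings at first; convert whatever can be converted
    df = df.apply(pd.to_numeric, errors="ignore")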


Interesting bit: the tables are imported with all columns as string types at first. By running the pd.to_numeric(<...>, errors='ignore') function, the columns that can be converted to numbers will be.

Drilling down on results

After this, and some matplotlib magic with a few for-loops, the section forces for all 1D elements in the model can be visualised, grouped and coloured by section type:
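Conceptually, that plotting loop looks something like this (the force column names are assumed):

    import matplotlib.pyplot as plt

    # One scatter of (N, My) pairs per section type, each in its own colour
    fig, ax = plt.subplots(figsize=(10, 6))
    for section, data in df.groupby("Doorsnede"):
        ax.scatter(data["N"], data["My"], s=5, label=section)
    ax.set_xlabel("N [kN]")
    ax.set_ylabel("My [kNm]")
    ax.legend()
    plt.show()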



This is still not very usable though; it would be better to (1) have the section forces grouped in graphs per section type, and (2) visualise them in relation to the section capacity.

First, we select the section to analyse:



Then we determine the section capacity. Fortunately, I previously made a Python script to calculate the capacity of a given concrete section design; see this previous blog again for some more info. The script can be turned into a reusable function with little effort (MNCapCurve() in the screenshot below), which makes it possible to automate the capacity calculation with some extra input for the concrete cross-section to calculate:


You can use some code here to set the parameters for plotting the graphs, like using the same axis scales for the Y- and Z-directions of the sections, or collecting the 1D elements with the governing section forces for annotation in the graphs.

And finally you can plot the section forces from the FEM model against the section capacity, which can be used to check whether your model is working right, whether the (element) design is feasible, or even to produce an automated detailed design for these elements:


That's it! I still have to change the "elem_nr" manually to generate the capacity check for the next section. Generating capacity checks for all sections included in the FEM model is definitely next on the ToDo list!

I hope you enjoyed this little exercise in data crunching and automation with FEM output. Let me know what you think in the comments!