Geophysics Python sprint 2018 – day 2 and beyond, part II

In the last post I wrote about what Volodymyr and I worked on during a good portion of day two of the sprint in October, and continued working on after our return to Calgary.

In addition to that, I also continued to work on a notebook example, started on day one, demonstrating how to upscale sonic and density logs from more than one well at a time using Bruges’ backus and Pandas’ groupby. This will be the focus of a future post.
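As a teaser, here is a minimal sketch of that groupby pattern (not the notebook itself). It assumes a DataFrame called logs, already loaded, with a WELL identifier plus VP, VS, and RHO curves, and it uses Bruges’ documented backus(vp, vs, rho, lb, dz) call; check the signature against your installed version, and adjust dz and lb to your data.

```python
# Minimal sketch: Backus-average the logs of every well with one groupby call.
# Assumes a DataFrame `logs` with columns WELL, DEPT, VP, VS, RHO is already loaded.
import pandas as pd
from bruges.rockphysics import backus  # signature assumed: backus(vp, vs, rho, lb, dz)

dz = 0.1524   # depth sample interval in metres (half-foot logs) -- adjust to your data
lb = 10.0     # Backus averaging length in metres -- adjust to your target wavelength

def upscale_well(group):
    """Backus-average the elastic logs of a single well."""
    vp_b, vs_b, rho_b = backus(group['VP'].values,
                               group['VS'].values,
                               group['RHO'].values,
                               lb, dz)
    return pd.DataFrame({'DEPT': group['DEPT'].values,
                         'VP_backus': vp_b,
                         'VS_backus': vs_b,
                         'RHO_backus': rho_b})

# One call upscales every well in the DataFrame, instead of looping manually.
upscaled = logs.groupby('WELL', group_keys=True).apply(upscale_well)
```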

The final thing I did was to write and test an error_flag function for Bruges. The function calculates the difference between a predicted and a real curve; it flags errors in prediction if the difference between the curves exceeds a user-defined distance (in standard deviation units) from the mean difference. Another option available is to check whether the curves have opposite slopes (for example one increasing and the other decreasing within a specific interval). The result is a binary error log that can then be used to generate QC plots and evaluate the performance of the prediction process in a (I hope) more insightful way.
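The logic is simple enough to sketch in a few lines of NumPy. Below is a hedged re-implementation of the idea; the function that actually went into Bruges may differ in names, defaults, and details.

```python
# Illustrative re-implementation of the error-flag idea; not the Bruges API.
import numpy as np

def error_flag(pred, actual, dist=1.0, window=23):
    """Flag samples where a predicted curve deviates from the measured one.

    pred, actual : 1D arrays of equal length.
    dist         : distance from the mean difference, in standard deviations,
                   beyond which a sample is flagged.
    window       : length (in samples) of the interval used to compare slopes.
    """
    diff = pred - actual
    # Flag samples whose difference is an outlier relative to the mean difference.
    flag = (np.abs(diff - np.nanmean(diff)) > dist * np.nanstd(diff)).astype(int)

    # Optionally also flag intervals where the two curves trend in opposite
    # directions (one increasing while the other decreases).
    half = window // 2
    for i in range(half, len(pred) - half):
        slope_pred = pred[i + half] - pred[i - half]
        slope_actual = actual[i + half] - actual[i - half]
        if slope_pred * slope_actual < 0:
            flag[i] = 1
    return flag
```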

The inspiration for this stems from a discussion over coffee I had 5 or 6 years ago with Glenn Larson, a Geophysicist at Devon Energy, about the limitations of (and alternatives to) using a single global score when evaluating the result of seismic inversion against wireline well logs (the ground truth). I’d been holding that in the back of my mind for years, then finally got to it last Fall.

[Figure: error flag QC plot]

Summary statistics can also be calculated by stratigraphic unit, as demonstrated in the accompanying Jupyter Notebook.

What is acquisition footprint noise in seismic data?

Acquisition footprint is a noise field that appears on 3D seismic amplitude slices or horizons as an interwoven linear crosshatching parallel to the source line and receiver line directions. It is for the most part an expression of inadequate acquisition geometry, resulting in insufficient sampling of the seismic wave field (aliasing) and irregularities in the offset and azimuth distribution, particularly in the crossline direction.

Sometimes source-generated noise and incorrect processing (for example residual NMO due to erroneous velocity picks, incomplete migration, or other systematic errors) can accentuate the footprint.

This noise can interfere with the mapping of stratigraphic features and fault patterns, posing a challenge to seismic interpreters working in both exploration and development settings.

To demonstrate the relevance of the phenomenon I show below a gallery of examples from the literature of severe footprint in land data: an amplitude time slice (Figure 1a) and a vertical section (Figure 1b) from a Saudi Arabian case study, some seismic attributes (Figures 2, 3, 4, and 5), and also some modeled streamer data (Figure 6).


Figure 1. Amplitude time slice (top, time = 0.44 s) showing footprint in both the inline and crossline directions, and amplitude section (bottom) highlighting the effect in the vertical direction. From Al-Bannagi et al. Copyrighted material.


Figure 2. Edge detection (Sobel filter) on the Penobscot 3D horizon (average time ~= 0.98 s) displaying N-S footprint. From Hall.


Figure 3. Edge detection (Sobel filter) on a shallow horizon (average time ~= 0.44 s) from the F3 Netherlands 3D survey displaying E-W footprint.


Figure 4. Similarity attribute (top, time = 0.6 s), and most positive curvature (bottom, time = 1.3 s), both showing footprint. From Davogustto and Marfurt. Copyrighted material.


Figure 5. Amplitude time slice (top, time = 1.32 s) and the corresponding coherence section (bottom), both showing footprint. From Chopra and Larsen. Copyrighted material.


Figure 6. Acquisition footprint in the form of low fold striation due to dip streamer acquisition. From Long et al. Copyrighted material.

In my next post I will review (with more examples from the literature) some strategies available to either prevent or minimize the footprint through better acquisition parameters and modeling of the stack response. I will also discuss some ways the footprint can be attenuated after the data have been acquired (with bin regularization/interpolation, dip-steered median filters, and kx-ky filters, from simple low-pass to more sophisticated designs) when the above strategies are not available, whether due to time/cost constraints or because the interpreter is working with legacy data.

In subsequent posts I will illustrate a workflow to model synthetic acquisition footprint using Python, show how to remove it automatically in the Fourier domain with frequency filters, and then show how to remove it from real data.
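To give a flavour of the Fourier-domain approach, here is a toy sketch (not the workflow from those upcoming posts): it builds a smooth synthetic slice, adds a periodic crosshatch as a stand-in for footprint, and attenuates it with a simple low-pass mask in the kx-ky plane. All values are illustrative.

```python
# Toy example: synthetic footprint on a smooth slice, removed with a kx-ky low-pass.
import numpy as np

nx, ny = 256, 256
x, y = np.meshgrid(np.arange(nx), np.arange(ny))

geology = np.sin(x / 40.0) * np.cos(y / 55.0)                      # smooth "signal"
footprint = 0.3 * (np.sin(2 * np.pi * x / 16) + np.sin(2 * np.pi * y / 16))
slice_noisy = geology + footprint                                   # slice with crosshatch

# Move to the wavenumber domain and build a circular low-pass mask that removes
# the high-wavenumber striping while keeping the smooth geology.
F = np.fft.fftshift(np.fft.fft2(slice_noisy))
kx = np.fft.fftshift(np.fft.fftfreq(nx))
ky = np.fft.fftshift(np.fft.fftfreq(ny))
KX, KY = np.meshgrid(kx, ky)
lowpass = np.sqrt(KX**2 + KY**2) < 0.04    # cutoff chosen for this toy example

slice_filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F * lowpass)))
```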

References

Al-Bannagi et al., 2005 – Acquisition footprint suppression via the truncated SVD technique: Case studies from Saudi Arabia: The Leading Edge, SEG, 24, 832–834.

Chopra and Larsen, 2000 – Acquisition Footprint, Its Detection and Removal: CSEG Recorder, 25 (8).

Davogustto and Marfurt, 2011 – Footprint Suppression Applied to Legacy Seismic Data Volumes: 31st Annual GCSSEPM Foundation Bob F. Perkins Research Conference 2011.

F3 Netherlands open access 3D:  info on SEG Wiki

Hall, 2014 – Sobel filtering horizons (open source Jupyter Notebook on GitHub).

Long et al., 2004 – On the issue of strike or dip streamer shooting for 3D multi-streamer acquisition: Exploration Geophysics, 35(2), 105-110.

Penobscot open access 3D:  info on SEG Wiki

Mapping and validating geophysical lineaments with Python

In Visualization tips for geoscientists: MATLAB, Part III I showed there’s a qualitative correlation between occurrences of antimony mineralization in the southern Tuscany mining district and the distance from lineaments derived from the total horizontal derivative (also called maximum horizontal gradient).

Let’s take a look at it below (distance from lineaments increases as the color goes from blue to green, yellow, and then red).

[Figure: regional map of distance from total horizontal derivative lineaments with mineral occurrences overlain]

However, in a different map in the same post I showed that lineaments derived using the maxima of the hyperbolic tilt angle (Cooper and Cowan, 2006, Enhancing potential field data using filters based on the local phase) are offset systematically from those derived using the total horizontal derivative.

Let’s take a look at it below: in this case Bouguer gravity values increase as the color goes from blue to green, yellow, and then red; white polygons are basement outcrops.

The lineaments from the total horizontal derivative are in black, those from the maxima of hyperbolic tilt angle are in gray. Which lineaments should be used?

The ideal way to map the location of density contrast edges (as a proxy for geological contacts) would be to use gravity forward models, or even 3D gravity inversion, ideally constrained by all available independent data sources (magnetic or induced-polarization profiles, exploratory drilling data, reflection seismic interpretations, and so on).

The next best approach is to map edges using a number of independent gravity data enhancements, and then only use those that collocate.

Cooper and Cowan (in the same 2006 paper) demonstrate that no single edge-detection method is a perfect geologic-contact mapper. Citing Pilkington and Keating (2004, Contact mapping from gridded magnetic data – A comparison of techniques), they conclude that the best approach is to use “collocated solutions from different methods providing increased confidence in the reliability of a given contact location”.

I show an example of such a workflow in the image below. In the first column from the left is a map of the residual Bouguer gravity from a smaller area of interest in the southern Tuscany mining district (where measurements were made on a denser station grid). In the second column from the left are the lineaments extracted using three different (and independent) derivative-based data enhancements followed by skeletonization. The same lineaments are superimposed on the original data in the third column from the left. Finally, in the last column, the lineaments are combined into a single collocation map to increase confidence in the edge locations (I applied a mask so as to display edges only where at least two methods collocate).

[Figure: residual Bouguer gravity, lineaments from three derivative-based enhancements, and the resulting collocation map]
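For readers who want the gist before opening the notebook, here is a condensed sketch of the collocation step. The array grav (a gridded residual Bouguer anomaly) and the two additional enhancements enh2 and enh3 are assumed inputs, and the percentile threshold is illustrative; the notebook linked below contains the full workflow.

```python
# Condensed sketch of the collocation idea: binarize each enhancement, thin it
# to one-pixel lineaments, and keep only edges picked by at least two methods.
import numpy as np
from skimage.morphology import skeletonize

def lineaments(enhanced, percentile=90):
    """Binarize an edge-enhanced grid and thin the highs to one-pixel lineaments."""
    binary = enhanced > np.nanpercentile(enhanced, percentile)
    return skeletonize(binary)

# Total horizontal derivative (maximum horizontal gradient) of the gridded
# residual Bouguer anomaly `grav` (2D array, assumed already loaded).
gy, gx = np.gradient(grav)
thd = np.hypot(gx, gy)

# Three independent derivative-based enhancements; `enh2` and `enh3` stand in
# for, e.g., tilt-angle maxima and another gradient-based attribute.
edges = [lineaments(thd), lineaments(enh2), lineaments(enh3)]

# Collocation map: count how many methods flag each cell, then mask out cells
# where fewer than two methods agree.
stack = np.sum(edges, axis=0)
collocated = np.where(stack >= 2, stack, 0)
```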

If you want to learn more about this method, please read my note in the Geophysical tutorial column of The Leading Edge, which is available with open access here.
To run the open source Python code, download the IPython/Jupyter Notebook from GitHub.

With this notebook you will be able to:

1) create a full suite of derivative-based enhanced gravity maps;

2) extract and refine lineaments to map edges;

3) create a collocation map.

These techniques can easily be adapted to collocate lineaments derived from seismic data, with which the same derivative-based enhancements are showing promising results (Russell and Ribordy, 2014, New edge detection methods for seismic interpretation).

Why I replaced my Creative Commons license with Konomark

Why I share

I think openness in geoscience is very important, and I feel we all have a duty to be open with our work, data, and ideas when possible and practical. I certainly do believe in sharing a good deal of the work I do in my spare time. So much so that when I started this blog there was no doubt in my mind I would include an agreement for people to use and modify freely what I published. Indeed, I venture to say I conceived the blog primarily as a vehicle for sharing.

Some of the reasons for sharing are also selfish (in the best sense): doing so gives me a sense of fulfillment and pleasure. As Matt Hall writes in Five things I wish I’d known (one of the essays in 52 Things You Should Know About Geophysics), you can find incredible opportunities for growth in writing, talking, and teaching. There is also the professional advantage of maintaining visibility in the marketplace, or, as Sven Treitel puts it, Publish or perish, industrial style (again in 52 Things You Should Know About Geophysics).

How I used to share

At the beginning I chose an Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA), but soon removed the non-commercial limitation in favour of an Attribution-ShareAlike license (CC BY-SA).

A (very) cold shower

Unfortunately, one day last year I ‘woke up’ to an unpleasant surprise: in two days an online magazine had reposted all my content – literally, A to Z! I found this out easily because I received pingback approval requests for each post (thank you, WP!). Quite shocked, I confess, the first thing I did was to check the site: indeed, all my posts were there. The publisher included an attribution with my name at the top of each post, but I was not convinced this was fair use. Quite the contrary: to me this was a clear example of content scraping, and the reason I say that is that they republished even my Welcome post and my On a short blogging sabbatical post – in the science category! Please see the two screen captures below (I removed their information) of the pingbacks:

[Screen capture: pingback approval request for the Welcome/About post]

[Screen capture: pingback approval request for the blogging sabbatical post]

If this was a legitimate endeavour, I reasoned, a magazine with thoughtful editing, those two posts would surely not have been republished. Also, I saw that posts from many other blogs were republished en masse daily.

Limitations of Creative Commons licenses

I asked for advice and help from my Twitter followers and on the WordPress forums, while at the same time starting to do some research. That is when I learned this is very common; however, being in good company (Google returned about 9,310,000 results when searching ‘blog scraping’) did not feel like much consolation. I read that sites may get away with scraping content, or at least try. I will quote directly from the Plagiarism Today article Creative Commons: License to Splog?: “They can scrape an entire feed, offer token attribution to each full post lifted (often linking just to the original post) and rest comfortably knowing that they are within the bounds of the law. After all, they had permission … Though clearly there is a difference between taking and reposting a single work and reposting an entire site, the license offers a blanket protection that covers both behaviors”.

Fight or flight?

Yes, Creative Commons licenses have mechanisms that allow fighting this kind of abuse, but their effectiveness is yet to be proved (read, for example, this other article by Plagiarism Today, Using Creative Commons to Stop Scraping). Notice these articles are a bit out of date, but as far as I could see things have not improved much. The way is still the hard way of tracking down the culprit and fighting through legal action, although social media support helps.

It is possible to switch to a more restrictive Creative Commons license like the Attribution-NonCommercial-NoDerivs (perhaps modified as a CC+), but that only allows you to cut your losses, not to fight the abuse, as it applies only on a going-forward basis (I read this in an article and jotted down a note, but unfortunately I cannot track down the source – you may be luckier, or cleverer).

Then I was contacted by the site administrator through my blog contact form (again I removed their information), who had read my question on the WordPress forum:

Your Name: ______
Your Email Address: ______
Your Website: ______
Message: Hello.
Your site is under a CC license. What’s the trouble in republishing your content?
Regards.
Subject: Your license

Time: Thursday July 26, 2012 at 12:26 am
IP Address: ________
Contact Form URL: http://mycartablog.com/contact/
Sent by an unverified visitor to your site.

I responded with a polite letter, as suggested by @punkish on Twitter. I explained why I thought they were exceeding what was warranted under the Creative Commons license, that republishing the About page and Sabbatical posts was to me proof of scraping, and I threatened to pursue legal recourse, starting with a DMCA Notice of Copyright Infringement. Following my email they removed all my posts from their site and notified me.

Two alternatives

I think I was fortunate in this case, and decided to take matters into my own hands to prevent it from happening again. Following my research I saw two good, viable ways to better protect my blog from whole-content scraping while continuing to share my work. The first one involved switching to WordPress.org. This would allow more customization of the blog, and the use of tools such as the WP RSS footer plugin, which lets you get credit for scraped posts, and WP DMCA website protection. Another benefit of switching to WordPress.org is that – if you are of belligerent inclination – you can try to actively fight content scraping with cloaking. Currently, although it is one of my goals for the future of this blog, I am not prepared to switch from WordPress.com due to time constraints.

Having decided to stick with WordPress.com, the alternative I came up with was to remove the CC license and replace it with a Copyright notice with what I would call a liberal attitude. A simple way to do that was to add the Konomark logo, accompanied by a statement that encourages sharing but without surrendering any rights upfront. Additionally, you can prevent content theft from your WordPress.com blog (or at least reduce the risk) by configuring the RSS feed so that it displays post summaries only, not full posts.

How I share now

I customized my statement to reduce as much as possible the need for readers to ask for permission, by allowing WordPress reblogging and by allowing completely open use of my published code and media. Below is a screen capture of my statement, which is located in the blog footer:

I hope this will be helpful for those that may have the same problem. Let me know what you think.

[Screen capture: Konomark statement in the blog footer]

Additional reading

Whine Journalism and how to bring the splashback – a great story and a great step-by-step guide to fighting content theft

Content Scrapers – How to Find Out Who is Stealing Your Content & What to Do About It

Copysquare and Konomark

Useful tools to detect stolen content

Copyscape – Search for copies of your page on the web

Google Alerts – (for example read this article)

Google Report scraper pages

TinEye – reverse image search engine. ‘It finds out where an image came from, how it is being used, if modified versions of the image exist…’