LIVEcommunity - MineMeld to Extract Indicators From generic API - LIVEcommunity - 218757

MineMeld to Extract Indicators From generic API

Xhoms 07-07-2018 01:57 AM

Introduction

Although MineMeld was conceived as a threat sharing platform, reality has shown that many users are taking advantage of its open and flexible engine to extract dynamic data (not threat indicators) from generic APIs.

* The highly successful case of extracting O365 dynamic data (IPs, domains and URLs) from its public-facing API endpoint.
* Many users rely on MineMeld to track the public IP space of cloud and CDN providers like AWS, Azure, CloudFlare and Fastly, as a much more robust and scalable alternative to mapping them with FQDN objects.
* Or even people using MineMeld to extract the list of URLs of videos published in specific YouTube playlists or channels via the corresponding Google API.

All these are examples of MineMeld being used to extract dynamic data from public APIs. Depending on the source, a new class (Python code) may be needed to implement the client-side logic of the API we want to mine. But in many cases the already available, ready-to-consume "generic classes" can be used instead. This way users can "mine" their generic API without having to deep dive into contributing to the GitHub project.

The "generic classes"

There are, basically, three "generic classes" that can be reused in many applications:

* The HTTPFT class: Create a prototype for this class when you need to extract dynamic data from content delivered as HTML or plain text (text/plain, text/html).
* The SimpleJSON class (I love this one!): Do you need to extract dynamic data from an API that delivers the response as a JSON document?
  You're all set with a prototype of this class!
* The CSVFT class: Some services still use variants of CSV (delimiter-based multi-field lines) to deliver their content.

The following rule of thumb will tell you whether the API you want to extract dynamic data from can be "mined" with MineMeld by providing just a prototype for one of these classes (without writing a single line of code!):

1. The transport must be HTTP/HTTPS.
2. No authentication or basic authentication (user + password).
3. Single transaction (one call retrieves the whole indicator list, no pagination).
4. Indicators are provided in plain, HTML, CSV or JSON format.

The following sections of this article will teach you how to use these generic classes to mine an example API that provides the real-time temperature of four MineMeld-related cities in the world:

    Format   API URL
    CSV      https://test.minemeld.com/csv
    HTML     https://test.minemeld.com/html
    JSON     https://test.minemeld.com/json

Mining a CSV API

We will start with CSV because it is probably the easiest of the generic classes. The theory of operation is:

* The CSVFT class performs an HTTPS API call with no (or basic) authentication. The expected result is a table-like document in which every line contains an indicator plus additional attributes separated by a known delimiter.
* Before the CSV parser kicks in, a regex pattern is used to discard lines that should not be processed (e.g. comments).
* The prototype provides configuration elements that tell the CSV parser how to extract the fields from each line.
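As a rough illustration of the first step (a single HTTP/HTTPS call with no or basic authentication), the following Python sketch builds such a request with the standard library. It is not MineMeld code: the helper name build_request and the user/secret credentials are placeholders made up for this example, and the request is only constructed, not sent.

```python
import base64
import urllib.request


def build_request(url, username=None, password=None):
    """Build one GET request, optionally carrying HTTP basic authentication."""
    request = urllib.request.Request(url, method="GET")
    if username is not None and password is not None:
        # Basic authentication is "user:password" encoded in base64.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Unauthenticated request against the demo endpoint used in this article.
plain = build_request("https://test.minemeld.com/csv")

# Same request with hypothetical credentials (placeholders only).
authed = build_request("https://test.minemeld.com/csv", "user", "secret")

# Passing either object to urllib.request.urlopen() would perform the
# single call; no pagination or follow-up request is involved.
```

An API that needs a login flow, tokens or several paged calls falls outside this shape, which is why the rule of thumb limits the generic classes to single, unauthenticated or basic-authenticated requests.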
First of all, let's call the demo CSV API and analyze the results:

    Request ->
    GET /csv HTTP/1.1
    Host: test.minemeld.com

    Response Headers <-
    HTTP/2.0 200 OK
    content-type: text/csv
    content-disposition: attachment; filename="minemeldtest.csv"
    content-length: 432

    Response Body <-
    # Real-Time temperature of MineMeld-related cities in the world.
    url,country,region,city,temperature
    https://ajuntament.barcelona.cat/turisme/en/,ES,Catalunya,Barcelona,12.24
    http://www.turismo.comune.parma.it/en,IT,Emilia-Romagna,Parma,16.03
    http://santaclaraca.gov/visitors,US,California,Santa Clara,8.98

* The API returns text/csv content and suggests that we store the result as an attachment named minemeldtest.csv.
* Regarding the content, it looks like 4 data records are provided, with up to 5 fields each: url, country, region, city and temperature. There are also two lines that do not provide any value and should be discarded (the one with the comment and the one with the field headers).
* As for the CSV parsing task, it looks like the fields are clearly delimited by the comma character.

We're ready to configure our prototype to mine this API with the CSVFT class.

Step 1: Create a new prototype using any CSVFT-based one as a starting point.

We'll use the prototype named "sslabusech.ipblacklist" as our starting point. Just navigate to the config panel, click on the lower-right icon (the one with the three lines) to expose the prototype library and click on the sslabusech one.

Clicking on the sslabusech prototype will reveal its configuration, as shown in the following picture. The most important value in the prototype is the class it applies to: in this case, the CSVFT one we want to leverage.
Our mission is to create a new prototype and change its configuration to mine the demo CSV API. These are the changes we will introduce:

* Name, Description and Tags (to make it searchable in the prototype library)
* Inside the CONFIG section:
  + We will replace url with https://test.minemeld.com/csv
  + We'll change the indicator type to URL and set the confidence level to 100
  + Provide our own set of fieldnames
  + Define the ignore regex pattern as "^(?!https)" (to discard all lines except the ones starting with "https")
  + Describe the source_name as minemeld-test

Simply click on the NEW button and modify the prototype as shown in the following picture.

Please take a closer look at the fieldnames list and notice that the first name in our prototype's list is "indicator" (the header line of the CSV body suggested "url" for the first field instead). The CSV engine inside the CSVFT class extracts all comma-separated values from each line and uses the one in the column named "indicator" as the indicator value. Every other fieldname is extracted and attached to the indicator as an additional attribute.

Step 2: Clone the prototype as a working node (miner) in the MineMeld engine

Clicking on OK will store this brand new prototype in the library, and the browser will be sent to it. Just change the search field to reveal our CSV prototype and then click on it. Now it is time to clone this prototype into a working node in the MineMeld engine: click on the CLONE button, give the new miner node a name and commit the new configuration.
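The effect of the ignore pattern and the fieldnames list from Step 1 can be approximated outside MineMeld with Python's re and csv modules. This is only a sketch of the behaviour described above, not the CSVFT implementation: mine_csv is a hypothetical helper, and the sample body is based on the demo response.

```python
import csv
import re

# Sample body based on the demo CSV response in this article.
BODY = """\
# Real-Time temperature of MineMeld-related cities in the world.
url,country,region,city,temperature
https://ajuntament.barcelona.cat/turisme/en/,ES,Catalunya,Barcelona,12.24
http://www.turismo.comune.parma.it/en,IT,Emilia-Romagna,Parma,16.03
http://santaclaraca.gov/visitors,US,California,Santa Clara,8.98
"""

# The first column is renamed to "indicator", as in the prototype.
FIELDNAMES = ["indicator", "country", "region", "city", "temperature"]


def mine_csv(body, ignore_regex, fieldnames=FIELDNAMES):
    """Return (indicator, attributes) pairs for every line that is kept."""
    ignore = re.compile(ignore_regex)
    # Lines matching the ignore pattern are dropped before CSV parsing.
    kept = [line for line in body.splitlines() if ignore.match(line) is None]
    results = []
    for row in csv.DictReader(kept, fieldnames=fieldnames):
        indicator = row.pop("indicator")
        results.append((indicator, dict(row)))  # remaining columns = attributes
    return results


# Pattern from the article: keep only lines that start with "https".
for indicator, attributes in mine_csv(BODY, r"^(?!https)"):
    print(indicator, attributes)
```

Note that with this sample body the pattern ^(?!https) also discards the two rows whose URL starts with http://. Calling mine_csv(BODY, r"^(?!http)") keeps all three data rows while still dropping the comment and header lines.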
Step 3: Verify the node status.

Once the engine restarts you should see a new node in your MineMeld engine with indicators in it. Click on it, then click on its LOG button and, finally, click on any log entry to reveal the indicator details.

As shown in the last picture, the extracted indicators are of URL type, and additional attributes like city, region, country and temperature are attached to them.

Other optional configuration parameters supported by the CSVFT class are:

* fieldnames: if it is null, the values extracted from the first parsed line will be used as fieldnames (remember that one of the fields must be named "indicator")
* delimiter, doublequote, escapechar, quotechar and skipinitialspace control the CSV parser behavior as described in the Python reference guide

Mining a HTML API

In this section you will be provided with the steps needed to use the HTTPFT class to mine dynamic data exposed in the response to an HTTP request (typically text/plain or text/html). If you have not done so, please review the complete process described in the section "Mining a CSV API" to understand concepts like "creating a new prototype", "cloning a prototype as a working node", etc.

To build a new HTTPFT prototype we first need a base prototype that already leverages this class. In this example we will use the prototype named dshield.block as the base.

Let's take a deeper look at the HTML API response to figure out how to generate a valid prototype to accomplish our mission.
    Request ->
    GET /html HTTP/1.1
    Host: test.minemeld.com

    Response Headers <-
    HTTP/2.0 200 OK
    content-type: text/html
    content-length: 1626

    Response Body <-
    ... Santa Clara</code> ...
So, what do we have here? An HTML table whose rows are provided on individual file lines, with each value in its own table cell.

First of all we have to get rid of all lines not belonging to table rows. We can achieve this with the ignore_regex class configuration parameter.

    ignore_regex: ^(?!…)

Next, we need a regex pattern to extract and transform our values from each line. The HTTPFT class leverages Python's re module and accepts configuration parameters both for the indicator itself and for any additional attribute. Any regular expression strategy will be valid. We will use the following one in this example:

    (<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>) ...

It is a large expression with 10 capturing groups. The first capturing group (\1) extracts the first cell and the second capturing group (\2) the value inside that cell. Group 3 extracts cell number 2 and group 4 the value inside that second cell. And so on.

As the indicator (the URL) is in the first cell, the corresponding configuration to achieve our goal must be:

    indicator:
      regex: ^(<td><code class="small">([^<]+)<\/code><\/td>)( ...
      transform: \2

For the remaining attributes we can leverage the same regular expression but with different transformations.
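To see how one regular expression plus different transforms yields the indicator and each attribute, the pair can be tried with Python's re module, where Match.expand() resolves references such as \2. This is a sketch under assumptions: the row below is a hypothetical reconstruction of one table line, the pattern is the single-cell group repeated five times, and the transforms other than \2 simply follow the group numbering described above.

```python
import re

# One table cell: the outer group captures the whole cell,
# the inner group captures only the value inside it.
CELL = r'(<td><code class="small">([^<]+)<\/code><\/td>)'

# Five cells in a row give the 10 capturing groups mentioned in the text.
ROW_REGEX = re.compile(CELL * 5)

# Hypothetical table row, written by hand for illustration only.
LINE = (
    '<tr>'
    '<td><code class="small">https://ajuntament.barcelona.cat/turisme/en/</code></td>'
    '<td><code class="small">ES</code></td>'
    '<td><code class="small">Catalunya</code></td>'
    '<td><code class="small">Barcelona</code></td>'
    '<td><code class="small">12.24</code></td>'
    '</tr>'
)

# Even-numbered groups hold the cell values: \2 is the URL, \4 the country...
TRANSFORMS = {
    "indicator": r"\2",
    "country": r"\4",
    "region": r"\6",
    "city": r"\8",
    "temperature": r"\10",
}

match = ROW_REGEX.search(LINE)
extracted = {name: match.expand(template) for name, template in TRANSFORMS.items()}

for name, value in extracted.items():
    print(f"{name}: {value}")
```

Because every transform is applied to the same match, adding an attribute only means picking another group number; the regular expression itself does not change.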
In the prototype, the additional attributes are declared under fields:

    fields:
      country:
        regex: ^(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)( ...

Mining a JSON API

    Request ->
    GET /json HTTP/1.1
    Host: test.minemeld.com

    Response Headers <-
    HTTP/2.0 200 OK
    content-type: application/json
    content-length: 861

    Response Body <-
    {
      "description": "Real-Time temperature of MineMeld-related cities in the world.",
      "result": [
        {
          "url": "https://ajuntament.barcelona.cat/turisme/en/",
          "country": "ES",
          "region": "Catalunya",
          "city": "Barcelona",
          "temperature": 12.24
        },
        {
          "url": "http://www.turismo.comune.parma.it/en",
          "country": "IT",
          "region": "Emilia-Romagna",
          "city": "Parma",
          "temperature": 16.03
        },
        {
          "url": "http://santaclaraca.gov/visitors",
          "country": "US",
          "region": "California",
          "city": "Santa Clara",
          "temperature": 8.98
        }
      ]
    }

The JSON document looks quite simple, with an element (result) that already provides the array of objects we need. So, our JMESPath extractor will be:

    extractor: result

You can check the expression on the JMESPath site to verify that it returns the following array of objects:

    [
      {
        ...
        "country": "Spain",
        "region": "Catalunya",
        "city": "Sant Cugat Del Valles",
        "temperature": "22"
      },
      {
        "url": "https://weather.yahoo.com/country/state/city-719975/",
        "country": "Italy",
        "region": "Emilia-Romagna",
        "city": "Parma",
        "temperature": ...
      },
      {
        ...
        "country": "United States",
        ...
        "city": "Santa Clara",
        "temperature": "2"
      }
    ]

At this point we just need to identify the object attributes that contain 1) the indicator itself and 2) any additional attributes we want to attach to the indicator.
In our case, the configuration for it will be:

    indicator: url
    fields:
      - country
      - region
      - city
      - temperature

It is time to put all these configuration statements into a SimpleJSON class prototype. We can use, for example, the aws.AMAZON standard library prototype as the base.

Did you notice the "json" prefix in all extracted additional attributes? You can control that and a few other behaviors of the class with the following optional class configuration elements:

* prefix: the prefix attached to the name of any additional attribute attached to the indicator.
* verify_cert: controls whether the SSL certificate should be verified (default: true) or not (false).
* headers: a list that allows you to add additional headers to the HTTP request.
* client_cert_required, cert_file and key_file can be used to force an HTTP request with client certificate authentication.

Bonus Track: using indicator attributes

Wondering why anyone would extract additional attributes from the feed and not just the indicator value? Let's imagine we want to provide two feeds with Yahoo Weather URLs of cities:

* One of them will be called "time-to-beach" and will list URLs of cities where the temperature is over 30°C and people are, probably, preparing themselves to go to the beach or outdoor swimming pools.
* The other one, called "no-beach-time-yet", will list cities with a current temperature below 30°C.

We can achieve that with the input and output filtering capabilities of the MineMeld engine nodes.
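To make the idea concrete, the sketch below first rebuilds what the JSON miner is described as emitting (the array selected by the extractor, the indicator attribute, and the prefixed fields) and then splits the indicators into the two feeds by temperature. It is plain Python, not MineMeld's filter syntax: the helper names are made up, a dictionary lookup stands in for the JMESPath expression result, and joining prefix and field name with an underscore is an assumption.

```python
import json

# Sample document based on the demo JSON response in this article.
DOC = json.loads("""
{
  "description": "Real-Time temperature of MineMeld-related cities in the world.",
  "result": [
    {"url": "https://ajuntament.barcelona.cat/turisme/en/", "country": "ES",
     "region": "Catalunya", "city": "Barcelona", "temperature": 12.24},
    {"url": "http://www.turismo.comune.parma.it/en", "country": "IT",
     "region": "Emilia-Romagna", "city": "Parma", "temperature": 16.03},
    {"url": "http://santaclaraca.gov/visitors", "country": "US",
     "region": "California", "city": "Santa Clara", "temperature": 8.98}
  ]
}
""")

FIELDS = ("country", "region", "city", "temperature")


def extract(doc, extractor="result", indicator="url", fields=FIELDS, prefix="json"):
    """Map each indicator value to its prefixed additional attributes."""
    items = doc[extractor]  # stand-in for the JMESPath expression "result"
    indicators = {}
    for item in items:
        # Assumed naming scheme: "<prefix>_<field>", e.g. json_temperature.
        indicators[item[indicator]] = {
            f"{prefix}_{name}": item[name] for name in fields if name in item
        }
    return indicators


def split_by_temperature(indicators, threshold=30.0, attribute="json_temperature"):
    """Return (time_to_beach, no_beach_time_yet) lists of indicators."""
    hot = [i for i, attrs in indicators.items() if attrs[attribute] > threshold]
    # Exactly-at-threshold cities are placed in the cooler feed (an assumption).
    cold = [i for i, attrs in indicators.items() if attrs[attribute] <= threshold]
    return hot, cold


indicators = extract(DOC)
time_to_beach, no_beach_time_yet = split_by_temperature(indicators)
print("time-to-beach:", time_to_beach)
print("no-beach-time-yet:", no_beach_time_yet)
```

With the sample temperatures used here, every city falls below the 30°C threshold, so all three URLs land in no-beach-time-yet; passing a lower threshold moves cities into the other list. The point is that the decision relies on an attribute carried alongside the indicator, which is why extracting more than the URL is useful.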
Let me share with you a couple of screenshots of the prototypes that will do this job:

Clone each of these two prototypes as working output nodes and connect their inputs to the JSON miner you created. That should build a graph like the one shown in the picture.

At the time of writing this article, only one of the four cities in the feed is over 30°C.

    GET /feeds/no-beach-time-yet HTTP/1.1

    http://www.turismo.comune.parma.it/en
    https://ajuntament.barcelona.cat/turisme/en/

    GET /feeds/time-to-beach HTTP/1.1

    http://santaclaraca.gov/visitors

Comments

Hey! Just a quick question that maybe I didn't quite understand. If the HTML external list I'm going to use for my prototype is protected by a simple user/password combination, how do I tell MineMeld to authenticate before extracting the info?

Michael_D 01-04-2019 07:32
https://test.minemeld.com/json -> {"message": "Internal server error"}
I don't want to open a support ticket, but if anyone is monitoring this thread, could they investigate the test.minemeld.com instance?

Xhoms 05-03-2019 05:02
Hi @Michael_D, thanks for letting us know the example was not working anymore. Yahoo discontinued its weather API. I just moved the example to another provider.
