Clean Up CSV Files

The CSV output may contain HTML tags used for formatting. If you don't want HTML tags in your CSV data, use one of the following scripts to remove them.

Python

# Requires the beautifulsoup4 and lxml packages.
from bs4 import BeautifulSoup
text = """'Open URL redirection exists when the application redirects the user's browser to a URL that was constructed using untrusted data. Open URL redirection allows an attacker to redirect a victim to a site under the attacker's control. Since this site is fully under the attacker's control, the attacker may use it to perform any number of malicious actions.<br>
In the simplest example, consider an application page which takes a URL from a query string parameter supplied by the user:<br>
<code>http://www.site.com/Redirect.aspx?next_page=http%3a%2f%2fwww.site.com%2fHome.aspx</code><br>
When a user navigates to this page, the value of the parameter ''next_page'' is sent to the server. The application takes the value of this parameter, and issues an HTTP 302 response back to the user's browser which redirects them to the requested site:<br>
<code>HTTP/1.1 302 Found</code><br><code>...</code><br><code>Location: http://www.site.com/Home.aspx</code><br>
The application will issue the redirect to any URL since there are no controls in place to limit where the application sends the user.<br>The attacker may abuse the open redirect to trick users into visiting a malicious site that executes scripts attempting to install malware on the victim's system. The attacker may also display a page to the victim which is designed to look the same as the legitimate application, prompting the user to re-enter their credentials or other sensitive information, which are then sent back to the attacker.'"""

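def removeAllTags(htmlText):
    """Return htmlText with all HTML tags removed."""
    # Strip the single quotes that wrap the field, then parse the HTML.
    soup = BeautifulSoup(htmlText.strip("'"), features='lxml')
    # .text concatenates every text node and drops the tags themselves.
    output_text = soup.text
    print(output_text)
    return output_text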

if __name__ == '__main__':
    removeAllTags(text)
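
The script above cleans a single string. To clean a whole CSV export, the same idea can be applied to every cell. The following is a minimal sketch under stated assumptions, not part of the original scripts: the names `strip_tags` and `clean_csv` are hypothetical, and the tag removal uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup. You could call `removeAllTags` in its place if you prefer the BeautifulSoup behavior.

```python
import csv
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def strip_tags(html_text):
    """Return html_text with all HTML tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def clean_csv(src, dst):
    """Copy CSV rows from src to dst, stripping HTML tags from every cell."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow([strip_tags(cell) for cell in row])


if __name__ == "__main__":
    # In-memory sample data; for real files, open them with newline="".
    sample = io.StringIO('id,description\r\n1,"<b>Open</b> redirect<br>found"\r\n')
    cleaned = io.StringIO()
    clean_csv(sample, cleaned)
    print(cleaned.getvalue())
```

Note that this sketch drops tags without inserting whitespace, so `redirect<br>found` becomes `redirectfound`. If your data relies on `<br>` for line breaks, you may want to replace those tags with a space or newline before stripping.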

Java

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveAllTags {

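    public static String removeAllTags(String htmlText){
        // Strip a leading and trailing single quote, then parse the HTML.
        Document d = Jsoup.parse(htmlText.replaceAll("^'", "").replaceAll("'$", ""));
        // text() returns the combined text of the document with all tags removed.
        String output_text = d.text();
        System.out.println(output_text);
        return output_text;
    }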

    public static void main(String[] args) {
        final String htmlText = "<ol><li>Navigate to any one of the end points on Insomnia/Postman App.</li><li>Authenticate to the application using any of the users.</li><li>Send the request.</li><li>Observe in the response that the Content Security Header is missing.</li></ol>";
        removeAllTags(htmlText);
    }

}
