
How Do I Ignore Tags While Getting The .string Of A Beautiful Soup Element?

I'm working with HTML elements that have child tags, which I want to 'ignore' or remove while keeping their text. Right now, if I call .string on any element that contains tags, all I

Solution 1:

import bs4

# Iterating over a tag yields its direct children; keep only real tags.
for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):
        print(child.text)

This prints:

This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
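To try Solution 1 end to end, here is a minimal self-contained sketch. The sample markup is an assumption reconstructed from the output above (the original HTML is not shown), and `child_tag_texts` is a hypothetical helper name used only for illustration.

```python
import bs4

# Assumed sample markup; the original question's HTML is not shown.
HTML = """<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>"""

soup = bs4.BeautifulSoup(HTML, "html.parser")


def child_tag_texts(parent):
    """Return the text of each direct child tag, skipping bare strings."""
    # Iterating over a tag walks its direct children, which include the
    # whitespace strings between the <p> elements; those are not bs4.Tag.
    return [child.text for child in parent if isinstance(child, bs4.Tag)]


texts = child_tag_texts(soup.find(id="main"))
for text in texts:
    print(text)
```

Because .text concatenates every string inside a tag, the nested span's text ends up in the middle paragraph's output rather than being lost.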

Solution 2:

Use the .strings iterable instead. Use ''.join() to pull in all strings and join them together:

print(''.join(main.strings))

Iterating over .strings yields every string the element contains, whether it sits directly in the element or inside a nested child tag.

Demo:

>>> print(''.join(main.strings))

This is a paragraph. 
This is a paragraph with a tag. 
This is another paragraph. 
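The demo can be reproduced with a short self-contained sketch. The markup below is an assumption (the original HTML is not shown), chosen to match the output above. It also shows why .string is not enough here: Beautiful Soup defines .string as None when a tag has more than one child, while .strings and get_text() still reach the nested text.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the original question's HTML is not shown.
HTML = """<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
main = soup.find(id="main")

# The second <p> holds a string, a <span>, and another string, so
# .string cannot pick a single value and is None.
mixed = main.find_all("p")[1]
print(mixed.string)

# .strings walks every contained string, including those in child tags.
joined = "".join(main.strings)
print(joined)

# For this markup, get_text() with default arguments gives the same result.
same_as_get_text = joined == main.get_text()
```

If the whitespace between paragraphs is unwanted, .stripped_strings is a related iterable that strips each string and skips whitespace-only ones.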
