
How To Delete The \n\t\t\t In The Result From Website Data Collection?

I want to retrieve the product names from a website, so I wrote the code below. However, the result includes stray whitespace such as \n\t\t\t. Can someone help me how to delete th

Solution 1:

I worry a bit about removing all tabs but this would do it:

> reviews <- "VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"
> reviews <- gsub( "\\\t", "", reviews)
> reviews
[1] "VZ-C6 / VZ-C3D\n\nDocument Camera\n\n"

Read ?regex and understand that extra backslashes are needed because both R and regex use "\" as an escape character, so there are two levels of character parsing on the way to a pattern. The replacement argument is not a regex pattern, though, so a plain "\n" works there without doubled escapes. So if you then wanted to replace those "\n\n"s with just one "\n", you could use:

> reviews <- gsub( "\\\n\\\n", "\n", reviews)
> reviews
[1] "VZ-C6 / VZ-C3D\nDocument Camera\n"
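As a rough sketch of the two parsing levels mentioned above (the shortened `reviews` string here is an assumption made for readability, not the original data): the pattern "\t" is turned into a literal tab by R's own parser, while "\\t" reaches the regex engine as a backslash followed by t, which the engine then reads as a tab. Both should match the same characters.

```r
# Shortened stand-in for the scraped string (illustrative only).
reviews <- "VZ-C6 / VZ-C3D\n\t\t\nDocument Camera\n\t\t\n\t"

# "\t": R's parser converts this to a literal tab before the regex engine sees it.
a <- gsub("\t", "", reviews)

# "\\t": R's parser yields backslash + t; the regex engine interprets that as a tab.
b <- gsub("\\t", "", reviews)

identical(a, b)

# Collapse any run of two or more newlines into a single newline.
collapsed <- gsub("\n{2,}", "\n", a)
collapsed
```

Using the quantifier `{2,}` instead of a fixed "\n\n" also handles runs of three or more newlines in a single pass.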

Solution 2:

The go-to functions for "find and replace" operations on strings in R are sub (to replace just the first match) and gsub (to replace all matches). These functions search the string for a pattern given as a regular expression and replace each match with the replacement text.

For example:

s <-"VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"

gsub('\t|\n', '', s)

[1] "VZ-C6 / VZ-C3DDocument Camera"

The pipe operator (|) in the pattern above, \t|\n, means that either \t or \n is matched, and the second argument of '' says to replace matches with an empty string (i.e. nothing).
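To make the difference between sub and gsub concrete, here is a small sketch on a made-up string (`s2` is a hypothetical example, not taken from the question). It also shows a character class, `[\t\n]`, as an alternative way of writing the same alternation.

```r
# Made-up string containing two tabs and one newline.
s2 <- "a\tb\nc\td"

# sub() removes only the first match (the first tab).
first_only <- sub("\t|\n", "", s2)

# gsub() removes every match.
all_alt <- gsub("\t|\n", "", s2)

# A character class matches the same set of single characters.
all_class <- gsub("[\t\n]", "", s2)

first_only
all_alt
identical(all_alt, all_class)
```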

While s above contains just a single element, gsub and sub are vectorised and so will also work on an entire vector of arbitrary length.
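A minimal sketch of that vectorised behaviour, assuming a character vector of scraped names (the second element is invented purely for illustration). Instead of deleting the whitespace outright, which glues "VZ-C3D" and "Document" together, this variant replaces each run of tabs and newlines with a single space and then trims the ends with trimws.

```r
# Hypothetical scraped product names with stray tabs and newlines.
products <- c(
  "VZ-C6 / VZ-C3D\n\t\t\t\n\t\tDocument Camera\n\t\t",
  "\n\t\tExample Product\n\t\t\tSample Category\n"
)

# Replace each run of tabs/newlines with one space, then strip leading/trailing whitespace.
clean <- trimws(gsub("[\t\n]+", " ", products))
clean
```

Whether a space, a single newline, or nothing is the right replacement depends on how the names will be used afterwards.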
