
List splice for each row in column of dataframe

  •  0
  • Ana Goessens  · Tech Community  · 1 week ago

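    I have a column containing strings. I want to transform this column so that I only get the first n words of each string.

    I know I need to split the string and then slice the resulting list to keep the first n words. Then I can use join to, well, join them again. But I am having trouble implementing this.

    I was hoping the following would work: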

    import pandas as pd

    data = [[1, "A complete sentence must have, at minimum, three things: a subject, verb, and an object. The subject is typically a noun or a pronoun."], [2, "And, if there's a subject, there's bound to be a verb because all verbs need a "], [3, "subject. Finally, the object of a sentence is the thing that's being acted upon by the subject."], [4, "So, you might say, Claire walks her dog. In this complete "]]
    df = pd.DataFrame(data, columns=['id', 'text'])
    
    df['first_three'] = df['text'].str.split()[:3]
    

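    But this splits every row and then keeps only the first 3 rows of the result, rather than keeping the first 3 words of each row. Because the assignment aligns on the index, the fourth row ends up as NaN.

    It looks like this: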

    first_three
    ['A', 'complete', 'sentence', 'must', 'have,', 'at', 'minimum,', 'three', 'things:', 'a', 'subject,', 'verb,', 'and', 'an', 'object.', 'The', 'subject', 'is', 'typically', 'a', 'noun', 'or', 'a', 'pronoun.']
    ['And,', 'if', "there's", 'a', 'subject,', "there's", 'bound', 'to', 'be', 'a', 'verb', 'because', 'all', 'verbs', 'need', 'a']
    ['subject.', 'Finally,', 'the', 'object', 'of', 'a', 'sentence', 'is', 'the', 'thing', "that's", 'being', 'acted', 'upon', 'by', 'the', 'subject.']
    NaN
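
    The behaviour described here can be reproduced on a smaller frame. The following is a minimal sketch using made-up stand-in data (the short strings and the names words and row_slice are assumptions for illustration, not part of the original post). It shows that [:3] applied after str.split() slices the Series by row, and that the lists themselves are left at full length.

```python
import pandas as pd

# Made-up stand-in data, shorter than the sentences in the question
df = pd.DataFrame({"id": [1, 2, 3, 4],
                   "text": ["a b c d e", "f g h i", "j k l m n", "o p q r"]})

words = df["text"].str.split()   # Series of lists, one list per row
row_slice = words[:3]            # slices the Series itself: rows 0, 1, 2

# Assignment aligns on the index, so row 3 has no match and becomes NaN
df["first_three"] = row_slice

print(len(row_slice))                     # 3
print(len(df.loc[0, "first_three"]))      # 5: the list was not shortened
print(df["first_three"].isna().tolist())  # [False, False, False, True]
```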
    

    I would like the third column to look like this:

    first_three
    [A, complete, sentence]
    [And, if, there's]
    [subject, Finally, the]
    [So, you, might]
    

    That way I can join them and carry on. I know this must be easy to solve, but I can't seem to find the solution. Any input is greatly appreciated.

    1 reply  |  as of 1 week ago
  •  0
  •   vbrises    1 week ago

    You can use apply to take the desired number of elements from each row's list.

    df['first_three'] = df['text'].str.split().apply(lambda x : x[:3])
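
    Since the question mentions joining the words back together afterwards, here is one possible sketch of the whole split, slice, join sequence for an arbitrary n. It swaps the apply call for the .str accessor, which can also slice list elements; the helper name first_n_words and the sample rows are hypothetical and only serve the illustration.

```python
import pandas as pd

def first_n_words(text: pd.Series, n: int) -> pd.Series:
    """Return the first n whitespace-separated words of each string,
    re-joined with single spaces."""
    # .str[:n] slices each row's list; .str.join glues each list back together
    return text.str.split().str[:n].str.join(" ")

# Hypothetical sample rows; the second has fewer than three words
df = pd.DataFrame({"id": [1, 2],
                   "text": ["So, you might say, Claire walks her dog.", "one two"]})

df["first_three"] = first_n_words(df["text"], 3)
print(df["first_three"].tolist())
# ['So, you might', 'one two']
```

    Rows with fewer than n words are returned whole, because slicing a short list does not raise an error.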
    

    If you also need some text cleanup, you can do this:

    df['first_three'] = df['text'].str.replace(",", " ")
    df['first_three'] = df['first_three'].apply(lambda x : x.split()[:3])
    

    Output:

    first_three
    [A, complete, sentence]
    [And, if, there's]
    [subject., Finally, the]
    [So, you, might]
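
    Replacing only commas leaves other punctuation in place, which is why "subject." keeps its period. If the goal is the output shown in the question, with no trailing punctuation, one option is a regular expression that drops everything except word characters, whitespace, and apostrophes. This is a sketch under that assumption; the pattern and the shortened sample sentences are illustrative choices, not something stated in the answer.

```python
import pandas as pd

# Shortened versions of two sentences from the question
df = pd.DataFrame({"text": ["subject. Finally, the object of a sentence",
                            "And, if there's a subject, there's bound to be a verb"]})

# Replace anything that is not a word character, whitespace, or an
# apostrophe with a space, so contractions such as "there's" survive
cleaned = df["text"].str.replace(r"[^\w\s']", " ", regex=True)

df["first_three"] = cleaned.str.split().str[:3]
print(df["first_three"].tolist())
# [['subject', 'Finally', 'the'], ['And', 'if', "there's"]]
```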